From zgu at redhat.com Mon Jan 6 19:14:05 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 6 Jan 2020 14:14:05 -0500 Subject: [15] RFR(XS) 8236681: Shenandoah: Disable concurrent class unloading flag if no class unloading for the GC cycle Message-ID: Please review this small patch that disables concurrent class unloading if there is no class unloading for the particular GC cycle. This is not a fatal error, but can confuse the verifier. Bug: https://bugs.openjdk.java.net/browse/JDK-8236681 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8236681/webrev.00/ Test: hotspot_gc_shenandoah (fastdebug and release) Thanks, -Zhengyu From rkennke at redhat.com Tue Jan 7 07:34:37 2020 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 7 Jan 2020 08:34:37 +0100 Subject: [15] RFR(XS) 8236681: Shenandoah: Disable concurrent class unloading flag if no class unloading for the GC cycle In-Reply-To: References: Message-ID: <14e0f3c6-6e54-67c3-7229-0c7b475badf0@redhat.com> Ok. Thanks! Roman > Please review this small patch that disables concurrent class unloading > if there is no class unloading for the particular GC cycle. > > This is not a fatal error, but can confuse the verifier. > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8236681 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8236681/webrev.00/ > Test: > hotspot_gc_shenandoah (fastdebug and release) > > > Thanks, > > -Zhengyu > From stefan.johansson at oracle.com Tue Jan 7 09:10:44 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Tue, 7 Jan 2020 10:10:44 +0100 Subject: RFR[14]: 8235751: Assertion when triggering concurrent cycle during shutdown In-Reply-To: <695EF54D-675F-4162-8518-115CC9F63F8D@oracle.com> References: <695EF54D-675F-4162-8518-115CC9F63F8D@oracle.com> Message-ID: <23ec1be7-9619-d105-f05c-29603840839a@oracle.com> Hi Kim, On 2019-12-31 04:01, Kim Barrett wrote: > Please review this change to G1's handling of requests to initiate > concurrent marking.
> > When such a request is made during shutdown processing, after the cm > thread has been stopped, the request to initiate concurrent marking is > ignored. This could lead to an assertion failure for user requested > GCs (System.gc and via agent) by a thread that has not yet been > brought to a halt, because the possibility of such a request being > ignored was missed when the assertion was recently added by JDK-8232588. > > We now report to the GC-invoking thread when initiation of concurrent > marking has been suppressed because termination of the cm thread has > been requested. In that case the GC invocation is considered finished. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8235751 > > Webrev: > https://cr.openjdk.java.net/~kbarrett/8235751/open.00/ > Looks good, StefanJ > Testing: > mach5 tier1-5 > > Locally (linux-x64) reproduced fairly quickly the failure using the > approach described in the CR; after applying the proposed chage, > failed to reproduce. > From thomas.schatzl at oracle.com Tue Jan 7 10:41:38 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 7 Jan 2020 11:41:38 +0100 Subject: [14] RFR (S): 8235934: gc/g1/TestGCLogMessages.java fails with 'DerivedPointerTable Update' found In-Reply-To: References: <41e652e0-b843-e2da-c196-37b9b327d4aa@oracle.com> Message-ID: <0635ecc6-a4e9-0d34-d320-002ff148ca1a@oracle.com> Hi Kim, thanks for your review. On 31.12.19 08:09, Kim Barrett wrote: >> On Dec 17, 2019, at 4:27 AM, Thomas Schatzl wrote: >> >> Hi all, >> >> can I have reviews for this testbug where there is a mismatch between "C2 compiler is enabled" and "C2 compiler is compiled in" in verifying output messages. >> >> I.e. G1 prints some additional log messages if the C2 compiler is compiled in, but the test checks this message for (non-)existence if the C2 compiler is enabled. >> >> Since there are a few flags that can toggle compiler use even when compiled in (UseCompiler, TieredStopAtLevel<=3, ...) 
the GC prints that message but the test does not expect it. >> >> The fix is to add a whitebox method that specifically returns whether the C2 compiler is compiled in or not, to be used by the test. >> >> I would like to push this to 14 even if it is P4 because of the test bug exemption, returning unnecessary reproducible errors. >> >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8235934 >> Webrev: >> http://cr.openjdk.java.net/~tschatzl/8235934/webrev/ >> Testing: >> hs-tier1-3, local runs of TestGCLogMessages.java >> >> Thanks, >> Thomas > > ------------------------------------------------------------------------------ > src/hotspot/share/prims/whitebox.cpp > 1990 #if COMPILER2_OR_JVMCI > 1991 return true; > 1992 #else > 1993 return false; > 1994 #endif > > This could perhaps be just > > return bool(COMPILER2_OR_JVMCI); > > That will fail to compile if COMPILER2_OR_JVMCI is not defined at all; > not sure whether that's a pro or con for this alternative form. I do not have an opinion, so kept it as is. > ------------------------------------------------------------------------------ > src/hotspot/share/prims/whitebox.cpp > 1989 WB_ENTRY(jboolean, WB_isC2OrGraalIncludedInVmBuild(JNIEnv* env)) > > I think the name ought to use "Jvmci" rather than "Graal". > > ------------------------------------------------------------------------------ > Fixed. http://cr.openjdk.java.net/~tschatzl/8235934/webrev.0_to_1 (diff) http://cr.openjdk.java.net/~tschatzl/8235934/webrev.1 (full) Tested locally.
Thanks, Thomas From thomas.schatzl at oracle.com Tue Jan 7 10:55:52 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 7 Jan 2020 11:55:52 +0100 Subject: RFR (M): 8235860: Obsolete the UseParallelOldGC option In-Reply-To: <56A9296B-089E-4A00-9C43-5E3CBDF4A29B@oracle.com> References: <292ab94f-f2c8-b373-d5a5-46a45470540e@oracle.com> <2A4B1955-26D5-4544-B476-6E9E5E8009D4@oracle.com> <5e21e50d-a026-98ba-d03d-3f7aa1c31e21@oracle.com> <56A9296B-089E-4A00-9C43-5E3CBDF4A29B@oracle.com> Message-ID: <48af99ac-9e7a-c112-800e-db13e3b3bbcb@oracle.com> Hi Kim, On 18.12.19 16:45, Kim Barrett wrote: > > >> On Dec 18, 2019, at 4:52 AM, Thomas Schatzl wrote: >> >> Fixed in >> http://cr.openjdk.java.net/~tschatzl/8235860/webrev.0_to_1 (diff) >> http://cr.openjdk.java.net/~tschatzl/8235860/webrev.1 (full) > > Looks good. > Thanks for your review. >> >>> ------------------------------------------------------------------------------ >>> src/hotspot/share/gc/parallel/psParallelCompact.hpp >>> Pre-existing: It seems like the big block comment before SplitInfo >>> should have received some updates as part of the recent shadow-region >>> patch, but it wasn't touched. >>> ------------------------------------------------------------------------------ >> >> I am filing a CR for that. > > The comment before PSParallelCompact in the same file might also need some updating. > > (I was a bit confused in my earlier review about where the relevant comments were.) > I filed JDK-8141637 before the holidays. I added your recent comment. 
Thanks, Thomas From thomas.schatzl at oracle.com Tue Jan 7 11:47:30 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 7 Jan 2020 12:47:30 +0100 Subject: RFR[14]: 8235751: Assertion when triggering concurrent cycle during shutdown In-Reply-To: <695EF54D-675F-4162-8518-115CC9F63F8D@oracle.com> References: <695EF54D-675F-4162-8518-115CC9F63F8D@oracle.com> Message-ID: Hi, On 31.12.19 04:01, Kim Barrett wrote: > Please review this change to G1's handling of requests to initiate > concurrent marking. > > When such a request is made during shutdown processing, after the cm > thread has been stopped, the request to initiate concurrent marking is > ignored. This could lead to an assertion failure for user requested > GCs (System.gc and via agent) by a thread that has not yet been > brought to a halt, because the possibility of such a request being > ignored was missed when the assertion was recently added by JDK-8232588. > > We now report to the GC-invoking thread when initiation of concurrent > marking has been suppressed because termination of the cm thread has > been requested. In that case the GC invocation is considered finished. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8235751 > > Webrev: > https://cr.openjdk.java.net/~kbarrett/8235751/open.00/ > > Testing: > mach5 tier1-5 > > Locally (linux-x64) reproduced fairly quickly the failure using the > approach described in the CR; after applying the proposed change, > failed to reproduce. > looks good. Thomas From maoliang.ml at alibaba-inc.com Tue Jan 7 16:33:25 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Wed, 08 Jan 2020 00:33:25 +0800 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Message-ID: Hi Thomas, As we previously discussed, I use the concurrent heap uncommit/commit mechanism to implement the SoftMaxHeapSize for G1.
It is also for the further implementation of G1ElasticHeap for ergonomic change of heap size. In the previous 8u implementation, we had some limitations which are all removed now in this patch. The concurrent uncommit/commit can also work with some scenarios for immediate heap expansion. Here is the webrev link: http://cr.openjdk.java.net/~luchsh/8236073.webrev/ We still have some questions. 1. Does the SoftMaxHeapSize limitation need to consider the GC time ratio as in expand_heap_after_young_collection? Now we haven't put the logic in yet. 2. The concurrent uncommit/commit can only work for G1RegionsLargerThanCommitSizeMapper but not G1RegionsSmallerThanCommitSizeMapper, which might need some locks to ensure the multi-thread synchronization issue (heap may expand immediately). I think bringing in the lock synchronization may not be worth it for the little gain. Another idea is: can we just not uncommit the pages of auxiliary data if in G1RegionsSmallerThanCommitSizeMapper? Heap regions should not be G1RegionsSmallerThanCommitSizeMapper most of the time I guess... Looking forward to your advice:) Thanks, Liang ------------------------------------------------------------------ From:MAO, Liang Send Time:2019 Oct. 14 (Mon.) 11:52 To:Thomas Schatzl ; hotspot-gc-dev Subject:Re: G1 patch of elastic Java heap Hi Thomas, Thank you for the recognition:) Since we both agree on some clear specific points, I will try to extract them from current implementation and create a patch in OpenJDK upstream branch so we can continue discussion on the code level. Thanks, Liang ------------------------------------------------------------------ From:Thomas Schatzl Send Time:2019 Oct. 12 (Sat.) 23:00 To:"MAO, Liang" ; hotspot-gc-dev Subject:Re: G1 patch of elastic Java heap Hi, On Sat, 2019-10-12 at 19:51 +0800, Liang Mao wrote: > Hi Thomas, > > The manual generation limit can be put aside currently since we know > it might not be so general for a GC.
We can focus on how to change > heap size and return memory in runtime first. > > GCTimeRatio is a good metric to measure the health of a Java > application and I have considered to use that. But finally I chose > a simple way just like the periodic old GC. Guarantee a long > enough young GC interval is an alternative way to make sure the > GCTimeRatio at a healthy state. > I'm absolutely ok to use GCTimeRatio instead of the fixed young GC > interval. This part is same to ZGC or Shenandoah for how to balance > the desired memory size and GC frequency. I'm open to any good > solution and we are already in the same page for this issue > I think:) +1 > A big difference of our implementation is evaluating heap resizing in > any young GC instead of a concurrent gc cycle which I think is > swifter and more immediate. The concurrent map/unmap > mechanism gets rid of the additional pause time. My thought is the > heap shrink/expand can be all determined in young GC pause and > performed in concurrent thread which could exclude the > considerable time cost by OS interface. Most of our Java users are > intolerant to those pause spikes caused by page faults which can be up > to seconds. And we also found the issue of time cost by map/unmap in > ZGC. > > A direct advantage of the young GC resizing and concurrent memory > free mechanism is for implementing SoftMaxHeapSize. The heap size can > be changed after last mixed GC. The young GC won't have longer > pause and the memory can be freed concurrently without side effect. Agree and agree.
Both evaluating and giving back memory at any gc sounds nice, and doing that without incurring the costs in the pause is even better :) Thanks, Thomas From rkennke at redhat.com Tue Jan 7 20:26:14 2020 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 7 Jan 2020 21:26:14 +0100 Subject: RFR: 8236732: Shenandoah: Stricter placement for oom-evac scopes Message-ID: <2c07b4bc-70ce-5107-6c77-76a59c912ac6@redhat.com> I'm currently looking at a deadlock with the derby benchmark which involves oom-scopes and new concurrent-class-unloading. Currently, we have sprinkled OOM-evac scopes all over the place: - In the main evac-loop (of course) - In the LRB (of course) - In various places The latter is very questionable and has repeatedly led to problems in the past. The trouble was usually that some weird path would dive into evacuation with a GC worker, although the oom-scope was already held at an outer scope. It becomes really bad when locks are involved, e.g. the heap-lock, code-cache-lock and recently the per-nmethod locks. This is very deadlock-prone. The way out is to be very strict about where we place the oom-scopes. They should *only* be very close to SH::evacuate_object(), and they should *always* be the innermost scopes, inside any possible locks. Placement must be such that both conditions are rather obviously met. The biggest trouble here is Traversal GC: since it does *both* evacs and other stuff during traversal, it dives into LRB through various paths while GC threads hold the evac-scope. The solution is to only enter evac-scope very close to SH::evacuate_object() at the expense of doing it quite often during traversal. I prefer to have a clear way to do it though, instead of the mess that we currently have. Bug: https://bugs.openjdk.java.net/browse/JDK-8236732 Webrev: http://cr.openjdk.java.net/~rkennke/JDK-8236732/webrev.00/ Testing: hotspot_gc_shenandoah, the specjvm/derby benchmark that troubled me with deadlock is now looking clean Ok?
Roman From zgu at redhat.com Tue Jan 7 20:43:32 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 7 Jan 2020 15:43:32 -0500 Subject: RFR: 8236732: Shenandoah: Stricter placement for oom-evac scopes In-Reply-To: <2c07b4bc-70ce-5107-6c77-76a59c912ac6@redhat.com> References: <2c07b4bc-70ce-5107-6c77-76a59c912ac6@redhat.com> Message-ID: Okay. -Zhengyu On 1/7/20 3:26 PM, Roman Kennke wrote: > I'm currently looking at a deadlock with the derby benchmark which > involves oom-scopes and new concurrent-class-unloading. > > Currently, we have sprinkled OOM-evac scopes all over the place: > - In the main evac-loop (of course) > - In the LRB (of course) > - In various places > > The latter is very questionable and has repeatedly lead to problems in > the past. The trouble was usually that some weird path would dive into > evacuation with a GC worker, although the oom-scope was already held at > an outer scope. It becomes really bad when locks are involved, e.g. the > heap-lock, code-cache-lock and recently the per-nmethod locks. This is > very deadlock-prone. > > The way out is to be very strict about where we place the oom-scopes. > They should *only* be very close to SH::evacuate_object(), and they > should *always* be the innermost scopes, inside any possible locks. > Placement must be such that both conditions are rather obviously met. > > The biggest trouble here is Traversal GC: since it does *both* evacs and > other stuff during traversal, it dives into LRB through various paths > while GC threads holding the evac-scope. The solution is to only enter > evac-scope very closely to SH::evacuate_object() at the expense of doing > it quite often during traversal. I prefer to have a clear way to do it > though, instead of the mess that we currently have. 
> > Bug: > https://bugs.openjdk.java.net/browse/JDK-8236732 > Webrev: > http://cr.openjdk.java.net/~rkennke/JDK-8236732/webrev.00/ > > Testing: hotspot_gc_shenandoah, the specjvm/derby benchmark that > troubled me with deadlock is now looking clean > > Ok? > > Roman > From zgu at redhat.com Tue Jan 7 20:44:48 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 7 Jan 2020 15:44:48 -0500 Subject: RFR: 8236732: Shenandoah: Stricter placement for oom-evac scopes In-Reply-To: References: <2c07b4bc-70ce-5107-6c77-76a59c912ac6@redhat.com> Message-ID: <4e3cd3f1-6784-eec4-8584-82b400d40909@redhat.com> Need to update copyright years :-) -Zhengyu On 1/7/20 3:43 PM, Zhengyu Gu wrote: > Okay. > > -Zhengyu > > On 1/7/20 3:26 PM, Roman Kennke wrote: >> I'm currently looking at a deadlock with the derby benchmark which >> involves oom-scopes and new concurrent-class-unloading. >> >> Currently, we have sprinkled OOM-evac scopes all over the place: >> - In the main evac-loop (of course) >> - In the LRB (of course) >> - In various places >> >> The latter is very questionable and has repeatedly lead to problems in >> the past. The trouble was usually that some weird path would dive into >> evacuation with a GC worker, although the oom-scope was already held at >> an outer scope. It becomes really bad when locks are involved, e.g. the >> heap-lock, code-cache-lock and recently the per-nmethod locks. This is >> very deadlock-prone. >> >> The way out is to be very strict about where we place the oom-scopes. >> They should *only* be very close to SH::evacuate_object(), and they >> should *always* be the innermost scopes, inside any possible locks. >> Placement must be such that both conditions are rather obviously met. >> >> The biggest trouble here is Traversal GC: since it does *both* evacs and >> other stuff during traversal, it dives into LRB through various paths >> while GC threads holding the evac-scope. 
The solution is to only enter >> evac-scope very closely to SH::evacuate_object() at the expense of doing >> it quite often during traversal. I prefer to have a clear way to do it >> though, instead of the mess that we currently have. >> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8236732 >> Webrev: >> http://cr.openjdk.java.net/~rkennke/JDK-8236732/webrev.00/ >> >> Testing: hotspot_gc_shenandoah, the specjvm/derby benchmark that >> troubled me with deadlock is now looking clean >> >> Ok? >> >> Roman >> From kim.barrett at oracle.com Tue Jan 7 22:40:38 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 7 Jan 2020 17:40:38 -0500 Subject: [14] RFR (S): 8235934: gc/g1/TestGCLogMessages.java fails with 'DerivedPointerTable Update' found In-Reply-To: <0635ecc6-a4e9-0d34-d320-002ff148ca1a@oracle.com> References: <41e652e0-b843-e2da-c196-37b9b327d4aa@oracle.com> <0635ecc6-a4e9-0d34-d320-002ff148ca1a@oracle.com> Message-ID: <87F643E8-16AF-4E1C-8272-3D8CC0813938@oracle.com> > On Jan 7, 2020, at 5:41 AM, Thomas Schatzl wrote: > > http://cr.openjdk.java.net/~tschatzl/8235934/webrev.0_to_1 (diff) > http://cr.openjdk.java.net/~tschatzl/8235934/webrev.1 (full) > > Tested locally. > > Thanks, > Thomas Looks good. From kim.barrett at oracle.com Tue Jan 7 23:22:08 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 7 Jan 2020 18:22:08 -0500 Subject: RFR[14]: 8235751: Assertion when triggering concurrent cycle during shutdown In-Reply-To: References: <695EF54D-675F-4162-8518-115CC9F63F8D@oracle.com> Message-ID: <55E3567C-1E40-4182-8243-80285583572D@oracle.com> > On Jan 7, 2020, at 6:47 AM, Thomas Schatzl wrote: > > Hi, > > On 31.12.19 04:01, Kim Barrett wrote: >> Please review this change to G1's handling of requests to initiate >> concurrent marking. >> When such a request is made during shutdown processing, after the cm >> thread has been stopped, the request to initiate concurrent marking is >> ignored. 
This could lead to an assertion failure for user requested >> GCs (System.gc and via agent) by a thread that has not yet been >> brought to a halt, because the possibility of such a request being >> ignored was missed when the assertion was recently added by JDK-8232588. >> We now report to the GC-invoking thread when initiation of concurrent >> marking has been suppressed because termination of the cm thread has >> been requested. In that case the GC invocation is considered finished. >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8235751 >> Webrev: >> https://cr.openjdk.java.net/~kbarrett/8235751/open.00/ >> Testing: >> mach5 tier1-5 >> Locally (linux-x64) reproduced fairly quickly the failure using the >> approach described in the CR; after applying the proposed chage, >> failed to reproduce. > > looks good. > > Thomas Thanks. From kim.barrett at oracle.com Tue Jan 7 23:22:18 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 7 Jan 2020 18:22:18 -0500 Subject: RFR[14]: 8235751: Assertion when triggering concurrent cycle during shutdown In-Reply-To: <23ec1be7-9619-d105-f05c-29603840839a@oracle.com> References: <695EF54D-675F-4162-8518-115CC9F63F8D@oracle.com> <23ec1be7-9619-d105-f05c-29603840839a@oracle.com> Message-ID: <7EFDC050-D672-4BF5-8009-EF95736829E2@oracle.com> > On Jan 7, 2020, at 4:10 AM, Stefan Johansson wrote: > > Hi Kim, > > On 2019-12-31 04:01, Kim Barrett wrote: >> Please review this change to G1's handling of requests to initiate >> concurrent marking. >> When such a request is made during shutdown processing, after the cm >> thread has been stopped, the request to initiate concurrent marking is >> ignored. This could lead to an assertion failure for user requested >> GCs (System.gc and via agent) by a thread that has not yet been >> brought to a halt, because the possibility of such a request being >> ignored was missed when the assertion was recently added by JDK-8232588. 
>> We now report to the GC-invoking thread when initiation of concurrent >> marking has been suppressed because termination of the cm thread has >> been requested. In that case the GC invocation is considered finished. >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8235751 >> Webrev: >> https://cr.openjdk.java.net/~kbarrett/8235751/open.00/ >> > > Looks good, > StefanJ > >> Testing: >> mach5 tier1-5 >> Locally (linux-x64) reproduced fairly quickly the failure using the >> approach described in the CR; after applying the proposed chage, >> failed to reproduce. Thanks. From stuart.monteith at linaro.org Tue Jan 7 23:34:37 2020 From: stuart.monteith at linaro.org (Stuart Monteith) Date: Tue, 7 Jan 2020 23:34:37 +0000 Subject: aarch64: Concurrent class unloading, nmethod barriers, ZGC Message-ID: <65e96ab7-3625-d5be-e5e0-be66c3137c8b@linaro.org> Hello Zhengyu, et al, This is the current state of the nmethod barrier code I have for ZGC on aarch64. As I understand it, Zhengyu may have been working on this, and so this is my sharing it: http://cr.openjdk.java.net/~smonteith/nmethod/webrev.0/ The code has various bits for debugging, prototype level code, with comments and some notes interspersed throughout - it is not ready for merging. The approach I've taken for the nmethod barrier is to have the nmethod barrier that is emitted be implemented like so: __ adr(rscratch1, __ pc()); __ ldarw(rscratch2, rscratch1); __ ldrw(rscratch1, thread_disarmed_addr); __ cmpw(rscratch2, rscratch1); __ br(Assembler::EQ, continuation); __ mov(rscratch1, StubRoutines::aarch64::method_entry_barrier()); __ blr(rscratch1); __ bind(continuation); This code is patched up such that the ldarw is loading from a field I've added to nmethod "_nmethod_guard". There don't appear to be existing ways to emit a relocation (there aren't spare bits to do a small change) from an address in nmethod emitted code into the nmethod data structure. 
It is initialized to the instruction's current address and BarrierSetNMethod::disarm will detect this known value and fix it up, which occurs on initialization. Currently the deoptimise path is broken. By setting the environment variable "SRDM_forcedeopt", the deoptimisation can be provoked even when not needed - the x86 implementation is good with this change. The aarch64 code isn't working yet - I suspect I've followed the x86 code too closely, and my offsets are perhaps miscalculated - I may be pointing at the wrong frame, or I've neglected FP too much. BR, Stuart From zgu at redhat.com Wed Jan 8 01:22:52 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 7 Jan 2020 20:22:52 -0500 Subject: aarch64: Concurrent class unloading, nmethod barriers, ZGC In-Reply-To: <65e96ab7-3625-d5be-e5e0-be66c3137c8b@linaro.org> References: <65e96ab7-3625-d5be-e5e0-be66c3137c8b@linaro.org> Message-ID: Hi Stuart, Thanks for sharing. Actually, Andrew Haley (cc'd) is helping us on implementing the nmethod entry barrier. Your patch is largely in line with what we have right now, but Andrew seems to have second thoughts :-) -Zhengyu On 1/7/20 6:34 PM, Stuart Monteith wrote: > Hello Zhengyu, et al, > This is the current state of the nmethod barrier code I have for ZGC on > aarch64. As I understand it, Zhengyu may have been working on this, and > so this is my sharing it: > > http://cr.openjdk.java.net/~smonteith/nmethod/webrev.0/ > > The code has various bits for debugging, prototype level code, with > comments and some notes interspersed throughout - it is not ready for > merging.
> > The approach I've taken for the nmethod barrier is to have the nmethod > barrier that is emitted be implemented like so: > > __ adr(rscratch1, __ pc()); > __ ldarw(rscratch2, rscratch1); > __ ldrw(rscratch1, thread_disarmed_addr); > __ cmpw(rscratch2, rscratch1); > __ br(Assembler::EQ, continuation); > > __ mov(rscratch1, StubRoutines::aarch64::method_entry_barrier()); > __ blr(rscratch1); > > __ bind(continuation); > > > This code is patched up such that the ldarw is loading from a field I've > added to nmethod "_nmethod_guard". There don't appear to be existing > ways to emit a relocation (there aren't spare bits to do a small change) > from an address in nmethod emitted code into the nmethod data structure. > It is initialized to the instruction's current address and > BarrierSetNMethod::disarm will detect this known value and fix it up, > which occurs on initialization. > > Currently the deoptmise path is broken. By setting the environment > variable "SRDM_forcedeopt", the deoptimisation can be provoked even when > not needed - the x86 implementation is good with this change. The > aarch64 code isn't working yet - I suspect I've followed the x86 code > too closely, and my offsets are perhaps miscalculated - I may be > pointing at the wrong frame, or I've neglected FP too much. 
> > BR, > Stuart > From thomas.schatzl at oracle.com Wed Jan 8 09:47:32 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 8 Jan 2020 10:47:32 +0100 Subject: [14] RFR (S): 8235934: gc/g1/TestGCLogMessages.java fails with 'DerivedPointerTable Update' found In-Reply-To: <87F643E8-16AF-4E1C-8272-3D8CC0813938@oracle.com> References: <41e652e0-b843-e2da-c196-37b9b327d4aa@oracle.com> <0635ecc6-a4e9-0d34-d320-002ff148ca1a@oracle.com> <87F643E8-16AF-4E1C-8272-3D8CC0813938@oracle.com> Message-ID: <6795a285-6d99-9471-1c1d-30115ac57305@oracle.com> Hi Kim, On 07.01.20 23:40, Kim Barrett wrote: >> On Jan 7, 2020, at 5:41 AM, Thomas Schatzl wrote: >> >> http://cr.openjdk.java.net/~tschatzl/8235934/webrev.0_to_1 (diff) >> http://cr.openjdk.java.net/~tschatzl/8235934/webrev.1 (full) >> >> Tested locally. >> >> Thanks, >> Thomas > > Looks good. > thanks for your review. Thomas From aph at redhat.com Wed Jan 8 10:10:32 2020 From: aph at redhat.com (Andrew Haley) Date: Wed, 8 Jan 2020 10:10:32 +0000 Subject: aarch64: Concurrent class unloading, nmethod barriers, ZGC In-Reply-To: <65e96ab7-3625-d5be-e5e0-be66c3137c8b@linaro.org> References: <65e96ab7-3625-d5be-e5e0-be66c3137c8b@linaro.org> Message-ID: On 1/7/20 11:34 PM, Stuart Monteith wrote: > There don't appear to be existing > ways to emit a relocation (there aren't spare bits to do a small change) > from an address in nmethod emitted code into the nmethod data structure. Yeah, there is. See MacroAssembler::int_constant. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Wed Jan 8 10:21:02 2020 From: aph at redhat.com (Andrew Haley) Date: Wed, 8 Jan 2020 10:21:02 +0000 Subject: aarch64: Concurrent class unloading, nmethod barriers, ZGC In-Reply-To: <65e96ab7-3625-d5be-e5e0-be66c3137c8b@linaro.org> References: <65e96ab7-3625-d5be-e5e0-be66c3137c8b@linaro.org> Message-ID: <8c8aa64b-a306-2801-fdb8-2942ea362b48@redhat.com> On 1/7/20 11:34 PM, Stuart Monteith wrote: > This code is patched up such that the ldarw is loading from a field I've > added to nmethod "_nmethod_guard". There don't appear to be existing > ways to emit a relocation (there aren't spare bits to do a small change) > from an address in nmethod emitted code into the nmethod data structure. > It is initialized to the instruction's current address and > BarrierSetNMethod::disarm will detect this known value and fix it up, > which occurs on initialization. > > Currently the deoptmise path is broken. By setting the environment > variable "SRDM_forcedeopt", the deoptimisation can be provoked even when > not needed - the x86 implementation is good with this change. The > aarch64 code isn't working yet - I suspect I've followed the x86 code > too closely, and my offsets are perhaps miscalculated - I may be > pointing at the wrong frame, or I've neglected FP too much. I'll integrate this with what I've got and return it to you. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From zgu at redhat.com Wed Jan 8 13:18:19 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 8 Jan 2020 08:18:19 -0500 Subject: [15] RFR 8228818: Shenandoah: Processing weak roots in concurrent phase when possible Message-ID: Please review this enhancement that moves some of weak root processing into concurrent phase whenever possible. 
When concurrent class unloading is enabled, the weak roots that are backed by OopStorage can be processed in concurrent phase, as Shenandoah native LRB can properly resolve the object and hide dead oops from mutators. Bug: https://bugs.openjdk.java.net/browse/JDK-8228818 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8228818/webrev.00/ Test: hotspot_gc_shenandoah (fastdebug and release) on x86_64 and x86_32 Linux specjvm on x86_64 Linux Thanks, -Zhengyu From thomas.schatzl at oracle.com Wed Jan 8 13:38:44 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 8 Jan 2020 14:38:44 +0100 Subject: RFR (S): 8214277: Use merged G1ArchiveRegionMap for open and closed archive heap regions Message-ID: <180618f6-8fe4-22f0-dbc0-5f275c5b1d90@oracle.com> Hi all, could I have reviews for this small cleanup/simplification that merges the open and closed archive region map into a single one? CR: https://bugs.openjdk.java.net/browse/JDK-8214277 Webrev: http://cr.openjdk.java.net/~tschatzl/8214277/webrev/ Testing: hs-tier1-5 (almost done, no issues), local gc/g1 jtreg Thanks, Thomas From stuart.monteith at linaro.org Wed Jan 8 14:23:27 2020 From: stuart.monteith at linaro.org (Stuart Monteith) Date: Wed, 8 Jan 2020 14:23:27 +0000 Subject: aarch64: Concurrent class unloading, nmethod barriers, ZGC In-Reply-To: References: <65e96ab7-3625-d5be-e5e0-be66c3137c8b@linaro.org> Message-ID: Hi, I see there is LIR_Assembler::int_constant, which is only for C1; the equivalent is MacroAssembler::ldr_constant, which uses an InternalAddress. I had looked at doing that first; however, I had an issue getting the address of the constants. The _consts CodeSection information exists in the CodeBuffer when the code is being generated, but I hadn't worked out after that point where the constant would be after the relocation, which is what we'd need when arming/disarming the guard, to which we have to navigate from the nmethod class.
My assumption was that the nmethod would consist of the header, followed by the relocations, followed by the code section which would include the consts, and that my constant might be one of the first ones. I don't believe I could make that a consistent address. I chose to put the guard into the nmethod data structure itself, but that of course meant we had to check whether or not the guard needs to be initialized - although I was thinking we could add an additional initialization stage. It occurs to me now that we could pull out the address of the guard from the ADR instruction, as that will be relocated and we know its location, and could do that unconditionally. I'll revisit using the constants, as maybe their address is more deterministic than I first thought. Thanks, Stuart On Wed, 8 Jan 2020 at 10:10, Andrew Haley wrote: > > On 1/7/20 11:34 PM, Stuart Monteith wrote: > > There don't appear to be existing > > ways to emit a relocation (there aren't spare bits to do a small change) > > from an address in nmethod emitted code into the nmethod data structure. > > Yeah, there is. See MacroAssembler::int_constant. > > -- > Andrew Haley (he/him) > Java Platform Lead Engineer > Red Hat UK Ltd. > https://keybase.io/andrewhaley > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > From aph at redhat.com Wed Jan 8 15:37:22 2020 From: aph at redhat.com (Andrew Haley) Date: Wed, 8 Jan 2020 15:37:22 +0000 Subject: aarch64: Concurrent class unloading, nmethod barriers, ZGC In-Reply-To: References: <65e96ab7-3625-d5be-e5e0-be66c3137c8b@linaro.org> Message-ID: <69b38972-8061-b706-befe-28d49af42fe7@redhat.com> On 1/8/20 2:23 PM, Stuart Monteith wrote: > I see there is LIR_Assembler::int_constant, which is only for C1, the > equivalent is MacroAssembler::ldr_constant, which uses an > InternalAddress. There is MacroAssembler::int_constant(n). It is there, and it returns an address that you can use with ADR and/or LDR. 
It won't work with a native method because they have no constant pool (int_constant() will return NULL) but I don't think you need barriers for native methods. (Um, perhaps you do, for synchronized ones? They have a reference to a class.) Anyway, this is your patch with a working (probably) deoptimize handler: http://cr.openjdk.java.net/~aph/aarch64-jdk-nmethod-barriers-3.patch -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From rkennke at redhat.com Wed Jan 8 16:39:25 2020 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 8 Jan 2020 17:39:25 +0100 Subject: [15] RFR 8228818: Shenandoah: Processing weak roots in concurrent phase when possible In-Reply-To: References: Message-ID: Hi Zhengyu, src/hotspot/share/gc/shenandoah/shenandoahClosures.hpp: +class ShenandoahEvacUpdateCleanupOopStorageRootsClosure : public BasicOopIterateClosure { Why can't this go in shenandoahHeap.cpp (only place where it's used)? src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp: + // Cleanup/Evacuate weak roots + if (heap->marking_context()->is_complete() && ShenandoahConcurrentRoots::should_do_concurrent_class_unloading()) { + heap->entry_weak_roots(); Are you sure that we only want to do cleanup when we do conc-class-unloading? Originally, we hoooked this up in the entry_roots(), why's that not good enough? src/hotspot/share/gc/shenandoah/shenandoahRootVerifier.cpp: What's that change? - return (_types & type) != 0; + return (_types & type) == type; Thanks, Roman > Please review this enhancement that moves some of weak root processing > into concurrent phase whenever possible. > > When concurrent class unloading is enabled, the weak roots that backed > by OopStorage can be processed in concurrent phase, as Shenandoah native > LRB can properly resolve the object and hide dead oops from mutators. 
> > > Bug: https://bugs.openjdk.java.net/browse/JDK-8228818 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8228818/webrev.00/ > > Test: > ? hotspot_gc_shenandoah (fastdebug and release) > ? on x86_64 and x86_32 Linux > ? specjvm on x86_64 Linux > > Thanks, > > -Zhengyu > From zgu at redhat.com Wed Jan 8 17:21:08 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 8 Jan 2020 12:21:08 -0500 Subject: [15] RFR 8228818: Shenandoah: Processing weak roots in concurrent phase when possible In-Reply-To: References: Message-ID: <29be2bac-6192-237e-7234-23e23a2966ad@redhat.com> On 1/8/20 11:39 AM, Roman Kennke wrote: > Hi Zhengyu, > > src/hotspot/share/gc/shenandoah/shenandoahClosures.hpp: > +class ShenandoahEvacUpdateCleanupOopStorageRootsClosure : public > BasicOopIterateClosure { > > Why can't this go in shenandoahHeap.cpp (only place where it's used)? Sure, updated: http://cr.openjdk.java.net/~zgu/JDK-8228818/webrev.01/ > > src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp: > + // Cleanup/Evacuate weak roots > + if (heap->marking_context()->is_complete() && > ShenandoahConcurrentRoots::should_do_concurrent_class_unloading()) { > + heap->entry_weak_roots(); > > Are you sure that we only want to do cleanup when we do > conc-class-unloading? Originally, we hoooked this up in the > entry_roots(), why's that not good enough? Yes, otherwise, weak roots are still processed at final mark/init evac pause. I split into 2 phases, because I think it is logically simpler: when weak roots processing failed, degenerated GC simply re-executes related logic (parallel cleaning) and disarm nmethods, which is equivalent to STW version. > > src/hotspot/share/gc/shenandoah/shenandoahRootVerifier.cpp: > > What's that change? > - return (_types & type) != 0; > + return (_types & type) == type; > Because WeakRoots is the combination of SerialWeakRoots and ConcurrentWeakRoots now, when we test WeakRoots, expect both bits are set. 
Thanks, -Zhengyu > Thanks, > Roman > >> Please review this enhancement that moves some of weak root processing >> into concurrent phase whenever possible. >> >> When concurrent class unloading is enabled, the weak roots that backed >> by OopStorage can be processed in concurrent phase, as Shenandoah native >> LRB can properly resolve the object and hide dead oops from mutators. >> >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8228818 >> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8228818/webrev.00/ >> >> Test: >> ? hotspot_gc_shenandoah (fastdebug and release) >> ? on x86_64 and x86_32 Linux >> ? specjvm on x86_64 Linux >> >> Thanks, >> >> -Zhengyu >> > From rkennke at redhat.com Wed Jan 8 21:25:01 2020 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 8 Jan 2020 22:25:01 +0100 Subject: RFR: 8236815: Shenandoah: Fix weak roots in final Traversal GC phase Message-ID: <3dc6cf7d-bdb0-becd-4335-787ec418c001@redhat.com> We're not fixing up all weak roots in final-traversal. But we have to, because weak roots are not scanned+evacuated at init-traversal, and may thus keep dangling pointers that would leak out to the next cycle. This can lead to heap corruption, crashes, etc. Bug: https://bugs.openjdk.java.net/browse/JDK-8236815 Webrev: http://cr.openjdk.java.net/~rkennke/JDK-8236815/webrev.00/ Testing: several runs of hotspot_gc_shenandoah, which *sometimes* exposed the bug. I couldn't reproduce it. I suggest to give it more spins in CI. Can I please get a review? Thanks, Roman From zgu at redhat.com Wed Jan 8 21:31:02 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 8 Jan 2020 16:31:02 -0500 Subject: RFR: 8236815: Shenandoah: Fix weak roots in final Traversal GC phase In-Reply-To: <3dc6cf7d-bdb0-becd-4335-787ec418c001@redhat.com> References: <3dc6cf7d-bdb0-becd-4335-787ec418c001@redhat.com> Message-ID: <4531912b-4af2-3dcc-484e-bf13c422a0a6@redhat.com> Looks good. 
-Zhengyu On 1/8/20 4:25 PM, Roman Kennke wrote: > We're not fixing up all weak roots in final-traversal. But we have to, > because weak roots are not scanned+evacuated at init-traversal, and may > thus keep dangling pointers that would leak out to the next cycle. This > can lead to heap corruption, crashes, etc. > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8236815 > Webrev: > http://cr.openjdk.java.net/~rkennke/JDK-8236815/webrev.00/ > > Testing: several runs of hotspot_gc_shenandoah, which *sometimes* > exposed the bug. I couldn't reproduce it. I suggest to give it more > spins in CI. > > Can I please get a review? > > Thanks, > Roman > From jianglizhou at google.com Wed Jan 8 23:41:32 2020 From: jianglizhou at google.com (Jiangli Zhou) Date: Wed, 8 Jan 2020 15:41:32 -0800 Subject: RFR (S): 8214277: Use merged G1ArchiveRegionMap for open and closed archive heap regions In-Reply-To: <180618f6-8fe4-22f0-dbc0-5f275c5b1d90@oracle.com> References: <180618f6-8fe4-22f0-dbc0-5f275c5b1d90@oracle.com> Message-ID: Hi Thomas, Looks good! Can we also remove '_open_archive_region_map' from g1Allocator.* as it's no longer needed? Best, Jiangli On Wed, Jan 8, 2020 at 5:39 AM Thomas Schatzl wrote: > > Hi all, > > could I have reviews for this small cleanup/simplification that > merges the open and closed archive region map into a single one? 
> > CR: > https://bugs.openjdk.java.net/browse/JDK-8214277 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8214277/webrev/ > Testing: > hs-tier1-5 (almost done, no issues), local gc/g1 jtreg > > Thanks, > Thomas From kim.barrett at oracle.com Thu Jan 9 00:25:53 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 8 Jan 2020 19:25:53 -0500 Subject: RFR (S): 8214277: Use merged G1ArchiveRegionMap for open and closed archive heap regions In-Reply-To: <180618f6-8fe4-22f0-dbc0-5f275c5b1d90@oracle.com> References: <180618f6-8fe4-22f0-dbc0-5f275c5b1d90@oracle.com> Message-ID: <1AB85763-9BEA-4A7A-A37E-DCB70F6C9D0E@oracle.com> > On Jan 8, 2020, at 8:38 AM, Thomas Schatzl wrote: > > Hi all, > > could I have reviews for this small cleanup/simplification that merges the open and closed archive region map into a single one? > > CR: > https://bugs.openjdk.java.net/browse/JDK-8214277 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8214277/webrev/ > Testing: > hs-tier1-5 (almost done, no issues), local gc/g1 jtreg > > Thanks, > Thomas ------------------------------------------------------------------------------ src/hotspot/share/gc/g1/g1Allocator.inline.hpp 151 inline void G1ArchiveAllocator::clear_range_archive(MemRegion range, bool open) { clear_range_archive no longer uses the open argument for anything other than logging. Is it worth keeping? ------------------------------------------------------------------------------ src/hotspot/share/gc/g1/g1Allocator.inline.hpp 186 return (archive_check_enabled() && 187 (_archive_region_map.get_by_address((HeapWord*)object) != G1ArchiveRegionMap::NoArchive)); The indentation of line 187 is confusing; the indentation suggests the open paren there is at the same nesting level as the one on the preceeding line directly above. The first open paren on line 186 and the associated close could just be removed. 
------------------------------------------------------------------------------ src/hotspot/share/gc/g1/g1Allocator.inline.hpp 162 // This is the out-of-line part of is_closed_archive_object test, done separately 163 // to avoid additional performance impact when the check is not enabled. Pre-existing: Given that it is in an inline definition, the accuracy of this comment seems questionable. ------------------------------------------------------------------------------ Jiangli already pointed out that _open_archive_region_map is no longer used. From zgu at redhat.com Thu Jan 9 01:41:48 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 8 Jan 2020 20:41:48 -0500 Subject: [15] RFR 8228818: Shenandoah: Processing weak roots in concurrent phase when possible In-Reply-To: <29be2bac-6192-237e-7234-23e23a2966ad@redhat.com> References: <29be2bac-6192-237e-7234-23e23a2966ad@redhat.com> Message-ID: <42a9f680-1a0d-eebe-e253-198aeb3a0a5e@redhat.com> On 1/8/20 12:21 PM, Zhengyu Gu wrote: > > > On 1/8/20 11:39 AM, Roman Kennke wrote: >> Hi Zhengyu, >> >> src/hotspot/share/gc/shenandoah/shenandoahClosures.hpp: >> +class ShenandoahEvacUpdateCleanupOopStorageRootsClosure : public >> BasicOopIterateClosure { >> >> Why can't this go in shenandoahHeap.cpp (only place where it's used)? > > Sure,? updated: http://cr.openjdk.java.net/~zgu/JDK-8228818/webrev.01/ > >> >> src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp: >> +? // Cleanup/Evacuate weak roots >> +? if (heap->marking_context()->is_complete() && >> ShenandoahConcurrentRoots::should_do_concurrent_class_unloading()) { >> +??? heap->entry_weak_roots(); >> >> Are you sure that we only want to do cleanup when we do >> conc-class-unloading? Originally, we hoooked this up in the >> entry_roots(), why's that not good enough? > > Yes, otherwise, weak roots are still processed at final mark/init evac > pause. 
> > I split into 2 phases, because I think it is logically simpler: when > weak roots processing failed, degenerated GC simply re-executes related > logic (parallel cleaning) and disarm nmethods, which is equivalent to > STW version. Took another look, it does not seem to be a good idea, as it duplicates the work of update_roots(), that is called at the end of degenerated GC cycle to fix the roots. Merged weak_roots phase into concurrent roots phase, and removed weak_roots degenerated point, also simplified the patch. Updated webrev: http://cr.openjdk.java.net/~zgu/JDK-8228818/webrev.02/ Test: Reran hotspot_gc_shenandoah (fastdebug and release) on x86_64 and x86_32 Linux. Thanks, -Zhengyu > >> >> src/hotspot/share/gc/shenandoah/shenandoahRootVerifier.cpp: >> >> What's that change? >> -? return (_types & type) != 0; >> +? return (_types & type) == type; >> > > Because WeakRoots is the combination of SerialWeakRoots and > ConcurrentWeakRoots now, when we test WeakRoots, expect both bits are set. > > > Thanks, > > -Zhengyu > > >> Thanks, >> Roman >> >>> Please review this enhancement that moves some of weak root processing >>> into concurrent phase whenever possible. >>> >>> When concurrent class unloading is enabled, the weak roots that backed >>> by OopStorage can be processed in concurrent phase, as Shenandoah native >>> LRB can properly resolve the object and hide dead oops from mutators. >>> >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8228818 >>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8228818/webrev.00/ >>> >>> Test: >>> ?? hotspot_gc_shenandoah (fastdebug and release) >>> ?? on x86_64 and x86_32 Linux >>> ?? 
specjvm on x86_64 Linux >>> >>> Thanks, >>> >>> -Zhengyu >>> >> From rkennke at redhat.com Thu Jan 9 08:44:19 2020 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 9 Jan 2020 09:44:19 +0100 Subject: [15] RFR 8228818: Shenandoah: Processing weak roots in concurrent phase when possible In-Reply-To: <42a9f680-1a0d-eebe-e253-198aeb3a0a5e@redhat.com> References: <29be2bac-6192-237e-7234-23e23a2966ad@redhat.com> <42a9f680-1a0d-eebe-e253-198aeb3a0a5e@redhat.com> Message-ID: Hi Zhengyu, the latest patch looks good to me. Thanks, Roman > On 1/8/20 12:21 PM, Zhengyu Gu wrote: >> >> >> On 1/8/20 11:39 AM, Roman Kennke wrote: >>> Hi Zhengyu, >>> >>> src/hotspot/share/gc/shenandoah/shenandoahClosures.hpp: >>> +class ShenandoahEvacUpdateCleanupOopStorageRootsClosure : public >>> BasicOopIterateClosure { >>> >>> Why can't this go in shenandoahHeap.cpp (only place where it's used)? >> >> Sure,? updated: http://cr.openjdk.java.net/~zgu/JDK-8228818/webrev.01/ >> >>> >>> src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp: >>> +? // Cleanup/Evacuate weak roots >>> +? if (heap->marking_context()->is_complete() && >>> ShenandoahConcurrentRoots::should_do_concurrent_class_unloading()) { >>> +??? heap->entry_weak_roots(); >>> >>> Are you sure that we only want to do cleanup when we do >>> conc-class-unloading? Originally, we hoooked this up in the >>> entry_roots(), why's that not good enough? >> >> Yes, otherwise, weak roots are still processed at final mark/init evac >> pause. >> >> I split into 2 phases, because I think it is logically simpler: when >> weak roots processing failed, degenerated GC simply re-executes >> related logic (parallel cleaning) and disarm nmethods, which is >> equivalent to STW version. > > Took another look, it does not seem to be a good idea, as it duplicates > the work of update_roots(), that is called at the end of degenerated GC > cycle to fix the roots. 
> > Merged weak_roots phase into concurrent roots phase, and removed > weak_roots degenerated point, also simplified the patch. > > Updated webrev: http://cr.openjdk.java.net/~zgu/JDK-8228818/webrev.02/ > > Test: > ? Reran hotspot_gc_shenandoah (fastdebug and release) > ? on x86_64 and x86_32 Linux. > > Thanks, > > -Zhengyu > > >> >>> >>> src/hotspot/share/gc/shenandoah/shenandoahRootVerifier.cpp: >>> >>> What's that change? >>> -? return (_types & type) != 0; >>> +? return (_types & type) == type; >>> >> >> Because WeakRoots is the combination of SerialWeakRoots and >> ConcurrentWeakRoots now, when we test WeakRoots, expect both bits are >> set. >> >> >> Thanks, >> >> -Zhengyu >> >> >>> Thanks, >>> Roman >>> >>>> Please review this enhancement that moves some of weak root processing >>>> into concurrent phase whenever possible. >>>> >>>> When concurrent class unloading is enabled, the weak roots that backed >>>> by OopStorage can be processed in concurrent phase, as Shenandoah >>>> native >>>> LRB can properly resolve the object and hide dead oops from mutators. >>>> >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8228818 >>>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8228818/webrev.00/ >>>> >>>> Test: >>>> ?? hotspot_gc_shenandoah (fastdebug and release) >>>> ?? on x86_64 and x86_32 Linux >>>> ?? specjvm on x86_64 Linux >>>> >>>> Thanks, >>>> >>>> -Zhengyu >>>> >>> > From thomas.schatzl at oracle.com Thu Jan 9 11:37:54 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 9 Jan 2020 12:37:54 +0100 Subject: RFR (S): 8214277: Use merged G1ArchiveRegionMap for open and closed archive heap regions In-Reply-To: <1AB85763-9BEA-4A7A-A37E-DCB70F6C9D0E@oracle.com> References: <180618f6-8fe4-22f0-dbc0-5f275c5b1d90@oracle.com> <1AB85763-9BEA-4A7A-A37E-DCB70F6C9D0E@oracle.com> Message-ID: Hi Kim, Jiangli, thanks for your reviews. 
On 09.01.20 01:25, Kim Barrett wrote: >> On Jan 8, 2020, at 8:38 AM, Thomas Schatzl wrote: >> >> Hi all, >> >> could I have reviews for this small cleanup/simplification that merges the open and closed archive region map into a single one? >> >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8214277 >> Webrev: >> http://cr.openjdk.java.net/~tschatzl/8214277/webrev/ >> Testing: >> hs-tier1-5 (almost done, no issues), local gc/g1 jtreg >> >> Thanks, >> Thomas > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/g1Allocator.inline.hpp > 151 inline void G1ArchiveAllocator::clear_range_archive(MemRegion range, bool open) { > > clear_range_archive no longer uses the open argument for anything > other than logging. Is it worth keeping? Removed. > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/g1Allocator.inline.hpp > 186 return (archive_check_enabled() && > 187 (_archive_region_map.get_by_address((HeapWord*)object) != G1ArchiveRegionMap::NoArchive)); > > The indentation of line 187 is confusing; the indentation suggests the > open paren there is at the same nesting level as the one on the > preceeding line directly above. The first open paren on line 186 and > the associated close could just be removed. > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/g1Allocator.inline.hpp > 162 // This is the out-of-line part of is_closed_archive_object test, done separately > 163 // to avoid additional performance impact when the check is not enabled. > > Pre-existing: Given that it is in an inline definition, the accuracy > of this comment seems questionable. Removed. > > ------------------------------------------------------------------------------ > > Jiangli already pointed out that _open_archive_region_map is no longer > used. 
> All fixed in http://cr.openjdk.java.net/~tschatzl/8214277/webrev.0_to_1 (diff) http://cr.openjdk.java.net/~tschatzl/8214277/webrev.1/ (full) Thanks, Thomas From rkennke at redhat.com Thu Jan 9 12:46:38 2020 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 9 Jan 2020 13:46:38 +0100 Subject: RFR: 8236851: Shenandoah: More details in Traversal GC event messages Message-ID: We currently only print e.g. "Pause Init Traversal" in event messages for Traversal GC. We should also include information whether or not the cycle also does unload classes and/or process references, like we do for the normal mode. Bug: https://bugs.openjdk.java.net/browse/JDK-8236851 Webrev: http://cr.openjdk.java.net/~rkennke/JDK-8236851/webrev.00/ Testing: hotspot_gc_shenandoah, manual inspection of hs_err files Can I please get a review? Thanks, Roman From stuart.monteith at linaro.org Thu Jan 9 16:02:35 2020 From: stuart.monteith at linaro.org (Stuart Monteith) Date: Thu, 9 Jan 2020 16:02:35 +0000 Subject: aarch64: Concurrent class unloading, nmethod barriers, ZGC In-Reply-To: <69b38972-8061-b706-befe-28d49af42fe7@redhat.com> References: <65e96ab7-3625-d5be-e5e0-be66c3137c8b@linaro.org> <69b38972-8061-b706-befe-28d49af42fe7@redhat.com> Message-ID: Thank you Andrew, that compiles and runs without error - the deoptimize method is definitely being provoked, and continues without apparent problems. I've been trying to insert constants, and the issue you mention is tripped when we enter a native method wrapper. Erik can perhaps correct me, but I presume we might have to deoptimise a native method if it was overriding a JIT-compiled method and has subsequently been unloaded. In x86 it is inserted here: http://hg.openjdk.java.net/jdk/jdk/file/6d23020e3da0/src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp#l2204 on aarch64 I added the change in our generate_native_wrapper. 
BR, Stuart On Wed, 8 Jan 2020 at 15:37, Andrew Haley wrote: > > On 1/8/20 2:23 PM, Stuart Monteith wrote: > > I see there is LIR_Assembler::int_constant, which is only for C1, the > > equivalent is MacroAssembler::ldr_constant, which uses an > > InternalAddress. > > There is MacroAssembler::int_constant(n). It is there, and it returns an > address that you can use with ADR and/or LDR . It won't work with a native > method because they have no constant pool (int_constant() will return NULL) > but I don't think you need barriers for native methods. > > (Um, perhaps you do, for synchronized ones? They have a reference to a class.) > > Anyway, this is your patch with a working (probably) deoptimize handler: > > http://cr.openjdk.java.net/~aph/aarch64-jdk-nmethod-barriers-3.patch > > -- > Andrew Haley (he/him) > Java Platform Lead Engineer > Red Hat UK Ltd. > https://keybase.io/andrewhaley > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > From aph at redhat.com Thu Jan 9 16:14:31 2020 From: aph at redhat.com (Andrew Haley) Date: Thu, 9 Jan 2020 16:14:31 +0000 Subject: aarch64: Concurrent class unloading, nmethod barriers, ZGC In-Reply-To: References: <65e96ab7-3625-d5be-e5e0-be66c3137c8b@linaro.org> <69b38972-8061-b706-befe-28d49af42fe7@redhat.com> Message-ID: <45a22fc8-af0f-6632-3505-fe333a67145d@redhat.com> On 1/9/20 4:02 PM, Stuart Monteith wrote: > Thank you Andrew, that compiles and runs without error - the > deoptimize method is definitely being provoked. and continues without > apparent problems. > > I've been trying to insert constants, and the issue you mention is > tripped when we enter a native method wrapper. Eric can perhaps > correct me, but I presume we might have to deoptimise a native method > if it was overriding a JIT-compiled method and it is subsequently been > unloaded. OK, so we can stick with a field in nmethod. It is probably more economical on space anyway. 
You should be able to use adr(rscratch1, InternalAddress(nmethod field)); -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From kim.barrett at oracle.com Thu Jan 9 17:43:08 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 9 Jan 2020 12:43:08 -0500 Subject: RFR (S): 8214277: Use merged G1ArchiveRegionMap for open and closed archive heap regions In-Reply-To: References: <180618f6-8fe4-22f0-dbc0-5f275c5b1d90@oracle.com> <1AB85763-9BEA-4A7A-A37E-DCB70F6C9D0E@oracle.com> Message-ID: <680B79D8-6E3E-4BB1-96FD-6317718783FA@oracle.com> > On Jan 9, 2020, at 6:37 AM, Thomas Schatzl wrote: > All fixed in > http://cr.openjdk.java.net/~tschatzl/8214277/webrev.0_to_1 (diff) > http://cr.openjdk.java.net/~tschatzl/8214277/webrev.1/ (full) > > Thanks, > Thomas Looks good. From thomas.schatzl at oracle.com Thu Jan 9 19:01:53 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 09 Jan 2020 20:01:53 +0100 Subject: RFR (S): 8214277: Use merged G1ArchiveRegionMap for open and closed archive heap regions In-Reply-To: <680B79D8-6E3E-4BB1-96FD-6317718783FA@oracle.com> References: <180618f6-8fe4-22f0-dbc0-5f275c5b1d90@oracle.com> <1AB85763-9BEA-4A7A-A37E-DCB70F6C9D0E@oracle.com> <680B79D8-6E3E-4BB1-96FD-6317718783FA@oracle.com> Message-ID: Hi, On Thu, 2020-01-09 at 12:43 -0500, Kim Barrett wrote: > > On Jan 9, 2020, at 6:37 AM, Thomas Schatzl < > > thomas.schatzl at oracle.com> wrote: > > All fixed in > > http://cr.openjdk.java.net/~tschatzl/8214277/webrev.0_to_1 (diff) > > http://cr.openjdk.java.net/~tschatzl/8214277/webrev.1/ (full) > > > > Thanks, > > Thomas > > Looks good. > thanks for your review. 
Thomas From jianglizhou at google.com Thu Jan 9 20:19:45 2020 From: jianglizhou at google.com (Jiangli Zhou) Date: Thu, 9 Jan 2020 12:19:45 -0800 Subject: RFR (S): 8214277: Use merged G1ArchiveRegionMap for open and closed archive heap regions In-Reply-To: References: <180618f6-8fe4-22f0-dbc0-5f275c5b1d90@oracle.com> <1AB85763-9BEA-4A7A-A37E-DCB70F6C9D0E@oracle.com> Message-ID: Hi Thomas, The update looks good. Best regards, Jiangli On Thu, Jan 9, 2020 at 3:39 AM Thomas Schatzl wrote: > > Hi Kim, Jiangli, > > thanks for your reviews. > > On 09.01.20 01:25, Kim Barrett wrote: > >> On Jan 8, 2020, at 8:38 AM, Thomas Schatzl wrote: > >> > >> Hi all, > >> > >> could I have reviews for this small cleanup/simplification that merges the open and closed archive region map into a single one? > >> > >> CR: > >> https://bugs.openjdk.java.net/browse/JDK-8214277 > >> Webrev: > >> http://cr.openjdk.java.net/~tschatzl/8214277/webrev/ > >> Testing: > >> hs-tier1-5 (almost done, no issues), local gc/g1 jtreg > >> > >> Thanks, > >> Thomas > > > > ------------------------------------------------------------------------------ > > src/hotspot/share/gc/g1/g1Allocator.inline.hpp > > 151 inline void G1ArchiveAllocator::clear_range_archive(MemRegion range, bool open) { > > > > clear_range_archive no longer uses the open argument for anything > > other than logging. Is it worth keeping? > > Removed. > > > > > ------------------------------------------------------------------------------ > > src/hotspot/share/gc/g1/g1Allocator.inline.hpp > > 186 return (archive_check_enabled() && > > 187 (_archive_region_map.get_by_address((HeapWord*)object) != G1ArchiveRegionMap::NoArchive)); > > > > The indentation of line 187 is confusing; the indentation suggests the > > open paren there is at the same nesting level as the one on the > > preceeding line directly above. The first open paren on line 186 and > > the associated close could just be removed. 
> > > > ------------------------------------------------------------------------------ > > src/hotspot/share/gc/g1/g1Allocator.inline.hpp > > 162 // This is the out-of-line part of is_closed_archive_object test, done separately > > 163 // to avoid additional performance impact when the check is not enabled. > > > > Pre-existing: Given that it is in an inline definition, the accuracy > > of this comment seems questionable. > > Removed. > > > > > ------------------------------------------------------------------------------ > > > > Jiangli already pointed out that _open_archive_region_map is no longer > > used. > > > > All fixed in > http://cr.openjdk.java.net/~tschatzl/8214277/webrev.0_to_1 (diff) > http://cr.openjdk.java.net/~tschatzl/8214277/webrev.1/ (full) > > Thanks, > Thomas From thomas.schatzl at oracle.com Thu Jan 9 20:59:02 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 09 Jan 2020 21:59:02 +0100 Subject: RFR (S): 8214277: Use merged G1ArchiveRegionMap for open and closed archive heap regions In-Reply-To: References: <180618f6-8fe4-22f0-dbc0-5f275c5b1d90@oracle.com> <1AB85763-9BEA-4A7A-A37E-DCB70F6C9D0E@oracle.com> Message-ID: <13588b7a3d2af571caeffcfe9425282a92215e8d.camel@oracle.com> Hi, On Thu, 2020-01-09 at 12:19 -0800, Jiangli Zhou wrote: > Hi Thomas, > > The update looks good. > thanks for your review. Passed hs-tier1-5 in the meantime. Pushed :) > Best regards, > Jiangli > Thanks, Thomas From shade at redhat.com Thu Jan 9 21:15:12 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 9 Jan 2020 22:15:12 +0100 Subject: RFR: 8236851: Shenandoah: More details in Traversal GC event messages In-Reply-To: References: Message-ID: On 1/9/20 1:46 PM, Roman Kennke wrote: > We currently only print e.g. "Pause Init Traversal" in event messages > for Traversal GC. We should also include information whether or not the > cycle also does unload classes and/or process references, like we do for > the normal mode. 
> > Bug: > https://bugs.openjdk.java.net/browse/JDK-8236851 > > Webrev: > http://cr.openjdk.java.net/~rkennke/JDK-8236851/webrev.00/ Looks fine to me. -- Thanks, -Aleksey From erik.osterlund at oracle.com Thu Jan 9 22:35:12 2020 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Thu, 9 Jan 2020 23:35:12 +0100 Subject: aarch64: Concurrent class unloading, nmethod barriers, ZGC In-Reply-To: References: <65e96ab7-3625-d5be-e5e0-be66c3137c8b@linaro.org> <69b38972-8061-b706-befe-28d49af42fe7@redhat.com> Message-ID: <40c3cc91-8b7c-9bd6-1ff5-933b8a7a3166@oracle.com> Hi Stuart and Andrew, Right, when it comes to native wrappers, we do inject entry barriers for that on x86. The main reason for that is that I am allergic to "special" nmethods that you have to remember work differently all the time. We have too many of them. The only nmethod that regrettably does not have entry barriers is the method handle intrinsic. That seems fine but I'm not quite happy about it. Other than that, we also do need the barriers for correctness. Last time I thought about that, I recall there were a few problematic hypothetical situations I wanted to avoid. For example, consider the following obscure race condition (suitable beverage while reading advised): 1. Load abstract class A with non-static method foo. 2. Load class B, inheriting from A, from a separate class loader, overriding foo with a native method (that gets a native wrapper). 3. JIT nmethod with a virtual call to A.foo. The compiler will with CHA decide that there is only a single concrete foo implementation in the system (B::foo), due to there being a single implementation of A, which turns out to be our native wrapper. When this happens an optimized virtual call is generated with a direct call emitted (originally pointing at a resolution stub for the very first call), but the holder oop of B (it's class loader) is not inserted to the oop section. 
Instead, an entry is added in the dependency context to keep track of this nmethod so the caller nmethod (calling the native wrapper) can get deoptimized if the unique callee for A assumption changes. 4. Call the JIT-compiled call of A.foo with an instance of B, resolve it and patch the direct call to the native wrapper (B.foo *verified* entry, due to being an optimized virtual call). 5. Release the reference to the class loader of B, and wait until the class loader dies, and hence B dies. 6. Before concurrent class unloading kicks in (concurrently) and walks dependency contexts of dead things to invalidate them (which would invalidate the caller nmethod), load a class C also inheriting from A and overriding a concrete implementation of foo. When loading that class, the dependency context walk for invalidating e.g. CHA inconsistencies skip over the is_unloading() nmethods (including the native wrapper), due to race conditions that ended up giving that responsibility to the concurrent GC thread (which has not gotten to it yet). 7. Reuse the same JIT-compiled virtual call of A.foo but pass in a new instance of C. The state of the callsite is now a direct call to B.foo, and it's about to get deoptimized, but isn't yet. But B.foo is_unloading() because B is dead, making the one oop of the native wrapper (the holder oop of B) dead, and hence the native wrapper is_unloading(). Now in this scenario, without an nmethod entry barrier, we can end up calling a dead method. The nmethod entry barrier guards that by enforcing the invariant that we can't enter dead nmethods. Hope this makes sense and helps understanding why the native wrapper ought to have an entry barrier. Thanks, /Erik On 2020-01-09 17:02, Stuart Monteith wrote: > Thank you Andrew, that compiles and runs without error - the > deoptimize method is definitely being provoked. and continues without > apparent problems. 
> > I've been trying to insert constants, and the issue you mention is > tripped when we enter a native method wrapper. Erik can perhaps > correct me, but I presume we might have to deoptimise a native method > if it was overriding a JIT-compiled method and it has subsequently been > unloaded. > > In x86 it is inserted here: > http://hg.openjdk.java.net/jdk/jdk/file/6d23020e3da0/src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp#l2204 > on aarch64 I added the change in our generate_native_wrapper. > > > BR, > Stuart > > > On Wed, 8 Jan 2020 at 15:37, Andrew Haley wrote: >> On 1/8/20 2:23 PM, Stuart Monteith wrote: >>> I see there is LIR_Assembler::int_constant, which is only for C1, the >>> equivalent is MacroAssembler::ldr_constant, which uses an >>> InternalAddress. >> There is MacroAssembler::int_constant(n). It is there, and it returns an >> address that you can use with ADR and/or LDR. It won't work with a native >> method because they have no constant pool (int_constant() will return NULL) >> but I don't think you need barriers for native methods. >> >> (Um, perhaps you do, for synchronized ones? They have a reference to a class.) >> >> Anyway, this is your patch with a working (probably) deoptimize handler: >> >> http://cr.openjdk.java.net/~aph/aarch64-jdk-nmethod-barriers-3.patch >> >> -- >> Andrew Haley (he/him) >> Java Platform Lead Engineer >> Red Hat UK Ltd. >> https://keybase.io/andrewhaley >> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 >> From zgu at redhat.com Fri Jan 10 12:01:52 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Fri, 10 Jan 2020 07:01:52 -0500 Subject: [14] RFR 8236902: Shenandoah: Missing string dedup roots in all root scanner Message-ID: String dedup roots are missing in all roots scanner. The problem was discovered while running the TestHeapDump.java test with StringDeduplication enabled; let's add a test there to make sure.
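As an illustration of the scenario being tested, a minimal jtreg-style sketch might look like this (file name and flags are hypothetical; the actual test change is in the webrev below):

```java
// Hypothetical sketch only: a heap-dump scenario run with string
// deduplication enabled. The @run line and class name are illustrative,
// not the actual test from the webrev.
// @run main/othervm -XX:+UseShenandoahGC -XX:+UseStringDeduplication TestHeapDumpSketch
class TestHeapDumpSketch {
    public static void main(String[] args) {
        // Two distinct String objects with equal contents are candidates
        // for deduplication of their backing character arrays.
        String a = new String("dedup-me");
        String b = new String("dedup-me");
        System.out.println(a.equals(b) && a != b); // prints "true"
    }
}
```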
Bug: https://bugs.openjdk.java.net/browse/JDK-8236902 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8236902/webrev.00/index.html Test: hotspot_gc_shenandoah (fastdebug and release) Thanks, -Zhengyu From rkennke at redhat.com Fri Jan 10 12:04:56 2020 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 10 Jan 2020 13:04:56 +0100 Subject: [14] RFR 8236902: Shenandoah: Missing string dedup roots in all root scanner In-Reply-To: References: Message-ID: <903438b8-87c6-a6fd-5ff0-8e60090228d5@redhat.com> Hi Zhengyu, the change looks good! Thanks! Roman > String dedup roots are missing in all roots scanner. > > The problem was discovered while running TestHeapDump.java test with > StringDeduplication enabled, let's add a test there to make sure. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8236902 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8236902/webrev.00/index.html > > Test: > ? hotspot_gc_shenandoah (fastdebug and release) > > Thanks, > > -Zhengyu > From zgu at redhat.com Fri Jan 10 14:48:16 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Fri, 10 Jan 2020 09:48:16 -0500 Subject: [15] RFR 8236878: Use atomic instruction to update StringDedupTable's entries and entries_removed counters Message-ID: <97d04872-7abb-396d-7552-f85b4cf1b97b@redhat.com> Hi, Please review this small change that uses atomic operations to update StringDedupTable's entries and entries_removed counter. This is *not* a correctness fix or performance enhancement, but for Shenandoah GC to move StringDedupTable cleanup task into concurrent phase, while holding StringDedupTable_lock. Bug: https://bugs.openjdk.java.net/browse/JDK-8236878 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8236878/webrev.00/index.html Test: hotspot_gc (fastdebug and release) on x86_64 Linux Submit test in progress. 
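For illustration only, in plain Java rather than the HotSpot C++ the patch actually changes, moving such counters to atomic updates looks roughly like this (names are hypothetical, not the actual StringDedupTable fields):

```java
import java.util.concurrent.atomic.AtomicLong;

// Plain-Java analogue of the change: counters that may be updated from a
// concurrent phase use atomic increments instead of plain field updates.
class DedupCounters {
    private final AtomicLong entries = new AtomicLong();
    private final AtomicLong entriesRemoved = new AtomicLong();

    void added()   { entries.incrementAndGet(); }
    void removed() { entries.decrementAndGet(); entriesRemoved.incrementAndGet(); }

    long entryCount()   { return entries.get(); }
    long removedCount() { return entriesRemoved.get(); }
}
```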
Thanks, -Zhengyu From stuart.monteith at linaro.org Fri Jan 10 16:56:16 2020 From: stuart.monteith at linaro.org (Stuart Monteith) Date: Fri, 10 Jan 2020 16:56:16 +0000 Subject: aarch64: Concurrent class unloading, nmethod barriers, ZGC In-Reply-To: <45a22fc8-af0f-6632-3505-fe333a67145d@redhat.com> References: <65e96ab7-3625-d5be-e5e0-be66c3137c8b@linaro.org> <69b38972-8061-b706-befe-28d49af42fe7@redhat.com> <45a22fc8-af0f-6632-3505-fe333a67145d@redhat.com> Message-ID: Hello, Something like "adr(rscratch1, InternalAddress(nmethod field))" has been suggested before. My problem has been finding what the address of the nmethod fields would be without knowing where the instruction is relative to them before the nmethod has been allocated. Relocations can be performed between the sections - constants, instructions and stubs - with section_word_Relocation instead of internal_word_Relocation. However, there doesn't appear to be a concept for the CodeBlob header during code emissions. If we have a method here: total in heap [0x0000ffff68e0b010,0x0000ffff68e0b410] = 1024 relocation [0x0000ffff68e0b190,0x0000ffff68e0b1a8] = 24 main code [0x0000ffff68e0b1c0,0x0000ffff68e0b300] = 320 stub code [0x0000ffff68e0b300,0x0000ffff68e0b3a0] = 160 metadata [0x0000ffff68e0b3a0,0x0000ffff68e0b3b8] = 24 scopes data [0x0000ffff68e0b3b8,0x0000ffff68e0b3c8] = 16 scopes pcs [0x0000ffff68e0b3c8,0x0000ffff68e0b408] = 64 dependencies [0x0000ffff68e0b408,0x0000ffff68e0b410] = 8 The nmethod structure is at 0x0000ffff68e0b010. Between that address and the main code is the relocation section, which we don't know the size of during instruction emission into the CodeBuffer. It appears that to relocate references to the nmethod structure from the code section before the nmethod is constructed would require its own relocation of sorts. I'm looking at adding a HEADER CodeSection that would allow relocation of entries into the nmethod/CodeBlob header. 
There is no guarantee our CodeBuffer has a CodeBlob during instruction emission - I've not looked to see whether that would be useful. BR, Stuart On Thu, 9 Jan 2020 at 16:14, Andrew Haley wrote: > > On 1/9/20 4:02 PM, Stuart Monteith wrote: > > Thank you Andrew, that compiles and runs without error - the > > deoptimize method is definitely being provoked. and continues without > > apparent problems. > > > > I've been trying to insert constants, and the issue you mention is > > tripped when we enter a native method wrapper. Eric can perhaps > > correct me, but I presume we might have to deoptimise a native method > > if it was overriding a JIT-compiled method and it is subsequently been > > unloaded. > > OK, so we can stick with a field in nmethod. It is probably more economical > on space anyway. > > You should be able to use adr(rscratch1, InternalAddress(nmethod field)); > > -- > Andrew Haley (he/him) > Java Platform Lead Engineer > Red Hat UK Ltd. > https://keybase.io/andrewhaley > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > From rkennke at redhat.com Mon Jan 13 14:06:44 2020 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 13 Jan 2020 15:06:44 +0100 Subject: [15] RFR 8236878: Use atomic instruction to update StringDedupTable's entries and entries_removed counters In-Reply-To: <97d04872-7abb-396d-7552-f85b4cf1b97b@redhat.com> References: <97d04872-7abb-396d-7552-f85b4cf1b97b@redhat.com> Message-ID: OK. Thanks, Roman > > Please review this small change that uses atomic operations to update > StringDedupTable's entries and entries_removed counter. > > This is *not* a correctness fix or performance enhancement, but for > Shenandoah GC to move StringDedupTable cleanup task into concurrent > phase, while holding StringDedupTable_lock. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8236878 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8236878/webrev.00/index.html > > Test: > ? hotspot_gc (fastdebug and release) on x86_64 Linux > ? 
Submit test in progress. > > Thanks, > > -Zhengyu > From zgu at redhat.com Mon Jan 13 14:27:22 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 13 Jan 2020 09:27:22 -0500 Subject: [15] RFR 8236878: Use atomic instruction to update StringDedupTable's entries and entries_removed counters In-Reply-To: References: <97d04872-7abb-396d-7552-f85b4cf1b97b@redhat.com> Message-ID: Thanks, Roman. -Zhengyu On 1/13/20 9:06 AM, Roman Kennke wrote: > OK. > > Thanks, > Roman > > >> >> Please review this small change that uses atomic operations to update >> StringDedupTable's entries and entries_removed counter. >> >> This is *not* a correctness fix or performance enhancement, but for >> Shenandoah GC to move StringDedupTable cleanup task into concurrent >> phase, while holding StringDedupTable_lock. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8236878 >> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8236878/webrev.00/index.html >> >> Test: >> ? hotspot_gc (fastdebug and release) on x86_64 Linux >> ? Submit test in progress. >> >> Thanks, >> >> -Zhengyu >> > From aph at redhat.com Mon Jan 13 14:59:59 2020 From: aph at redhat.com (Andrew Haley) Date: Mon, 13 Jan 2020 14:59:59 +0000 Subject: aarch64: Concurrent class unloading, nmethod barriers, ZGC In-Reply-To: References: <65e96ab7-3625-d5be-e5e0-be66c3137c8b@linaro.org> <69b38972-8061-b706-befe-28d49af42fe7@redhat.com> <45a22fc8-af0f-6632-3505-fe333a67145d@redhat.com> Message-ID: On 1/10/20 4:56 PM, Stuart Monteith wrote: > Something like "adr(rscratch1, InternalAddress(nmethod field))" has > been suggested before. My problem has been finding what the address of > the nmethod fields would be without knowing where the instruction is > relative to them before the nmethod has been allocated. Relocations > can be performed between the sections - constants, instructions and > stubs - with section_word_Relocation instead of > internal_word_Relocation. 
However, there doesn't appear to be a > concept for the CodeBlob header during code emissions. OK, I see.. > The nmethod structure is at 0x0000ffff68e0b010. Between that address > and the main code is the relocation section, which we don't know the > size of during instruction emission into the CodeBuffer. > It appears that to relocate references to the nmethod structure from > the code section before the nmethod is constructed would require its > own relocation of sorts. I'm looking at adding a HEADER CodeSection > that would allow relocation of entries into the nmethod/CodeBlob > header. We don't need another section. I think there are only two problems with using the constant section. There is a large (and probably pointless) alignment gap so that even a single-word constant takes up a lot of space. Also, native methods don't have constant sections, but that should be easy to fix. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Mon Jan 13 15:01:41 2020 From: aph at redhat.com (Andrew Haley) Date: Mon, 13 Jan 2020 15:01:41 +0000 Subject: aarch64: Concurrent class unloading, nmethod barriers, ZGC In-Reply-To: <40c3cc91-8b7c-9bd6-1ff5-933b8a7a3166@oracle.com> References: <65e96ab7-3625-d5be-e5e0-be66c3137c8b@linaro.org> <69b38972-8061-b706-befe-28d49af42fe7@redhat.com> <40c3cc91-8b7c-9bd6-1ff5-933b8a7a3166@oracle.com> Message-ID: <1bad4ee7-93c3-041d-d805-fa8c0c76d3e2@redhat.com> On 1/9/20 10:35 PM, Erik Österlund wrote: > Now in this scenario, without an nmethod entry barrier, we can end up > calling a dead method. The nmethod entry barrier guards that by > enforcing the invariant that we can't enter dead nmethods. Wow. :-) Thank you for that. It's useful to have some discussion of this permanently recorded online for posterity. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd.
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From zgu at redhat.com Mon Jan 13 15:11:02 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 13 Jan 2020 10:11:02 -0500 Subject: [15] RFR(T) 8237017: Shenandoah: Remove racy assertion Message-ID: During concurrent weak root processing, it tries to CAS in NULL if the oop is dead, then asserts that the slot is indeed NULL. The assertion is racy, because another thread can release the slot and then reuse it (that's why it uses CAS in the first place), which can cause the assertion to fail. Bug: https://bugs.openjdk.java.net/browse/JDK-8237017 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237017/webrev.00/index.html Test: hotspot_gc_shenandoah fastdebug on x86_64 Linux Thanks, -Zhengyu From per.liden at oracle.com Mon Jan 13 15:12:51 2020 From: per.liden at oracle.com (Per Liden) Date: Mon, 13 Jan 2020 16:12:51 +0100 Subject: RFR: 8236153: ZGC: gc/z/TestUncommit.java fails with java.lang.Exception: Uncommitted too fast Message-ID: <24c41e59-3e2c-9284-489a-af487e6cccc0@oracle.com> The test gc/z/TestUncommit.java fails now and then on Windows when using -Xcomp. This test can fail if it's severely starved on CPU, as it will cause the timing to be off. The logs confirm that the test took an unusually long time to execute, suggesting it was starved on CPU. This only happens in test tiers using -Xcomp, which is likely causing the unusually high load. This patch disables this test when using -Xcomp. I've enabled some GC logging, which should be helpful if this test ever fails again.
Bug: https://bugs.openjdk.java.net/browse/JDK-8236153 Webrev: http://cr.openjdk.java.net/~pliden/8236153/webrev.0 /Per From thomas.schatzl at oracle.com Mon Jan 13 15:15:09 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 13 Jan 2020 16:15:09 +0100 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: <5b24e235-5466-15a1-78a6-6f63bfa1878e@oracle.com> References: <5b24e235-5466-15a1-78a6-6f63bfa1878e@oracle.com> Message-ID: <43090624-d8be-8600-a55e-1e10b1920135@oracle.com> Hi Liang, thanks for your contribution! I looked through the change a bit and have a few comments. What I noticed quickly after initial browsing through it is that this change implements three different features: 1) moving the memory uncommit into the concurrent phase 2) uncommit at the end of (almost) every GC 3) SoftMaxHeapSize These should be split across three separate changes (I already filed JDK-8236926 last week). No particular order I think, but the concurrent uncommit changes are probably hardest and will probably take most time. Some additional initial comments: - in G1CollectedHeap::check_soft_max_heap_size_changed(), instead of the check for "AllocateOldGenAt != NULL" we probably want to ask the HeapRegionManager directly about whether it supports this. Also print some log message that it failed. Even on success, print a log message :) - in that same method, I recommend first doing the alignment adjustment (which probably needs to be done for that suggested soft_max_capacity() method below too) and then check if it changed. That saves the repeated != _prev_soft_max_heap_size check. Actually, just using the suggested soft_max_heap_size() method should be fine. - changes in G1CollectedHeap::resize_heap_if_necessary: please make the method always use the soft max heap size as I do not understand why you would not do that.
I recommend adding a "soft_max_capacity()" method in G1CollectedHeap, and let that return MIN2(align_up(SoftMaxHeapSize, heap_alignment), max_capacity()). There are a few places that check SoftMaxHeapSize validity (e.g. SoftMaxHeapSize <= capacity()); they could probably all be removed then. - doing timing: you might have noticed, we are currently transitioning to use Ticks/Tickspan for points in time and durations at least for the calculation, so in any new code please avoid using os::elapsedTime(). Use: Ticks start = Ticks::now(); // timed code phase_times()->record....((Ticks::now() - start).seconds() * 1000.0); instead. - in G1CollectedHeap::shrink_heap_after_young_collection() I would prefer a structure like in expand_heap_after_collection, i.e.: size_t shrink_bytes = _heap_sizing_policy->shrink_amount(); if (shrink_bytes > 0) { // do actual preparation for shrinking } and put all that logic determining the amount to shrink in that shrink_amount() method. - concurrent uncommit: - as mentioned, please split the related changes out from the other changes. This change is hard enough to get right as is. - I would really prefer if we did not need to introduce another helper thread for uncommitting memory. Did you try using the G1YoungRemSetSamplingThread? I understand that uncommit might then delay young gen sampling, but I do not expect these events to occur all the time (but I have no reference here). In the first implementation we could have another thread if others do not object, but every additional thread takes some time to startup and teardown, and memory for at least one stack page. - please move the change in G1CollectedHeap::abort_concurrent_cycle() into a separate method - waiting for completion of the concurrent uncommit and the concurrent marking are completely different concerns. - I admit I haven't looked at all cases in detail, but the split in is_available() and is_unavailable_for_allocation() in HeapRegionManager seems incomplete and unnecessary.
Particularly because of bad naming, as the documentation for is_available() says it's actually is_available[_for_allocation]. Disregarding the negation, these two look equivalent with the problem that !is_available() != is_unavailable..., which is really bad style. I have not found a case where it is harmful to not consider the _concurrent_resizing_map in is_available(). The split of the state of a region between two bitmaps in HeapRegionManager (the available_map and the _concurrent_resizing_map) may be susceptible to tricky races. Please consider changing this to a real "state" as in "Available -> Removing -> Unavailable". This would make the code easier to read too. (And in the future "Adding" if required). - it should be possible to disable concurrent uncommit/resize via an experimental flag. Also there should be no concurrent resize thread if the HeapRegionManager does not support it. G1 could immediately do the heap change in that case. The reason for this flag is to allow users to disable this if they experience problems. - not sure about why some methods in HeapRegionManager have "resizing" in their method name. As far as I can tell, the change only allows concurrent uncommit. Maybe use the above "remove" for regions instead of "uncommit" regions. Background: The code and comments in HeapRegionManager are aware and fairly consistent (I hope) to not use the wording commit/uncommit for operations on HeapRegions. Only operations on memory pages should use committed/uncommit. The naming in the added methods does not respect that. - some of the methods (e.g. to find free regions) should inform the caller that there are to-be-removed regions to maybe retry after waiting for completion of that thread to avoid unexpected OOM. - I have a feeling that if the concurrent uncommit thread worked on pages, not regions, the code would be easier to understand. It would also solve the issue you asked about with the G1RegionsSmallerThanCommitSizeMapper.
You may still need to pass region numbers anyway for logging, but otoh the logging could be done asynchronously. - s/parallely/concurrently a few times - there is a bug in the synchronization of the concurrent uncommit thread: it seems possible that the uncommit thread is still working (iterating over the list of regions to uncommit) while a completed (young) GC may add new regions to that list as young gcs do not wait for completion of the uncommit thread. - the concurrently uncommitted regions only become available for commit at the next gc, which seems very long. Why not make them available for commit "immediately"? Related to that is the use of par_set/clear_bit in e.g. the available bitmap: since all par/clear actions are asserted to be in the vm thread at a safepoint, there does not seem to be a need for using the parallel variants of set/clear bit (if keeping the current mechanism). - please document the supposed interactions and assumptions like in the above two paragraphs between the "resize" thread and the other threads and safepoints. - please use the existing HeapRegionManager::shrink_by() method+infrastructure for passing a shrink request to the HRM, either immediately shrinking the heap or deferring for later shrinking (probably controlled by a flag) instead of adding new methods for the same purpose (with mostly the same contents). E.g. there is a lot of code duplication in the new code in HeapRegionManager, particularly the combination of HeapRegionManager::concurrent_uncommit_regions_memory, HRM::synchronize_uncommit_regions_memory and HRM::uncommit_regions could probably be cut to almost 1/3rd. On 13.01.20 12:45, Thomas Schatzl wrote: > Hi Liang, > > On 07.01.20 17:33, Liang Mao wrote: >> Hi Thomas, >> >> As we previously discussed, I use the concurrent heap uncommit/commit >> mechanism to implement the SoftMaxHeapSize for G1. It is also for >> the further implementation of >> G1ElasticHeap for ergonomic >> change of heap size.
In the previous 8u implementation, we had some >> limitations which are all >> removed now in this patch. The concurrent uncommit/commit can also >> work with some scenarios for >> immediate heap expansion. >> >> Here is the webrev link: >> http://cr.openjdk.java.net/~luchsh/8236073.webrev/ >> >> We still have some questions. >> 1. Does the SoftMaxHeapSize limitation need to consider the GC time >> ratio as in >> expand_heap_after_young_collection? Now we haven't put the logic in yet. I am not completely clear what you are asking about, but the gc time ratio only affects the current "optimal" heap size which is bounded by SoftMaxHeapSize/max_capacity. >> 2. The concurrent uncommit/commit can only work for >> G1RegionsLargerThanCommitSizeMapper but not >> G1RegionsSmallerThanCommitSizeMapper which might need some locks to >> ensure the multi-thread >> synchronization issue (heap may expand immediately). I think bringing >> the lock synchronization >> may not be worthy for the little gain. Another idea is can we just not >> uncommit the pages of >> auxiliary data if in G1RegionsSmallerThanCommitSizeMapper? Heap >> regions should not be >> G1RegionsSmallerThanCommitSizeMapper most of the time I guess... >> >> Looking forward to your advice:) Thanks, Thomas From rkennke at redhat.com Mon Jan 13 16:24:49 2020 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 13 Jan 2020 17:24:49 +0100 Subject: [15] RFR(T) 8237017: Shenandoah: Remove racy assertion In-Reply-To: References: Message-ID: Yes please remove that. Thanks, Roman > During concurrent weak root processing, it tries to CAS in NULL if the > oop is dead, then asserts that the slot is indeed NULL. > > The assertion is racy, because another thread can release > the slot and then reuse it (that's why it uses CAS in the first place), > which can cause the assertion to fail.
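A plain-Java analogue of that race, for illustration only (AtomicReference standing in for the HotSpot slot; class and method names are hypothetical):

```java
import java.util.concurrent.atomic.AtomicReference;

// Why the assertion is racy: after thread 1 successfully CASes the dead
// value to null, thread 2 may release the now-empty slot and reuse it
// for a new value before thread 1 re-reads it. Asserting that the slot
// is still null after the CAS can therefore spuriously fail.
class WeakRootSlot {
    final AtomicReference<Object> slot = new AtomicReference<>();

    void clearIfDead(Object dead) {
        slot.compareAndSet(dead, null);
        // RACY if asserted here: another thread may already have reused
        // the slot, so slot.get() is not guaranteed to still be null.
    }
}
```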
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8237017 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237017/webrev.00/index.html > > Test: > ? hotspot_gc_shenandoah fastdebug on x86_64 Linux > > Thanks, > > -Zhengyu > From zgu at redhat.com Mon Jan 13 18:18:57 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 13 Jan 2020 13:18:57 -0500 Subject: [14] RFR 8237038: Shenandoah: Reduce thread pool size in TestEvilSyncBug.java test Message-ID: Please review this small patch to reduce thread pool size in TestEvilSyncBug.java test. I have observed problems with the test on many core system, including crashes on arm server with 48 cores when running 4 concurrent test jobs. Bug: https://bugs.openjdk.java.net/browse/JDK-8237038 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237038/webrev.00/ Test: hotspot_gc_shenandoah (fastdebug and release) with 4 concurrent test jobs on 48 cores arm machine. Thanks, -Zhengyu From shade at redhat.com Mon Jan 13 18:24:39 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 13 Jan 2020 19:24:39 +0100 Subject: [14] RFR 8237038: Shenandoah: Reduce thread pool size in TestEvilSyncBug.java test In-Reply-To: References: Message-ID: <9750a26b-6428-06b1-331c-50c9eea1c8b0@redhat.com> On 1/13/20 7:18 PM, Zhengyu Gu wrote: > Please review this small patch to reduce thread pool size in > TestEvilSyncBug.java test. > > I have observed problems with the test on many core system, including > crashes on arm server with 48 cores when running 4 concurrent test jobs. > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8237038 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237038/webrev.00/ Looks fine. Suggestion to name this thing "numJobs" and shorter comment (not tested): // Use 1/4 of available processors to avoid over-saturation. 
int numJobs = Math.max(1, Runtime.getRuntime().availableProcessors() / 4); ExecutorService pool = Executors.newFixedThreadPool(numJobs); -- Thanks, -Aleksey From zgu at redhat.com Mon Jan 13 18:54:15 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 13 Jan 2020 13:54:15 -0500 Subject: [14] RFR 8237038: Shenandoah: Reduce thread pool size in TestEvilSyncBug.java test In-Reply-To: <9750a26b-6428-06b1-331c-50c9eea1c8b0@redhat.com> References: <9750a26b-6428-06b1-331c-50c9eea1c8b0@redhat.com> Message-ID: <88df9400-e14e-9d15-3bf8-7b565d65019b@redhat.com> On 1/13/20 1:24 PM, Aleksey Shipilev wrote: > On 1/13/20 7:18 PM, Zhengyu Gu wrote: >> Please review this small patch to reduce thread pool size in >> TestEvilSyncBug.java test. >> >> I have observed problems with the test on many core system, including >> crashes on arm server with 48 cores when running 4 concurrent test jobs. >> >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8237038 >> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237038/webrev.00/ > > Looks fine. > > Suggestion to name this thing "numJobs" and shorter comment (not tested): > > // Use 1/4 of available processors to avoid over-saturation. > int numJobs = Math.max(1, Runtime.getRuntime().availableProcessors() / 4); > ExecutorService pool = Executors.newFixedThreadPool(numJobs); > Updated as you suggested and pushed. Thanks, -Zhengyu > From maoliang.ml at alibaba-inc.com Tue Jan 14 09:07:50 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Tue, 14 Jan 2020 17:07:50 +0800 Subject: =?UTF-8?B?UmU6IFJGUjogODIzNjA3MzogRzE6IFVzZSBTb2Z0TWF4SGVhcFNpemUgdG8gZ3VpZGUgR0Mg?= =?UTF-8?B?aGV1cmlzdGljcw==?= In-Reply-To: <43090624-d8be-8600-a55e-1e10b1920135@oracle.com> References: <5b24e235-5466-15a1-78a6-6f63bfa1878e@oracle.com>, <43090624-d8be-8600-a55e-1e10b1920135@oracle.com> Message-ID: <359fbef8-6735-4958-b76f-56430f1a4108.maoliang.ml@alibaba-inc.com> Hi Thomas, Thank you for the detailed comments! 
Most of the suggestions I will follow to do the modification. And I still have some questions: >> 1. Does the SoftMaxHeapSize limitation need to consider the GC time >> ratio as in >> expand_heap_after_young_collection? Now we haven't put the logic in yet. > I am not completely clear what you are asking about, but the gc time > ratio only affects the current "optimal" heap size which is bounded by > SoftMaxHeapSize/max_capacity. The decision to shrink to SoftMaxHeapSize in this patch is based on the method "G1HeapSizingPolicy::can_shrink_heap_size_to" which counts "used" + "reserve" + "young". We will change it to _heap_sizing_policy->shrink_amount(); as you commented. I'm not considering the GC time ratio as a factor to determine whether the heap can be shrunk to SoftMaxHeapSize. > - changes in G1CollectedHeap::resize_heap_if_necessary: please make the > method to always use the soft max heap size as I do not understand why > you would not do that. Do you think we need to apply the logic "can_shrink_heap_size_to" inside resize_heap_if_necessary to determine whether to make the soft max size the limit? > - there is a bug in the synchronization of the concurrent uncommit > thread: it seems possible that the uncommit thread is still working > (iterating over the list of regions to uncommit) while a completed > (young) GC may add new regions to that list as young gcs do not wait for > completion of the uncommit thread. The uncommit thread could be working in parallel with the VMThread, but the VMThread will not add regions to the concurrent_resizing_list if the concurrent resizing thread is in "working" state. > Related to that is the use of par_set/clear_bit in e.g. the available > bitmap: since all par/clear actions are asserted to be in the vm thread > at a safepoint, there does not seem to be a need for using the parallel > variants of set/clear bit (if keeping the current mechanism).
For the above reason, because concurrent uncommit can run in parallel with the VMThread, the bit set/clear done in the VM thread at a safepoint has to use the parallel variants. > - I have a feeling that if the concurrent uncommit thread worked on > pages, not regions, the code would be easier to understand. It would > also solve the issue you asked about with the > G1RegionsSmallerThanCommitSizeMapper. You may still need to pass region > numbers anyway for logging, but otoh the logging could be done > asynchronously. I don't quite understand this part... For the G1RegionsSmallerThanCommitSizeMapper, a page can be simultaneously requested to commit in the VMThread to expand the heap and to uncommit in the concurrent thread to shrink the heap. Looks like lowering the uncommit work to page level couldn't help this... For the features you listed below, 1) moving the memory uncommit into the concurrent phase 2) uncommit at the end of (almost) every GC 3) SoftMaxHeapSize Since most of the code is for the concurrent framework, do you think 2) and 3) can be combined and implemented first? (The uncommit will happen immediately) Thanks, Liang ------------------------------------------------------------------ From:Thomas Schatzl Send Time:2020 Jan. 13 (Mon.) 23:15 To:"MAO, Liang" ; hotspot-gc-dev Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Hi Liang, thanks for your contribution! I looked through the change a bit and have a few comments. What I noticed quickly after initial browsing through it is that this change implements three different features: 1) moving the memory uncommit into the concurrent phase 2) uncommit at the end of (almost) every GC 3) SoftMaxHeapSize These should be split across three separate changes (I already filed JDK-8236926 last week). No particular order I think, but the concurrent uncommit changes are probably hardest and will probably take most time.
Some additional initial comments: - in G1CollectedHeap::check_soft_max_heap_size_changed(), instead of the check for "AllocateOldGenAt != NULL" we probably want to ask the HeapRegionManager directly about whether it supports this. Also print some log message that it failed. Even on success, print a log message :) - in that same method, I recommend first doing the alignment adjustment (which probably needs to be done for that suggested soft_max_capacity() method below too) and then check if it changed. That saves the repeated != _prev_soft_max_heap_size check. Actually, just using the suggested soft_max_heap_size() method should be fine. - changes in G1CollectedHeap::resize_heap_if_necessary: please make the method always use the soft max heap size as I do not understand why you would not do that. I recommend adding a "soft_max_capacity()" method in G1CollectedHeap, and let that return MIN2(align_up(SoftMaxHeapSize, heap_alignment), max_capacity()). There are a few places that check SoftMaxHeapSize validity (e.g. SoftMaxHeapSize <= capacity()); they could probably all be removed then. - doing timing: you might have noticed, we are currently transitioning to use Ticks/Tickspan for points in time and durations at least for the calculation, so in any new code please avoid using os::elapsedTime(). Use: Ticks start = Ticks::now(); // timed code phase_times()->record....((Ticks::now() - start).seconds() * 1000.0); instead. - in G1CollectedHeap::shrink_heap_after_young_collection() I would prefer a structure like in expand_heap_after_collection, i.e.: size_t shrink_bytes = _heap_sizing_policy->shrink_amount(); if (shrink_bytes > 0) { // do actual preparation for shrinking } and put all that logic determining the amount to shrink in that shrink_amount() method. - concurrent uncommit: - as mentioned, please split the related changes out from the other changes. This change is hard enough to get right as is.
- I would really prefer if we did not need to introduce another helper thread for uncommitting memory. Did you try using the G1YoungRemSetSamplingThread? I understand that uncommit might then delay young gen sampling, but I do not expect these events to occur all the time (but I have no reference here). In the first implementation we could have another thread if others do not object, but every additional thread takes some time to startup and teardown, and memory for at least one stack page. - please move the change in G1CollectedHeap::abort_concurrent_cycle() into a separate method - waiting for completion of the concurrent uncommit and the concurrent marking are completely different concerns. - I admit I haven't looked at all cases in detail, but the split in is_available() and is_unavailable_for_allocation() in HeapRegionManager seems incomplete and unnecessary. Particularly because of bad naming, as the documentation for is_available() says it's actually is_available[_for_allocation]. Disregarding the negation, these two look equivalent, with the problem that !is_available() != is_unavailable..., which is really bad style. I have not found a case where it is harmful to not consider the _concurrent_resizing_map in is_available(). Splitting the state of a region between two bitmaps in HeapRegionManager (the available_map and the _concurrent_resizing_map) may be susceptible to tricky races. Please consider changing this to a real "state" as in "Available -> Removing -> Unavailable". This would make the code easier to read too. (And in the future "Adding" if required). - it should be possible to disable concurrent uncommit/resize via an experimental flag. Also there should be no concurrent resize thread if the HeapRegionManager does not support it. G1 could immediately do the heap change in that case. The reason for this flag is to allow users to disable this if they experience problems.
- not sure about why some methods in HeapRegionManager have "resizing" in their method name. As far as I can tell, the change only allows concurrent uncommit. Maybe use the above "remove" for regions instead of "uncommit" regions. Background: The code and comments in HeapRegionManager are aware and fairly consistent (I hope) to not use the wording commit/uncommit for operations on HeapRegions. Only operations on memory pages should use commit/uncommit. The naming in the added methods does not respect that. - some of the methods (e.g. to find free regions) should inform the caller that there are to-be-removed regions, so the caller can maybe retry after waiting for completion of that thread to avoid an unexpected OOM. - I have a feeling that if the concurrent uncommit thread worked on pages, not regions, the code would be easier to understand. It would also solve the issue you asked about with the G1RegionsSmallerThanCommitSizeMapper. You may still need to pass region numbers anyway for logging, but otoh the logging could be done asynchronously. - s/parallely/concurrently a few times - there is a bug in the synchronization of the concurrent uncommit thread: it seems possible that the uncommit thread is still working (iterating over the list of regions to uncommit) while a completed (young) GC may add new regions to that list, as young gcs do not wait for completion of the uncommit thread. - the concurrently uncommitted regions only become available for commit at the next gc, which seems very long. Why not make them available for commit "immediately"? Related to that is the use of par_set/clear_bit in e.g. the available bitmap: since all par/clear actions are asserted to be in the vm thread at a safepoint, there does not seem to be a need for using the parallel variants of set/clear bit (if keeping the current mechanism). - please document the supposed interactions and assumptions like in the above two paragraphs between the "resize" thread and the other threads and safepoints.
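The suggested replacement of the two bitmaps with one explicit region state could look roughly like this. Purely illustrative: all names here are hypothetical, not HotSpot's:

```cpp
#include <cassert>

// One explicit state per region instead of two bitmaps
// (available_map and _concurrent_resizing_map).
enum class RegionState { Unavailable, Available, Removing };

struct RegionStateMachine {
  RegionState state = RegionState::Unavailable;

  // VM thread at a safepoint: a freshly committed region becomes usable.
  void make_available() {
    assert(state == RegionState::Unavailable);
    state = RegionState::Available;
  }

  // VM thread at a safepoint: hand the region to the concurrent uncommit
  // thread; it is no longer available for allocation from this point on.
  void start_removal() {
    assert(state == RegionState::Available);
    state = RegionState::Removing;
  }

  // Concurrent uncommit thread: uncommit of the underlying pages finished.
  void finish_removal() {
    assert(state == RegionState::Removing);
    state = RegionState::Unavailable;
  }

  bool is_available() const { return state == RegionState::Available; }
};
```

An "Adding" state would slot into the same transition chain if concurrent commit is added later.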
- please use the existing HeapRegionManager::shrink_by() method+infrastructure for passing a shrink request to the HRM, either immediately shrinking the heap or deferring for later shrinking (probably controlled by a flag) instead of adding new methods for the same purpose (with mostly the same contents). E.g. there is a lot of code duplication in the new code in HeapRegionManager, particularly the combination of HeapRegionManager::concurrent_uncommit_regions_memory, HRM::synchronize_uncommit_regions_memory and HRM::uncommit_regions could probably be cut to almost 1/3rd. On 13.01.20 12:45, Thomas Schatzl wrote: > Hi Liang, > > On 07.01.20 17:33, Liang Mao wrote: >> Hi Thomas, >> >> As we previously discussed, I use the concurrent heap uncommit/commit >> mechanism to implement the SoftMaxHeapSize for G1. It is also for the further implementation of >> G1ElasticHeap for ergonomic >> change of heap size. In the previous 8u implementation, we had some >> limitations which are all >> removed now in this patch. The concurrent uncommit/commit can also >> work with some scenarios for >> immediate heap expansion. >> >> Here is the webrev link: >> http://cr.openjdk.java.net/~luchsh/8236073.webrev/ >> >> We still have some questions. >> 1. Does the SoftMaxHeapSize limitation need to consider the GC time >> ratio as in >> expand_heap_after_young_collection? Now we haven't put the logic in yet. I am not completely clear what you are asking about, but the gc time ratio only affects the current "optimal" heap size which is bounded by SoftMaxHeapSize/max_capacity. >> 2. The concurrent uncommit/commit can only work for >> G1RegionsLargerThanCommitSizeMapper but not >> G1RegionsSmallerThanCommitSizeMapper which might need some locks to >> ensure the multi-thread >> synchronization issue (heap may expand immediately). I think bringing >> the lock synchronization >> may not be worthy for the little gain.
Another idea is: can we just not >> uncommit the pages of >> auxiliary data if in G1RegionsSmallerThanCommitSizeMapper? Heap >> regions should not be >> G1RegionsSmallerThanCommitSizeMapper most of the time I guess... >> >> Looking forward to your advice:) Thanks, Thomas From erik.osterlund at oracle.com Tue Jan 14 09:33:31 2020 From: erik.osterlund at oracle.com (erik.osterlund at oracle.com) Date: Tue, 14 Jan 2020 10:33:31 +0100 Subject: RFR: 8236153: ZGC: gc/z/TestUncommit.java fails with java.lang.Exception: Uncommitted too fast In-Reply-To: <24c41e59-3e2c-9284-489a-af487e6cccc0@oracle.com> References: <24c41e59-3e2c-9284-489a-af487e6cccc0@oracle.com> Message-ID: <53eea1e2-d57a-1697-b5e3-282f4bf0f04d@oracle.com> Hi Per, Looks good. /Erik On 1/13/20 4:12 PM, Per Liden wrote: > The test gc/z/TestUncommit.java fails now and then on Windows when > using -Xcomp. This test can fail if it's severely starved on CPU, as > it will cause the timing to be off. The logs confirm that the test > took an unusually long time to execute, suggesting it was starved on > CPU. This only happens in test tiers using -Xcomp, which is likely > causing the unusually high load. This patch disables this test when > using -Xcomp. I've enabled some GC logging, which should be helpful if > this test ever fails again. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8236153 > Webrev: http://cr.openjdk.java.net/~pliden/8236153/webrev.0 > > /Per From per.liden at oracle.com Tue Jan 14 09:41:35 2020 From: per.liden at oracle.com (Per Liden) Date: Tue, 14 Jan 2020 10:41:35 +0100 Subject: RFR: 8236153: ZGC: gc/z/TestUncommit.java fails with java.lang.Exception: Uncommitted too fast In-Reply-To: <53eea1e2-d57a-1697-b5e3-282f4bf0f04d@oracle.com> References: <24c41e59-3e2c-9284-489a-af487e6cccc0@oracle.com> <53eea1e2-d57a-1697-b5e3-282f4bf0f04d@oracle.com> Message-ID: Thanks Erik! /Per On 1/14/20 10:33 AM, erik.osterlund at oracle.com wrote: > Hi Per, > > Looks good.
> > /Erik > > On 1/13/20 4:12 PM, Per Liden wrote: >> The test gc/z/TestUncommit.java fails now and then on Windows when >> using -Xcomp. This test can fail if it's severely starved on CPU, as >> it will cause the timing to be off. The logs confirm that the test >> took an unusually long time to execute, suggesting it was starved on >> CPU. This only happens in test tiers using -Xcomp, which is likely >> causing the unusually high load. This patch disables this test when >> using -Xcomp. I've enabled some GC logging, which should be helpful if >> this test ever fails again. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8236153 >> Webrev: http://cr.openjdk.java.net/~pliden/8236153/webrev.0 >> >> /Per > From thomas.schatzl at oracle.com Tue Jan 14 11:36:22 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 14 Jan 2020 12:36:22 +0100 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: <359fbef8-6735-4958-b76f-56430f1a4108.maoliang.ml@alibaba-inc.com> References: <5b24e235-5466-15a1-78a6-6f63bfa1878e@oracle.com> <43090624-d8be-8600-a55e-1e10b1920135@oracle.com> <359fbef8-6735-4958-b76f-56430f1a4108.maoliang.ml@alibaba-inc.com> Message-ID: <36b31e9a-ee86-50d9-8042-bc79e6756777@oracle.com> Hi, On 14.01.20 10:07, Liang Mao wrote: > Hi Thomas, > > Thank you for the detailed comments! > Most of the suggestions I will follow to do the modification. > And I still have some questions: > >>> 1. Does the SoftMaxHeapSize limitation need to consider the GC time >>> ratio as in >>> expand_heap_after_young_collection? Now we haven't put the logic in yet. > >> I am not completely clear what you are asking about, but the gc time >> ratio only affects the current "optimal" heap size which is bounded by >> SoftMaxHeapSize/max_capacity. > > The decision to shrink to SoftMaxHeapSize in this patch is based > on the method "G1HeapSizingPolicy::can_shrink_heap_size_to" which > counts "used" + "reserve" + "young".
We will change it to > _heap_sizing_policy->shrink_amount(); as you commented. > I'm not considering the GC time ratio as a factor to determine > whether the heap can be shrunk to SoftMaxHeapSize. Going back to the "spec": "When -XX:SoftMaxHeapSize is set, the GC should strive to not grow heap size beyond the specified size, unless the GC decides it's necessary to do so. The soft max heap size should not be allowed to be set to a value smaller than min heap size (-Xms) or greater than max heap size (-Xmx). When not set on the command-line, this flag should default to the max heap size." (https://bugs.openjdk.java.net/browse/JDK-8222181) This is a very loose definition, and "unless the GC decides it's necessary" may mean anything. Looking at ZGC code, it mostly uses it to drive the GCs (and determine expansion amount), and lets regular expansion/shrinking do the work without new rules. I would at first tend to do the same: if the existing heap policy indicates that an expansion at young gc (which takes GCTimeRatio into account) is needed for whatever reason, I would actually let G1 keep doing it; conversely I would also take GCTimeRatio into account when trying to shrink, to keep the policy symmetric. The current implementation probably preferentially shrinks towards SoftMaxHeapSize, correct? (this shrinking also seems to be limited to exactly SoftMaxHeapSize - why not below that if the collector thinks it would be okay?) Do you have measurements with/without GCTimeRatio in the shrinking? Can you describe, with over-time heap occupancy graphs, that this does not work at all in your workloads? Measurements of committed heap over time of the current solution would be appreciated too. (I haven't had the time yet to set up some useful testing application that can be used to simulate phases of such a workload to show heap shrinking, but I assume you have some data.)
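The symmetric policy argued for here (the same GCTimeRatio-derived signal driving both expansion and shrinking, with SoftMaxHeapSize acting as a goal rather than a hard bound) can be sketched abstractly. All names, thresholds and step sizes below are hypothetical illustrations, not G1's actual heuristics:

```cpp
#include <cassert>
#include <cstddef>

// Illustrative symmetric sizing decision. Positive result: bytes to
// expand; negative: bytes to shrink; 0: leave the heap alone.
long resize_amount(double gc_time_percent,    // recent GC time / total time
                   double target_gc_percent,  // derived from GCTimeRatio
                   size_t capacity,
                   size_t soft_max_capacity) {
  if (gc_time_percent > target_gc_percent) {
    // Spending too much time in GC: grow, even beyond the soft goal.
    return (long)(capacity / 10);             // hypothetical 10% step
  }
  if (gc_time_percent < target_gc_percent / 2 && capacity > soft_max_capacity) {
    // Comfortably under the target and above the soft goal: shrink toward it.
    return -(long)(capacity - soft_max_capacity);
  }
  return 0;
}
```

The point of the sketch is only that both branches consult the same GC time signal, so shrinking to SoftMaxHeapSize does not get a special, asymmetric rule.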
>> - changes in G1CollectedHeap::resize_heap_if_necessary: please make the >> method always use the soft max heap size, as I do not understand why >> you would not do that. > > Do you think we need to apply the logic "can_shrink_heap_size_to" > inside resize_heap_if_necessary to determine whether to make the soft > max size the limit? resize_heap_if_necessary() is the equivalent of adjust_heap_after_young_collection() for after full gc (the naming could be unified at some point). What the change implements right now is to respect SoftMaxHeapSize only on an explicit gc in resize_heap_if_necessary(), while always respecting it during young gcs. Do you have a reason for this decision? This seems like an inconsistency I can not find a reason for. As mentioned above, I would try to keep SoftMaxHeapSize only a goal for starting (concurrent) garbage collections, with "natural" sizing trying to keep the SoftMaxHeapSize goal. Particularly, if a (compacting) full gc won't meet the SoftMaxHeapSize, what else will? It is indeed unfortunate that you might need to tweak Min/MaxFreeRatio to achieve a higher uncommit ratio at full gc... This change (system.gc specifically trying to meet SoftMaxHeapSize) also seems to be an artifact of your usage of this feature - maybe you happen to always issue a system.gc() after you changed SoftMaxHeapSize? It would probably be better if a concurrent cycle were triggered automatically, similar to periodic gcs.
Okay, I now see the check in G1CollectedHeap::adjust_heap_after_young_collection(), but that also prohibits expansion during young GC, which seems relatively disruptive. I think I mentioned earlier that in this case (if we want to expand) it might be better to wait on completion of the parallel uncommit _if_ the other remaining regions are not enough. (Alternatively one could add another uncommit/commit request to a hypothetical uncommit task queue for that resize thread). My thinking is that at worst, this would result in the same behavior as before (i.e. blocking because of commit/uncommit in progress) instead of changing the overall behavior by denying expansion requests (silently, which is pretty bad). This could probably result in the heap sizing policy getting a few samples with high gc time ratio, expanding even more when G1 is finally allowed to expand. (Also, fortunately UseGCOverheadLimit is not implemented in G1, as that might kill the VM spuriously because of that...) Or do you have some reason for this particular implementation? (The reason I am asking so much about details is to get a common understanding of the solution, i.e. what should happen when, and hopefully including why, for the next guy ;)) Maybe it would be good to refactor the code in G1CollectedHeap::adjust_heap_after_young_collection() (similar to or expand in G1CollectedHeap::adjust_heap_after_young_collection()), e.g. calculate some heap_size_change() value from the heap sizing policy, and then use that value depending on whether it is positive or negative to expand or shrink. > >> Related to that is the use of par_set/clear_bit in e.g. the available >> bitmap: since all par/clear actions are asserted to be in the vm thread >> at a safepoint, there does not seem to be a need for using the parallel >> variants of set/clear bit (if keeping the current mechanism).
> > For the above reason that concurrent uncommit can run in parallel with > the VMThread, the bit set/clear in the VM thread at a safepoint has > to be parallel. If concurrent uncommit is working, both heap shrinking and expansion in the safepoint are disabled as far as I can tell. I.e. 2996 void G1CollectedHeap::adjust_heap_after_young_collection() { 2997 if (concurrent_heap_resize()->resize_thread()->during_cycle()) { 2998 // Do not adjust if concurrent resizing is in progress 2999 return; 3000 } 3001 3002 double start_time_ms = os::elapsedTime(); 3003 shrink_heap_after_young_collection(); 3004 phase_times()->recor...[..] 3005 // Shrinking might have started resizing 3006 if (!concurrent_heap_resize()->resize_thread()->during_cycle()) { 3007 expand_heap_after_young_collection(); 3008 } 3009 } This method seems to be the only caller of expand/shrink_heap_after_young_collection(). Another question I had but forgot: "resize" is used in the thread name and in other method and class names instead of e.g. "uncommit" or "shrink". Do you plan to add concurrent commit too? >> - I have a feeling that if the concurrent uncommit thread worked on >> pages, not regions, the code would be easier to understand. It would >> also solve the issue you asked about with the >> G1RegionsSmallerThanCommitSizeMapper. You may still need to pass region >> numbers anyway for logging, but otoh the logging could be done >> asynchronously. > > I don't quite understand this part... For > the G1RegionsSmallerThanCommitSizeMapper, > a page can be simultaneously requested to commit in the VMThread to expand > the heap and uncommit in the concurrent thread to shrink the heap. Looks like lowering > the uncommit work to the page level couldn't help this... I see the problem, but as above I think expansion (commit) and shrinking (uncommit) can't happen at the same time at the moment, and it actually might be premature to allow expansion and shrinking to occur at the same time.
Some more options are serializing these requests; one I described above (if a request is pending, wait for its completion before starting a new one). This might be the default option for other reasons anyway. Another is certainly to simply not uncommit them concurrently; then maybe we can still uncommit them immediately? > For the features you listed below, > 1) moving the memory uncommit into the concurrent phase > 2) uncommit at the end of (almost) every GC > 3) SoftMaxHeapSize > > Since most of the code is for the concurrent framework, > do you think 2) and 3) can be combined and implemented first? > (The uncommit will happen immediately) I think that would be fine. Thanks, Thomas From aph at redhat.com Tue Jan 14 14:22:30 2020 From: aph at redhat.com (Andrew Haley) Date: Tue, 14 Jan 2020 14:22:30 +0000 Subject: Is CPU_MULTI_COPY_ATOMIC the correct test here? Message-ID: AArch64 is multi-copy atomic, but it has a relaxed memory model. I'm looking at the CPU_MULTI_COPY_ATOMIC test in this code: template <class E, MEMFLAGS F, unsigned int N> bool GenericTaskQueue<E, F, N>::pop_global(volatile E& t) { Age oldAge = _age.get(); // Architectures with weak memory model require a barrier here // to guarantee that bottom is not older than age, // which is crucial for the correctness of the algorithm. #ifndef CPU_MULTI_COPY_ATOMIC OrderAccess::fence(); #endif uint localBot = Atomic::load_acquire(&_bottom); uint n_elems = size(localBot, oldAge.top()); if (n_elems == 0) { return false; } It seems to me that what we're asking here is not whether the CPU is multi-copy atomic, but whether it's TSO or not. I'd like to turn CPU_MULTI_COPY_ATOMIC off for AArch64, but I think that GenericTaskQueue::pop_global() will break if I do. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd.
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From zgu at redhat.com Tue Jan 14 14:37:18 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 14 Jan 2020 09:37:18 -0500 Subject: Is CPU_MULTI_COPY_ATOMIC the correct test here? In-Reply-To: References: Message-ID: <220ace02-5c1d-e184-9b98-3b586bc85bd4@redhat.com> Hi Andrew, Here is the discussion on this particular issue ... https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-September/034900.html -Zhengyu On 1/14/20 9:22 AM, Andrew Haley wrote: > AArch64 is multi-copy atomic, but it has a relaxed memory model. I'm looking > at the CPU_MULTI_COPY_ATOMIC in this code: > > template <class E, MEMFLAGS F, unsigned int N> > bool GenericTaskQueue<E, F, N>::pop_global(volatile E& t) { > Age oldAge = _age.get(); > // Architectures with weak memory model require a barrier here > // to guarantee that bottom is not older than age, > // which is crucial for the correctness of the algorithm. > #ifndef CPU_MULTI_COPY_ATOMIC > OrderAccess::fence(); > #endif > uint localBot = Atomic::load_acquire(&_bottom); > uint n_elems = size(localBot, oldAge.top()); > if (n_elems == 0) { > return false; > } > > It seems to me that what we're asking here is not whether the CPU is > multi-copy atomic, but whether it's TSO or not. I'd like to turn > CPU_MULTI_COPY_ATOMIC off for AArch64, but I think that > GenericTaskQueue::pop_global() will break if I do. > From aph at redhat.com Tue Jan 14 14:58:09 2020 From: aph at redhat.com (Andrew Haley) Date: Tue, 14 Jan 2020 14:58:09 +0000 Subject: Is CPU_MULTI_COPY_ATOMIC the correct test here? In-Reply-To: <220ace02-5c1d-e184-9b98-3b586bc85bd4@redhat.com> References: <220ace02-5c1d-e184-9b98-3b586bc85bd4@redhat.com> Message-ID: <1193d905-5d44-8688-0cca-2c4bd71b9f3d@redhat.com> On 1/14/20 2:37 PM, Zhengyu Gu wrote: > > Here is the discussion on this particular issue ...
> > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-September/034900.html The code (and its comment) seems to be related to whether this is a relaxed-memory machine. Much of the discussion seems to be related to that, too. I can't see any discussion about multi-copy atomicity. It seems to me that we do want this fence on AArch64, but we should not define CPU_MULTI_COPY_ATOMIC. I can't see why the concepts are mixed up. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Tue Jan 14 15:00:42 2020 From: aph at redhat.com (Andrew Haley) Date: Tue, 14 Jan 2020 15:00:42 +0000 Subject: RFR(S): 8229422: Taskqueue: Outdated selection of weak memory model platforms In-Reply-To: References: <9d9819fe-560f-13f0-1907-794e063ee687@oracle.com> Message-ID: <9dbfd063-ea45-10e0-b541-7e84d662581c@redhat.com> On 8/15/19 4:49 PM, Derek White wrote: > However, setting CPU_MULTI_COPY_ATOMIC for AArch64 would result in changing behavior (removing fence in taskqueue) that should be looked at and tested by the aarch64 folks, so if Andrew Haley agrees, I suggest deferring changing this AArch64 behavior to a separate issue. Well, yes. What I don't understand is what any of this has to do with multi-copy atomicity. The fence is needed for _age.get() on all machines with relaxed memory consistency, AFAICS. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Tue Jan 14 15:05:01 2020 From: aph at redhat.com (Andrew Haley) Date: Tue, 14 Jan 2020 15:05:01 +0000 Subject: Is CPU_MULTI_COPY_ATOMIC the correct test here? In-Reply-To: <220ace02-5c1d-e184-9b98-3b586bc85bd4@redhat.com> References: <220ace02-5c1d-e184-9b98-3b586bc85bd4@redhat.com> Message-ID: On 1/14/20 2:37 PM, Zhengyu Gu wrote: > Here is the discussion on this particular issue ...
> > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-September/034900.html Ah, I found the answer in another thread: http://mail.openjdk.java.net/pipermail/hotspot-dev/2013-March/008853.html -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Tue Jan 14 15:05:43 2020 From: aph at redhat.com (Andrew Haley) Date: Tue, 14 Jan 2020 15:05:43 +0000 Subject: RFR(S): 8229422: Taskqueue: Outdated selection of weak memory model platforms In-Reply-To: <9dbfd063-ea45-10e0-b541-7e84d662581c@redhat.com> References: <9d9819fe-560f-13f0-1907-794e063ee687@oracle.com> <9dbfd063-ea45-10e0-b541-7e84d662581c@redhat.com> Message-ID: On 1/14/20 3:00 PM, Andrew Haley wrote: > On 8/15/19 4:49 PM, Derek White wrote: >> However, setting CPU_MULTI_COPY_ATOMIC for AArch64 would result in changing behavior (removing fence in taskqueue) that should be looked at and tested by the aarch64 folks, so if Andrew Haley agrees, I suggest deferring changing this AArch64 behavior to a separate issue. > > Well, yes. What I don't understand is what any of this has to do with > multi-copy atomicity. The fence is needed for _age.get() on all machines > with relaxed memory consistency, AFAICS. Ah, I found the answer in another thread: http://mail.openjdk.java.net/pipermail/hotspot-dev/2013-March/008853.html -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd.
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Tue Jan 14 15:08:42 2020 From: aph at redhat.com (Andrew Haley) Date: Tue, 14 Jan 2020 15:08:42 +0000 Subject: RFR(S): 8229422: Taskqueue: Outdated selection of weak memory model platforms In-Reply-To: References: <9d9819fe-560f-13f0-1907-794e063ee687@oracle.com> <9dbfd063-ea45-10e0-b541-7e84d662581c@redhat.com> Message-ID: On 1/14/20 3:05 PM, Andrew Haley wrote: > On 1/14/20 3:00 PM, Andrew Haley wrote: >> On 8/15/19 4:49 PM, Derek White wrote: >>> However, setting CPU_MULTI_COPY_ATOMIC for AArch64 would result in changing behavior (removing fence in taskqueue) that should be looked at and tested by the aarch64 folks, so if Andrew Haley agrees, I suggest deferring changing this AArch64 behavior to a separate issue. >> >> Well, yes. What I don't understand is what any of this has to do with >> multi-copy atomicity. The fence is needed for _age.get() on all machines >> with relaxed memory consistency, AFAICS. > > Ah, I found the answer in another thread: > > http://mail.openjdk.java.net/pipermail/hotspot-dev/2013-March/008853.html "No, the problem is not reordering. The problem is that _bottom, which is read after _age, might be older than _age because another processor didn't write it back yet. The fence (sync) makes the current thread wait until it has the new _bottom. "On Power, a write is not visible to all other threads simultaneously (no multiple-copy-atomicity)." So, my question: on a machine with relaxed memory, do we still need an acquire fence? -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd.
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From martin.doerr at sap.com Tue Jan 14 15:52:57 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 14 Jan 2020 15:52:57 +0000 Subject: RFR(S): 8229422: Taskqueue: Outdated selection of weak memory model platforms In-Reply-To: References: <9d9819fe-560f-13f0-1907-794e063ee687@oracle.com> <9dbfd063-ea45-10e0-b541-7e84d662581c@redhat.com> Message-ID: Hi Andrew, good catch. I think you're right. A multi-copy-atomic, but weak architecture (e.g. aarch64) needs an instruction which orders both volatile loads. (IA64 compilers use acquiring loads when accessing volatile fields; that's why IA64 is not affected by this problem.) Best regards, Martin > -----Original Message----- > From: Andrew Haley > Sent: Dienstag, 14. Januar 2020 16:09 > To: Derek White ; Doerr, Martin > ; David Holmes ; > hotspot-gc-dev at openjdk.java.net; Kim Barrett > Subject: Re: RFR(S): 8229422: Taskqueue: Outdated selection of weak > memory model platforms > > On 1/14/20 3:05 PM, Andrew Haley wrote: > > On 1/14/20 3:00 PM, Andrew Haley wrote: > >> On 8/15/19 4:49 PM, Derek White wrote: > >>> However, setting CPU_MULTI_COPY_ATOMIC for AArch64 would result > in changing behavior (removing fence in taskqueue) that should be looked at > and tested by the aarch64 folks, so if Andrew Haley agrees, I suggest > deferring changing this AArch64 behavior to a separate issue. > >> > >> Well, yes. What I don't understand is what any of this has to do with > >> multi-copy atomicity. The fence is needed for _age.get() on all machines > >> with relaxed memory consistency, AFAICS. > > > > Ah, I found the answer in another thread: > > > > http://mail.openjdk.java.net/pipermail/hotspot-dev/2013- > March/008853.html > > "No, the problem is not reordering. The problem is that _bottom, > which is read after _age, might be older than _age because another > processor didn't write it back yet.
The fence (sync) makes the > current thread wait until it has the new _bottom. > > "On Power, a write is not visible to all other threads simultaneously > (no multiple-copy-atomicity)." > > So, my question: on a machine with relaxed memory, do we still need an > acquire > fence? > > -- > Andrew Haley (he/him) > Java Platform Lead Engineer > Red Hat UK Ltd. > https://keybase.io/andrewhaley > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Tue Jan 14 16:15:36 2020 From: aph at redhat.com (Andrew Haley) Date: Tue, 14 Jan 2020 16:15:36 +0000 Subject: RFR(S): 8229422: Taskqueue: Outdated selection of weak memory model platforms In-Reply-To: References: <9d9819fe-560f-13f0-1907-794e063ee687@oracle.com> <9dbfd063-ea45-10e0-b541-7e84d662581c@redhat.com> Message-ID: <88f97b92-df9e-140c-a972-44982ae3f79b@redhat.com> On 1/14/20 3:52 PM, Doerr, Martin wrote: > good catch. I think you're right. A multi-copy-atomic, but weak > architecture (e.g. aarch64) needs an instruction which orders both > volatile loads. Good, I thought so. Given that TSO machines define OrderAccess::acquire() as no more than a compiler barrier, I believe that we could do something like #ifdef CPU_MULTI_COPY_ATOMIC OrderAccess::acquire(); #else OrderAccess::fence(); #endif -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From martin.doerr at sap.com Tue Jan 14 16:19:32 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 14 Jan 2020 16:19:32 +0000 Subject: RFR(S): 8229422: Taskqueue: Outdated selection of weak memory model platforms In-Reply-To: <88f97b92-df9e-140c-a972-44982ae3f79b@redhat.com> References: <9d9819fe-560f-13f0-1907-794e063ee687@oracle.com> <9dbfd063-ea45-10e0-b541-7e84d662581c@redhat.com> <88f97b92-df9e-140c-a972-44982ae3f79b@redhat.com> Message-ID: Excellent. I'd propose the same fix. I've added Thomas Schatzl. Maybe he can have a look, too.
Best regards, Martin > -----Original Message----- > From: Andrew Haley > Sent: Dienstag, 14. Januar 2020 17:16 > To: Doerr, Martin ; Derek White > ; David Holmes ; > hotspot-gc-dev at openjdk.java.net; Kim Barrett > Subject: Re: RFR(S): 8229422: Taskqueue: Outdated selection of weak > memory model platforms > > On 1/14/20 3:52 PM, Doerr, Martin wrote: > > > good catch. I think you're right. A multi-copy-atomic, but weak > > architecture (e.g. aarch64) needs an instruction which orders both > > volatile loads. > > Good, I thought so. > > Given that TSO machines define OrderAccess::acquire() as no more than > a compiler barrier, I believe that we could do something like > > #ifdef CPU_MULTI_COPY_ATOMIC > OrderAccess::acquire(); > #else > OrderAccess::fence(); > #endif > > -- > Andrew Haley (he/him) > Java Platform Lead Engineer > Red Hat UK Ltd. > https://keybase.io/andrewhaley > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From zgu at redhat.com Tue Jan 14 17:19:33 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 14 Jan 2020 12:19:33 -0500 Subject: [15] RFR 8236878: Use atomic instruction to update StringDedupTable's entries and entries_removed counters In-Reply-To: References: <97d04872-7abb-396d-7552-f85b4cf1b97b@redhat.com> Message-ID: Submit test also passed. May I get a second review? Thanks, -Zhengyu On 1/13/20 9:06 AM, Roman Kennke wrote: > OK. > > Thanks, > Roman > > >> >> Please review this small change that uses atomic operations to update >> StringDedupTable's entries and entries_removed counter. >> >> This is *not* a correctness fix or performance enhancement, but for >> Shenandoah GC to move StringDedupTable cleanup task into concurrent >> phase, while holding StringDedupTable_lock. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8236878 >> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8236878/webrev.00/index.html >> >> Test: >> hotspot_gc (fastdebug and release) on x86_64 Linux >> Submit test in progress.
>> >> Thanks, >> -Zhengyu >> > From mikael.vidstedt at oracle.com Wed Jan 15 00:18:25 2020 From: mikael.vidstedt at oracle.com (Mikael Vidstedt) Date: Tue, 14 Jan 2020 16:18:25 -0800 Subject: 8237182(T): Update copyright header for shenandoah and epsilon files Message-ID: Please review this small change which adjusts the copyright headers for shenandoah and epsilon related files. JBS: https://bugs.openjdk.java.net/browse/JDK-8237182 webrev: http://cr.openjdk.java.net/~mikael/webrevs/8237182/webrev.00/open/webrev/ Description: Many/most of the shenandoah and epsilon related files are missing the following line in the copyright header: "DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER." This change simply adds that line to the relevant files. In src/hotspot/share/gc/shenandoah/shenandoahNormalMode.cpp there was also a missing empty line before one of the paragraphs in the header. Cheers, Mikael From igor.ignatyev at oracle.com Wed Jan 15 00:24:05 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 14 Jan 2020 16:24:05 -0800 Subject: 8237182(T): Update copyright header for shenandoah and epsilon files In-Reply-To: References: Message-ID: Hi Mikael, LGTM -- Igor > On Jan 14, 2020, at 4:18 PM, Mikael Vidstedt wrote: > > > Please review this small change which adjusts the copyright headers for shenandoah and epsilon related files. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8237182 > webrev: http://cr.openjdk.java.net/~mikael/webrevs/8237182/webrev.00/open/webrev/ > > Description: > > Many/most of the shenandoah and epsilon related files are missing the following line in the copyright header: > > "DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER." > > This change simply adds that line to the relevant files. In src/hotspot/share/gc/shenandoah/shenandoahNormalMode.cpp there was also a missing empty line before one of the paragraphs in the header.
> > Cheers, > Mikael > From david.holmes at oracle.com Wed Jan 15 01:00:55 2020 From: david.holmes at oracle.com (David Holmes) Date: Wed, 15 Jan 2020 11:00:55 +1000 Subject: RFR(S): 8229422: Taskqueue: Outdated selection of weak memory model platforms In-Reply-To: <88f97b92-df9e-140c-a972-44982ae3f79b@redhat.com> References: <9d9819fe-560f-13f0-1907-794e063ee687@oracle.com> <9dbfd063-ea45-10e0-b541-7e84d662581c@redhat.com> <88f97b92-df9e-140c-a972-44982ae3f79b@redhat.com> Message-ID: <23d3db9d-6603-c10c-8240-62cd82f4bae9@oracle.com> On 15/01/2020 2:15 am, Andrew Haley wrote: > On 1/14/20 3:52 PM, Doerr, Martin wrote: > >> good catch. I think you're right. A multi-copy-atomic, but weak >> architecture (e.g. aarch64) needs an instruction which orders both >> volatile loads. > > Good, I thought so. > > Given that TSO machines define OrderAccess::acquire() as no more than > a compiler barrier, I believe that we could do something like > > #ifdef CPU_MULTI_COPY_ATOMIC > OrderAccess::acquire(); > #else > OrderAccess::fence(); > #endif "acquire" isn't used to order loads it is used to pair with a "release" associated with the store of the variable now being loaded. If this is the code referred to: Age oldAge = _age.get(); // Architectures with weak memory model require a barrier here // to guarantee that bottom is not older than age, // which is crucial for the correctness of the algorithm. #ifndef CPU_MULTI_COPY_ATOMIC OrderAccess::fence(); #endif uint localBot = Atomic::load_acquire(&_bottom); then I think there is an assumption (perhaps incorrect) that the load_acquire will prevent reordering as well as performing the necessary "acquire" semantics. If the load_acquire doesn't prevent reordering then surely a loadload() barrier is what is needed. 
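A standalone model of the load sequence being debated here, using std::atomic instead of HotSpot's OrderAccess (CPU_MULTI_COPY_ATOMIC stands in for the real platform define, and the globals merely mirror the taskqueue's _age/_bottom fields, not the actual code):

```cpp
#include <atomic>
#include <cstdint>
#include <utility>

std::atomic<uint32_t> g_age{0};
std::atomic<uint32_t> g_bottom{0};

// _age must not be read later than _bottom; the open question in the
// thread is which barrier enforces that on each class of machine.
std::pair<uint32_t, uint32_t> read_age_then_bottom() {
  uint32_t old_age = g_age.load(std::memory_order_relaxed);
#ifdef CPU_MULTI_COPY_ATOMIC
  // Multi-copy-atomic but weakly ordered (e.g. aarch64): an acquiring
  // fence is enough to order the two loads.
  std::atomic_thread_fence(std::memory_order_acquire);
#else
  // Non-multi-copy-atomic (e.g. PowerPC): a full fence is required.
  std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
  uint32_t bottom = g_bottom.load(std::memory_order_acquire);
  return {old_age, bottom};
}
```

This is only a sketch of the semantics under discussion, not the taskqueue.hpp code; on TSO machines both branches reduce to a compiler barrier plus an ordinary load, which is why the `#ifdef` form was proposed.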
David ----- From zgu at redhat.com Wed Jan 15 03:11:56 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 14 Jan 2020 22:11:56 -0500 Subject: 8237182(T): Update copyright header for shenandoah and epsilon files In-Reply-To: References: Message-ID: <3925a75d-f7f0-cd69-670f-f35422a17250@redhat.com> Looks good to me. Thanks for fixing it. -Zhengyu On 1/14/20 7:18 PM, Mikael Vidstedt wrote: > > Please review this small change which adjusts the copyright headers for shenandoah and epsilon related files. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8237182 > webrev: http://cr.openjdk.java.net/~mikael/webrevs/8237182/webrev.00/open/webrev/ > > Description: > > Many/most of the shenandoah and epsilon related files are missing the following line in the copyright header: > > "DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER." > > This change simply adds that line to the relevant files. In src/hotspot/share/gc/shenandoah/shenandoahNormalMode.cpp there was also a missing empty line before one of the paragraphs in the header. > > Cheers, > Mikael > From mikael.vidstedt at oracle.com Wed Jan 15 03:37:01 2020 From: mikael.vidstedt at oracle.com (Mikael Vidstedt) Date: Tue, 14 Jan 2020 19:37:01 -0800 Subject: 8237182(T): Update copyright header for shenandoah and epsilon files In-Reply-To: <3925a75d-f7f0-cd69-670f-f35422a17250@redhat.com> References: <3925a75d-f7f0-cd69-670f-f35422a17250@redhat.com> Message-ID: <7D1F566C-838D-4DAA-B525-F1012AD78DFC@oracle.com> Igor & Zhengyu, Thanks for the reviews! Cheers, Mikael > On Jan 14, 2020, at 7:11 PM, Zhengyu Gu wrote: > > Looks good to me. > > Thanks for fixing it. > > -Zhengyu > > On 1/14/20 7:18 PM, Mikael Vidstedt wrote: >> Please review this small change which adjusts the copyright headers for shenandoah and epsilon related files.
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8237182 >> webrev: http://cr.openjdk.java.net/~mikael/webrevs/8237182/webrev.00/open/webrev/ >> Description: >> Many/most of the shenandoah and epsilon related files are missing the following line in the copyright header: >> "DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER." >> This change simply adds that line to the relevant files. In src/hotspot/share/gc/shenandoah/shenandoahNormalMode.cpp there was also a missing empty line before one of the paragraphs in the header. >> Cheers, >> Mikael > From maoliang.ml at alibaba-inc.com Wed Jan 15 03:52:02 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Wed, 15 Jan 2020 11:52:02 +0800 Subject: Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: <36b31e9a-ee86-50d9-8042-bc79e6756777@oracle.com> References: <5b24e235-5466-15a1-78a6-6f63bfa1878e@oracle.com> <43090624-d8be-8600-a55e-1e10b1920135@oracle.com> <359fbef8-6735-4958-b76f-56430f1a4108.maoliang.ml@alibaba-inc.com>, <36b31e9a-ee86-50d9-8042-bc79e6756777@oracle.com> Message-ID: Hi Thomas, I summarize the issues as follows: 1. Criterion of SoftMaxHeapSize I agree to keep the policy of SoftMaxHeapSize similar to ZGC to make it unified. So "expand_heap_after_young_collection" is used for meeting the basic GCTimeRatio and expands the heap immediately, which cannot be blocked for any reason. "adjust_heap_after_young_collection" cannot change that logic, and I will take both expansion and shrinking into consideration. Is my understanding correct here? 2. Full GC with SoftMaxHeapSize In my thought a non-explicit Full GC probably means insufficient heap capacity, so we may not keep shrinking within SoftMaxHeapSize, but an explicit FGC doesn't have that issue. That's the only reason why I checked if it is explicit.
But we will have the same logic to determine whether the heap can be shrunk, so the "explicit" check could be meaningless and I will remove it. 3. SoftMaxHeapSizeConstraintFunc doesn't check Xms The constraint function didn't make sure that SoftMaxHeapSize should be less than Xms. Do we need to add the check? It will not only affect G1... 4. commit/uncommit parallelism The concurrent uncommit will work while the VMThread is doing GC, and GC may request to expand the heap if there are not enough empty regions. So the parallelism is possible and immediate uncommit is a solution. 5. More heap expansion/shrink heuristics further We have some data and experience in dynamic heap adjustment in our workloads. The default GCTimeRatio 12 is a really well-tuned number: we found applications will have obvious timeout errors if it is less than ~12. So it is kind of a *hard* limit and we need to expand immediately if GCTimeRatio drops below 12. The difference in our workloads is that we will keep a GCTimeRatio near the original value 99 to keep GC in a healthy state, because allocation rate and outside input can vary so violently that we don't want frequent adjustment. You know that in our 8u implementation we just keep a conservative GC interval to achieve that. Compared to the current code in JDK15, keeping GCTimeRatio at 99 is a different behavior which might have more memory footprint. I propose that we can still use the original option "-XX:+G1ElasticHeap" to keep the GCTimeRatio around 99 or a specified number. The default flow will make sure the GCTimeRatio is above the threshold 12, and concurrent commit/uncommit will adjust the heap to keep GCTimeRatio at a proper number so that the adjustment is not urgent. Thanks, Liang ------------------------------------------------------------------ From:Thomas Schatzl Send Time:2020 Jan. 14 (Tue.)
19:36 To:"MAO, Liang" ; hotspot-gc-dev Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Hi, On 14.01.20 10:07, Liang Mao wrote: > Hi Thomas, > > Thank you for the detailed comments! > Most of suggestions I will follow to do the modification. > And I still have some questions: > >>> 1. Does the SoftMaxHeapSize limitation need to consider the GC time >>> ratio as in >>> expand_heap_after_young_collection? Now we haven't put the logic in yet. > >> I am not completely clear what you are asking about, but the gc time >> ratio only affects the current "optimal" heap size which is bounded by >> SoftMaxHeapsize/max_capacity. > > The decision to shrink to SoftMaxHeapSize in this patch is based > on the method "G1HeapSizingPolicy::can_shrink_heap_size_to" which > counts "used" + "reserve" + "young". We will change it to > _heap_sizing_policy->shrink_amount(); as you commented. > I'm not considering the GC time ratio as a factor to determine > whether the heap can be shrinked to SoftMaxHeapSize. Going back to the "spec": "When -XX:SoftMaxHeapSize is set, the GC should strive to not grow heap size beyond the specified size, unless the GC decides it's necessary to do so. The soft max heap size should not be allowed to be set to a value smaller than min heap size (-Xms) or greater than max heap size (-Xmx). When not set on the command-line, this flag should default to the max heap size." (https://bugs.openjdk.java.net/browse/JDK-8222181) This is a very loose definition, and "unless the GC decides it's necessary" may mean anything. Looking at ZGC code, it mostly uses it to drive the GCs (and determine expansion amount), and let regular expansion/shrinking do the work without new rules. 
I would at first tend to do the same: if existing heap policy indicates that an expansion at young gc (which takes GCTimeRatio into account) is needed for whatever reason, I would actually let G1 keep doing it; conversely I would also take GCTimeRatio into account when trying to shrink to keep the policy symmetric. The current implementation probably preferentially shrinks towards SoftMaxHeapSize, correct? (this shrinking also seems to be limited to exactly SoftMaxHeapSize - why not below that if the collector thinks it would be okay?) Do you have measurements with/without GCTimeRatio in the shrinking? Can you describe, with over-time heap occupancy graphs that this does not work at all in your workloads? Measurements of committed heap over time of the current solution would be appreciated too. (I haven't had the time yet to set up some useful testing application that can be used to simulate phases of such a workload to show heap shrinking but I assume you have some data.) >> - changes in G1CollectedHeap::resize_heap_if_necessary: please make the >> method to always use the soft max heap size as I do not understand why >> you would not do that. > > Do you think we need to apply the logic "can_shrink_heap_size_to" > inside resize_heap_if_necessary to determine whether to make soft > max size as limit? Resize_heap_if_necessary() is the equivalent of adjust_heap_after_young_collection() for after full gc (the naming could be unified at some point). What the change implements right now is to respect SoftMaxHeapSize only on an explicit gc in resize_heap_if_necessary(), while always respecting it during young gcs. Do you have a reason for this decision? This seems like an inconsistency I can not find a reason for. As mentioned above, I would try to keep SoftMaxHeapSize only a goal for starting (concurrent) garbage collections, with "natural" sizing trying to keep the SoftMaxHeapSize goal. 
Particularly, if a (compacting) full gc won't meet the SoftMaxHeapSize, what else will? It is indeed unfortunate that you might need to tweak Min/MaxFreeRatio to achieve higher uncommit ratio at full gc... This change (system.gc specifically trying to meet SoftMaxHeapSize) also seems to be an artifact of your usage of this feature - maybe you happen to always issue a system.gc() after you changed SoftMaxHeapSize? It may probably be better if a concurrent cycle were triggered automatically similar to periodic gcs. >> - there is a bug in the synchronization of the concurrent uncommit >> thread: it seems possible that the uncommit thread is still working >> (iterating over the list of regions to uncommit) while a completed >> (young) GC may add new regions to that list as young gcs do not wait for >> completion of the uncommit thread. > > Uncommit thread could be working in parallel with VMThread but > VMThread will not add regions to the concurrent_resizing_list > if concurrent resizing thread is in "working" state. Okay, I now see the check in G1CollectedHeap::adjust_heap_after_young_collection(), but that also prohibits expansion during young GC which seems relatively disruptive. I think I mentioned earlier, that in this case (if we want to expand) it might be better to wait on completion of the parallel uncommit _if_ the other remaining regions are not enough. (Alternatively one could add another uncommit/commit request to a hypothetical uncommit task queue for that resize thread). My thinking is that at worst, this would result in the same behavior as before (i.e. blocking because of commit/uncommit in progress) instead of changing the overall behavior by denying expansion requests (silently, which is pretty bad). This could probably result in the heap sizing policy getting a few samples with high gc time ratio, expanding even more when G1 is finally allowed to expand.
(Also, fortunately UseGCOverheadLimit is not implemented in G1, as that might kill the VM spuriously because of that...) Or do you have some reason for this particular implementation? (The reason I am asking so much about details is to get a common understanding on the solution, i.e. what should happen when, and hopefully including why for the next guy ;)) Maybe it would be good to refactor the code in G1CollectedHeap::adjust_heap_after_young_collection() (similar to or expand in G1CollectedHeap::adjust_heap_after_young_collection()), e.g. calculate some heap_size_change() value from the heap sizing policy, and then use that value depending on whether it is positive or negative to expand or shrink. > >> Related to that is the use of par_set/clear_bit in e.g. the available >> bitmap: since all par/clear actions are asserted to be in the vm thread >> at a safepoint, there does not seem to be a need for using the parallel >> variants of set/clear bit (if keeping the current mechanism). > > For the above reason that concurrent uncommit can run in parallel with > VMThread, the bit set/clear in the VM thread at a safepoint have > to be parallel. If concurrent uncommit is working, both heap shrinking and expansion in the safepoint are disabled as far as I can tell. I.e. 2996 void G1CollectedHeap::adjust_heap_after_young_collection() { 2997 if (concurrent_heap_resize()->resize_thread()->during_cycle()) { 2998 // Do not adjust if concurrent resizing is in progress 2999 return; 3000 } 3001 3002 double start_time_ms = os::elapsedTime(); 3003 shrink_heap_after_young_collection(); 3004 phase_times()->recor...[..] 3005 // Shrinking might have started resizing 3006 if (!concurrent_heap_resize()->resize_thread()->during_cycle()) { 3007 expand_heap_after_young_collection(); 3008 } 3009 } This method seems to be the only caller to expand/shrink_heap_after_young_collection().
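A minimal sketch of the suggested refactoring: compute one signed heap_size_change() value from the sizing policy and branch on its sign. All names and types here are hypothetical, not the actual G1 code:

```cpp
#include <cstdint>

// Hypothetical policy result: positive means expand by that many
// bytes, negative means shrink, zero means leave the heap alone.
int64_t heap_size_change(int64_t target_capacity, int64_t current_capacity) {
  return target_capacity - current_capacity;
}

enum class Resize { None, Expand, Shrink };

// Single decision point replacing separate shrink-then-maybe-expand
// calls; the caller would dispatch to the existing expand/shrink paths.
Resize adjust_heap(int64_t delta) {
  if (delta > 0) return Resize::Expand;   // expand_heap_after_young_collection()
  if (delta < 0) return Resize::Shrink;   // shrink_heap_after_young_collection()
  return Resize::None;
}
```

The benefit of this shape is that expansion and shrinking can never both fire in one safepoint, which removes the need for the second during_cycle() re-check shown above.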
Another question I had but forgot is that the thread and in other places "resize" is used in the method and class names instead of e.g. "uncommit" or "shrink". Do you plan to add concurrent commit too? >> - I have a feeling that if the concurrent uncommit thread worked on >> pages, not regions, the code would be easier to understand. It would >> also solve the issue you asked about with the >> G1RegionsSmallerThanCommitSizeMapper. You may still need to pass region >> numbers anyway for logging, but otoh the logging could be done >> asynchronously. > > I don't quite understand this part... For > the G1RegionsSmallerThanCommitSizeMapper, > a page can be simultaneously requested to commit in VMThread to expand > heap and uncommit in concurrent thread to shrink heap. Looks like lowering > uncommit work to page level couldn't help this... I see the problem, but as above I think expansion (commit) and shrinking (uncommit) can't happen at the same time at the moment, and it actually might be premature to allow expansion and shrinking to occur at the same time. Some more options are serializing these requests; one I described above (if a request is pending, wait for its completion) before starting a new one. This might be the default option for other reasons anyway. Another is certainly to simply not uncommit them concurrently then, maybe we can still uncommit them immediately? > For the features you listed below, > 1) moving the memory uncommit into the concurrent phase > 2) uncommit at the end of (almost) every GC > 3) SoftMaxHeapSize > > Since most of the code is for the concurrent framework, > do you think 2) and 3) can be done together and implemented first? > (The uncommit will happen immediately) I think that would be fine.
Thanks, Thomas From felix.yang at huawei.com Wed Jan 15 06:58:32 2020 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Wed, 15 Jan 2020 06:58:32 +0000 Subject: [RFC] ZGC proposal for aarch64 jdk11u Message-ID: Hi, Currently, we only have zgc for the jdk11 x86 platform. Zgc for aarch64 platform was later added in jdk13 by Stuart from Linaro. So it's an interesting question whether zgc for aarch64 platform should be backported to jdk11. Dozens of our arm-based cloud customers are switching to jdk11 and some of them have a strong demand for the zgc feature for their business. To satisfy that requirement, we took the action to backport this feature in our jdk11 release. But we think this work should not be kept private. Other jdk11 vendors may come to the same problem. It's appreciated if this work could be incorporated in the upstream jdk11 repo and further improved. We have backported the following zgc related patches to jdk11u: https://bugs.openjdk.java.net/browse/JDK-8217745 https://bugs.openjdk.java.net/browse/JDK-8224187 https://bugs.openjdk.java.net/browse/JDK-8214527 https://bugs.openjdk.java.net/browse/JDK-8224675 Basic test such as jtreg, jcstress looks good. Specjbb2015 test with zgc gives us anticipated max & critical score. I can provide more details and propose a webrev for the backport. But before that I would like to hear your comments & suggestions.
Thanks, Felix From thomas.schatzl at oracle.com Wed Jan 15 08:37:21 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 15 Jan 2020 09:37:21 +0100 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: References: <5b24e235-5466-15a1-78a6-6f63bfa1878e@oracle.com> <43090624-d8be-8600-a55e-1e10b1920135@oracle.com> <359fbef8-6735-4958-b76f-56430f1a4108.maoliang.ml@alibaba-inc.com> ,<36b31e9a-ee86-50d9-8042-bc79e6756777@oracle.com> Message-ID: <08e025f0a9520b12d06df8157d63d73b4e7e11a4.camel@oracle.com> Hi, On Wed, 2020-01-15 at 11:52 +0800, Liang Mao wrote: > Hi Thomas, > > I summarize the issues in as following: > > 1. Criterion of SoftMaxHeapSize > I agree to keep the policy of SoftMaxHeapSize similar with ZGC to > make it unified. So "expand_heap_after_young_collection" is used for > meeting the basic GCTimeRatio and expand heap immediately which > cannot be blocked by any > reasons. "adjust_heap_after_young_collection" cannot change the > logic > and I will take both expansion and shrink into consideration. Is my > understanding correct here? Yes, ideally we would be close to ZGC in behavior with SoftMaxHeapSize. If for some reason this does not work we may need to reconsider - but we need a reason if possible backed by numbers/graphs of actual behavior. > > 2. Full GC with SoftMaxHeapSize > In my thought non-explicit Full GC probably means the insufficiency > of heap capacity and we may not keep shrinking within SoftMaxHeapSize > but explicit FGC don't have that issue. That's the only reason why I People run explicit FGC for many reasons, and the one you describe is just one of them. E.g. explicit FGC can be converted to a concurrent cycle or disabled for other reasons, so having special behavior for this particular case may just not work as intended in many cases. Users may then need to decide whether they want this behavior, or the system.gc-starts-concurrent-cycle one they might also rely on.
The lone "System.gc()" call is insufficient to transport the actual intent of the user - but that is a different issue. > checked if it is explicit. But we will have the same determine logic > to check if the heap can be shrunk so "explicit" check could be > meaningless and I will remove that. Exactly. > > 3. SoftMaxHeapSizeConstraintFunc doesn't check Xms > The constraint function didn't make sure the SoftMaxHeapSize should > less than Xms. Do we need to add the checking? It will not only > affect G1... I will check again later, but from what I remember from yesterday it does check it at VM start (-Xms sets both minimum and initial heap size). The constraint func does not check when the user changes the value during runtime. So code using it must still maintain this invariant in behavior. > 4. commit/uncommit parallelism > The concurrent uncommit will work with VMThread doing GC and GC may > request to expand heap if not enough empty regions. So the > parallelism is possible and immediate uncommit is a solution. There may be others, but it actually seems easiest as blocking such a request seems actually harder to implement, at least it's less localized in the code. Completely *dropping* the request seems against the "SoftMaxHeapSize is a hint" guideline and may have other unforeseen consequences too. Like I said, since G1 does not expand then, there will be more GCs with the small heap, increasing the current GCTimeRatio more than it should. Which means that when the request ultimately comes through - and G1 will certainly try again - the increase may be huge. (The increase is proportional to the difference in actual and requested GCTimeRatio iirc). Again, if there are good reasons to do otherwise I am open to discussion, but it would be nice to have numbers to base decisions on. > 4. More heap expansion/shrink heuristics further > We have some data and experience in dynamic heap adjustment in our > workloads.
> The default GCTimeRatio 12 is a really well tuned number that we found > applications will have obvious timeout errors if it is less than ~12. It is actually *very* interesting to hear that the default G1 GCTimeRatio fits you well. Given over-time improvements in G1 gc performance, I was already privately asking myself whether to decrease the default percentage, increasing this value (I hope I got the directions right ;)) and similarly adjust the default MaxGCPauseMillis down to reflect that from time to time. > So it is kind of *hard* limit and we need to expand immediately if > GCTimeRatio drops below 12. The difference in our workloads is that > we will keep a GCTimeRatio nearly the original value 99 to make GC in I.e. you set it to 99 at startup? > a healthy state because allocation rate and outside input can vary > violently that we don't want frequent adjustment. You know that in > our 8u implementation we just keep a conservative GC interval to > achieve that. Comparing to the current code in JDK15, keeping > GCTimeRatio as 99 is a different behavior which might have more > memory footprint. As mentioned above, I think given that we are both thinking about this, we might actually evaluate changing the defaults. > I propose if we can still use the original option > "-XX:+G1ElasticHeap" to keep the GCTimeRatio around 99 or a specified > number. The default flow will make sure the GCTimeRatio is above the > threshold 12 and concurrent commit/uncommit will adjust the heap to > keep GCTimeRatio in a proper number that the adjustment is not > urgent. I am not completely sure what you want to achieve here or what the problem is. I probably need to understand more about the problem and potentially other solutions can be found. As for a new -XX:+G1ElasticHeap option, it does not seem to make a difference to set this or -XX:GCTimeRatio in this case (both are single options). But I do not completely know the details here.
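For reference, GCTimeRatio=N means the collector aims to spend at most 1/(1+N) of total time in GC, so the default of 12 allows roughly 7.7% GC time while 99 allows 1%. A sketch of the resulting expand check (a simplified formulation for illustration, not the actual G1 heuristic):

```cpp
// Returns true when the observed GC overhead exceeds what the
// configured GCTimeRatio permits, i.e. the heap should grow.
bool gc_overhead_exceeded(double gc_seconds, double total_seconds,
                          unsigned gc_time_ratio) {
  // GCTimeRatio 12 -> allowed fraction ~0.077; 99 -> 0.01.
  double allowed_fraction = 1.0 / (1.0 + gc_time_ratio);
  return (gc_seconds / total_seconds) > allowed_fraction;
}
```

This makes the trade-off in the thread visible: with a ratio of 99 the collector expands as soon as GC exceeds 1% of elapsed time, trading footprint for headroom, while the default 12 tolerates far more GC work before growing the heap.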
Thanks, Thomas From maoliang.ml at alibaba-inc.com Wed Jan 15 10:58:40 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Wed, 15 Jan 2020 18:58:40 +0800 Subject: Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: <08e025f0a9520b12d06df8157d63d73b4e7e11a4.camel@oracle.com> References: <5b24e235-5466-15a1-78a6-6f63bfa1878e@oracle.com> <43090624-d8be-8600-a55e-1e10b1920135@oracle.com> <359fbef8-6735-4958-b76f-56430f1a4108.maoliang.ml@alibaba-inc.com> , <36b31e9a-ee86-50d9-8042-bc79e6756777@oracle.com> , <08e025f0a9520b12d06df8157d63d73b4e7e11a4.camel@oracle.com> Message-ID: <4fe7f936-95bb-4a4e-85ed-e5c4423f9d06.maoliang.ml@alibaba-inc.com> Hi Thomas, >> 3. SoftMaxHeapSizeConstraintFunc doesn't check Xms >> The constraint function didn't make sure the SoftMaxHeapSize should >> less than Xms. Do we need to add the checking? It will not only >> affect G1... > I will check again later, but from what I remember from yesterday it > does check it at VM start (-Xms sets both minimum and initial heap > size). The constraint func does not check when the user changes the > value during runtime. So code using it must still maintain this > invariant in behavior. The default constraint function will be both checked in VM startup and during runtime via jinfo. By looking into the code, ZGC seems to allow SoftMaxHeapSize less than Xms. So do we need to create another mail thread to discuss it? >> 4. commit/uncommit parallelism >> The concurrent uncommit will work with VMThread doing GC and GC may >> request to expand heap if not enough empty regions. So the >> parallelism is possible and immediate uncommit is a solution. > There may be others, but it actually seems easiest as blocking such a > request seems actually harder to implement, at least it's less > localized in the code.
Completely *dropping* the request seems against > the rule that "SoftMaxHeapSize is a hint" guideline and may have other > unforeseen consequences too. Like I said, since G1 does not expand > then, there will be more GCs with the small heap, increasing the > current GCTimeRatio more than it should. Which means when ultimately > the request comes through as G1 will certainly try again, the increase > may be huge. (The increase is proportional to the difference in actual > and requested GCTimeRatio iirc). > Again, if there are good reasons to do otherwise I am open to > discussion, but it would be nice to have numbers to base decisions on. I'm not on the side of blocking the expand request:) G1RegionsLargerThanCommitSizeMapper can do uncommit/commit in parallel and G1RegionsSmallerThanCommitSizeMapper can do uncommit/commit immediately. So I think we don't have issues so far? >> So it is kind of *hard* limit and we need to expand immediately if >> GCTimeRatio drops below 12. The difference in our workloads is that >> we will keep a GCTimeRatio nearly the original value 99 to make GC in >I.e. you set it to 99 at startup? In fact we are not controlling GCTimeRatio. In a lot of applications running in exclusive containers we set Xms the same as Xmx to avoid any heap expansion during runtime which might cause allocation stalls and timeout. >> I propose if we can still use the original option >> "-XX:+G1ElasticHeap" to keep the GCTimeRatio around 99 or a specified >> number. The default flow will make sure the GCTimeRatio is above the >> threshold 12 and concurrent commit/uncommit will adjust the heap to >> keep GCTimeRatio in a proper number that the adjustment is not >> urgent. > I am not completely sure what you want to achieve here or what the > problem is. I probably need to understand more about the problem and > potentially other solutions can be found.
> As for a new -XX:+G1ElasticHeap option, it does not seem to make a > difference to set this or -XX:GCTimeRatio in this case (both are single > options). But I do not completely know the details here. Theoretically the Java heap will not return memory by default, and ZGC/Shenandoah have the options "ZUncommit" and "ShenandoahUncommit" to inform users that memory can be uncommitted... So I think G1 needs the same thing as well. In my opinion, there are two aspects here. The default value of GCTimeRatio is the baseline, so we might need to expand immediately to avoid frequent GCs if using the concurrent flow. But the G1ElasticHeap is an optimization to keep the balance of GC health and memory utilization, so the policy should be more conservative and we also need to do it concurrently so as not to bring any obvious pause overhead. Thanks, Liang ------------------------------------------------------------------ From:Thomas Schatzl Send Time:2020 Jan. 15 (Wed.) 16:37 To:"MAO, Liang" ; hotspot-gc-dev Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Hi, On Wed, 2020-01-15 at 11:52 +0800, Liang Mao wrote: > Hi Thomas, > > I summarize the issues in as following: > > 1. Criterion of SoftMaxHeapSize > I agree to keep the policy of SoftMaxHeapSize similar with ZGC to > make it unified. So "expand_heap_after_young_collection" is used for > meeting the basic GCTimeRatio and expand heap immediately which > cannot be blocked by any > reasons. "adjust_heap_after_young_collection" cannot change the > logic > and I will take both expansion and shrink into consideration. Is my > understanding correct here? Yes, ideally we would be close to ZGC in behavior with SoftMaxHeapSize. If for some reason this does not work we may need to reconsider - but we need a reason if possible backed by numbers/graphs of actual behavior. > > 2.
Full GC with SoftMaxHeapSize > In my thought non-explicit Full GC probably means the insufficiency > of heap capacity and we may not keep shrinking within SoftMaxHeapSize > but explicit FGC don't have that issue. That's the only reason why I People run explicit FGC for many reasons, and the one you describe is just one of them. E.g. explicit FGC can be converted to a concurrent cycle or disabled for other reasons, so having special behavior for this particular case may just not work as intended in many cases. Users may then need to decide whether they want this behavior, or the system.gc-starts-concurrent-cycle one they might also rely on. The lone "System.gc()" call is insufficient to transport the actual intent of the user - but that is a different issue. > checked if it is explicit. But we will have the same determine logic > to check if the heap can be shrunk so "explicit" check could be > meaningless and I will remove that. Exactly. > > 3. SoftMaxHeapSizeConstraintFunc doesn't check Xms > The constraint function didn't make sure the SoftMaxHeapSize should > less than Xms. Do we need to add the checking? It will not only > affect G1... I will check again later, but from what I remember from yesterday it does check it at VM start (-Xms sets both minimum and initial heap size). The constraint func does not check when the user changes the value during runtime. So code using it must still maintain this invariant in behavior. > 4. commit/uncommit parallelism > The concurrent uncommit will work with VMThread doing GC and GC may > request to expand heap if not enough empty regions. So the > parallelism is possible and immediate uncommit is a solution. There may be others, but it actually seems easiest as blocking such a request seems actually harder to implement, at least it's less localized in the code. Completely *dropping* the request seems against the "SoftMaxHeapSize is a hint" guideline and may have other unforeseen consequences too.
Like I said, since G1 does not expand then, there will be more GCs with the small heap, increasing the current GCTimeRatio more than it should. Which means when ultimately the request comes through, as G1 will certainly try again, the increase may be huge. (The increase is proportional to the difference in actual and requested GCTimeRatio iirc). Again, if there are good reasons to do otherwise I am open to discussion, but it would be nice to have numbers to base decisions on. > 4. More heap expansion/shrink heuristics further > We have some data and experience in dynamic heap adjustment in our > workloads. > The default GCTimeRatio 12 is a really well-tuned number: we found > applications will have obvious timeout errors if it is less than ~12. It is actually *very* interesting to hear that the default G1 GCTimeRatio fits you well. Given over-time improvements in G1 gc performance, I was already privately asking myself whether to decrease the default percentage, increasing this value (I hope I got the directions right ;)) and similarly adjust the default MaxGCPauseMillis down to reflect that from time to time. > So it is kind of a *hard* limit and we need to expand immediately if > GCTimeRatio drops below 12. The difference in our workloads is that > we will keep a GCTimeRatio nearly the original value 99 to make GC in I.e. you set it to 99 at startup? > a healthy state, because the allocation rate and outside input can vary > so violently that we don't want frequent adjustment. You know that in > our 8u implementation we just keep a conservative GC interval to > achieve that. Compared to the current code in JDK15, keeping > GCTimeRatio at 99 is a different behavior which might have a larger > memory footprint. As mentioned above, I think given that we are both thinking hard about this, we might actually evaluate changing the defaults. > I propose if we can still use the original option > "-XX:+G1ElasticHeap" to keep the GCTimeRatio around 99 or a specified > number.
The default flow will make sure the > GCTimeRatio is above the > threshold 12 and concurrent commit/uncommit will adjust the heap to > keep GCTimeRatio in a proper number that the adjustment is not > urgent. I am not completely sure what you want to achieve here or what the problem is. I probably need to understand more about the problem and potentially other solutions can be found. As for a new -XX:+G1ElasticHeap option, it does not seem to make a difference to set this or -XX:GCTimeRatio in this case (both are single options). But I do not completely know the details here. Thanks, Thomas From thomas.schatzl at oracle.com Wed Jan 15 11:44:18 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 15 Jan 2020 12:44:18 +0100 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: <4fe7f936-95bb-4a4e-85ed-e5c4423f9d06.maoliang.ml@alibaba-inc.com> References: <5b24e235-5466-15a1-78a6-6f63bfa1878e@oracle.com> <43090624-d8be-8600-a55e-1e10b1920135@oracle.com> <359fbef8-6735-4958-b76f-56430f1a4108.maoliang.ml@alibaba-inc.com> <36b31e9a-ee86-50d9-8042-bc79e6756777@oracle.com> <08e025f0a9520b12d06df8157d63d73b4e7e11a4.camel@oracle.com> <4fe7f936-95bb-4a4e-85ed-e5c4423f9d06.maoliang.ml@alibaba-inc.com> Message-ID: <9d21b384-790b-c2e2-d801-0025c9257656@oracle.com> Hi, On 15.01.20 11:58, Liang Mao wrote: > Hi Thomas, > >>> 3. SoftMaxHeapSizeConstraintFunc doesn't check Xms >>> The constraint function didn't make sure the SoftMaxHeapSize should >>> less than Xms. Do we need to add the checking? It will not only >>> affect G1... > >> I will check again later, but from what I remember from yesterday it >> does check it at VM start (-Xms sets both minimum and initial heap >> size). The constraint func does not check when the user changes the >> value during runtime. So code using it must still maintain this >> invariant in behavior. > > The default constraint function will be both checked in VM startup > and during runtime via jinfo.
By looking into the code, ZGC seems > to allow SoftMaxHeapSize less than Xms. So do we need to create > another mail thread to discuss it? Colleagues mentioned that ZGC allows setting SoftMaxHeapSize below MinHeapSize, but does not uncommit memory below it. I do not see a problem with allowing the user to set SoftMaxHeapSize below MinHeapSize, though it may have limited use. If jinfo prevents this too, then it seems that the code can assume that SoftMaxHeapSize is within Min/MaxHeapSize. > >>> 4. commit/uncommit parallelism >>> The concurrent uncommit will work with the VMThread doing GC and GC may >>> request to expand the heap if there are not enough empty regions. So the >>> parallelism is possible and immediate uncommit is a solution. > >> There may be others, but it actually seems easiest as blocking such a >> request seems actually harder to implement, at least it's less >> localized in the code. Completely *dropping* the request seems against >> the "SoftMaxHeapSize is a hint" guideline and may have other >> unforeseen consequences too. Like I said, since G1 does not expand >> then, there will be more GCs with the small heap, increasing the >> current GCTimeRatio more than it should. Which means when ultimately >> the request comes through as G1 will certainly try again, the increase >> may be huge. (The increase is proportional to the difference in actual >> and requested GCTimeRatio iirc). > >> Again, if there are good reasons to do otherwise I am open to >> discussion, but it would be nice to have numbers to base decisions on. > > I'm not on the side of blocking the expand request:) > G1RegionsLargerThanCommitSizeMapper can do uncommit/commit > in parallel and G1RegionsSmallerThanCommitSizeMapper > can do uncommit/commit immediately. So I think we don't have issues > so far?
:) > >>> So it is kind of a *hard* limit and we need to expand immediately if >>> GCTimeRatio drops below 12. The difference in our workloads is that >>> we will keep a GCTimeRatio nearly the original value 99 to make GC in > >> I.e. you set it to 99 at startup? > > In fact we are not controlling GCTimeRatio. In a lot of applications > running in exclusive containers we set Xms same to Xmx to avoid > any heap expansion during runtime which might cause allocation > stalls and timeout. Okay. > >>> I propose if we can still use the original option >>> "-XX:+G1ElasticHeap" to keep the GCTimeRatio around 99 or a specified >>> number. The default flow will make sure the GCTimeRatio is above the >>> threshold 12 and concurrent commit/uncommit will adjust the heap to >>> keep GCTimeRatio in a proper number that the adjustment is not >>> urgent. > >> I am not completely sure what you want to achieve here or what the >> problem is. I probably need to understand more about the problem and >> potentially other solutions can be found. > >> As for a new -XX:+G1ElasticHeap option, it does not seem to make a >> difference to set this or -XX:GCTimeRatio in this case (both are single >> options). But I do not completely know the details here. > > Theoretically the Java heap will not return memory by default and > ZGC/Shenandoah have the "ZUncommit" and > "ShenandoahUncommit" options > to let the user control whether memory can be uncommitted... So I think G1 needs > the same thing as well. In my opinion, there are two aspects. G1 has uncommitted unused memory by default for a long time now. There is no flag to disable this behavior except setting -Xms == -Xmx. The policies for when to do so are also different (using Min/MaxHeapFreeRatio) compared to other collectors. However, only lately (JDK12 or 13) does it do so at the end of the Remark pause - earlier it only did so after full gc. The changes provided also enable shrinking of the heap during most young GCs.
It may be a problem that full gcs (including "concurrent full gc") and young gcs use a different policy btw, as it occurred to me yesterday after sending the email. That's something to explore. > default value of GCTimeRatio is the basic line so we might > need to expand immediately to avoid frequent GCs if using > concurrent flow. But the G1ElasticHeap is an optimization > to keep the balance of GC health and memory utility so the > policy should be more conservative and we also need to do it > concurrently by not bringing any obvious pause overhead. > Changing GCTimeRatio to a higher value should improve the response time on memory needs. The changes provided by you are also going to fix the concurrent (un-)commit. Thanks, Thomas From per.liden at oracle.com Wed Jan 15 12:07:42 2020 From: per.liden at oracle.com (Per Liden) Date: Wed, 15 Jan 2020 13:07:42 +0100 Subject: [RFC] ZGC proposal for aarch64 jdk11u In-Reply-To: References: Message-ID: <38a15dc5-9cee-0f44-13ee-98f185ee72ae@oracle.com> Hi, Please note that backporting JDK-8224675 "Late GC barrier insertion for ZGC" is not a great idea, since that patch introduced stability issues and the whole approach was later superseded by JDK-8230565 "ZGC: Redesign C2 load barrier to expand on the MachNode level". If you want to go down this path, I'd suggest that you either don't backport JDK-8224675 at all, or backport everything up to JDK-8224675 + JDK-8230565. Also note that if you include JDK-8230565 you want to be careful to also include any followup bug fixes, like JDK-8233506. In general, a lot of stability and performance improvements have gone into ZGC since JDK 11. If at all possible, I would strongly recommend using JDK 14 instead, where you already have aarch64 support and all other goodies. cheers, Per On 1/15/20 7:58 AM, Yangfei (Felix) wrote: > Hi, > > Currently, we only have zgc for the jdk11 x86 platform. Zgc for aarch64 platform was later added in jdk13 by Stuart from Linaro.
> So it's an interesting question whether zgc for aarch64 platform should be backported to jdk11. > > Dozens of our arm-based cloud customers are switching to jdk11 and some of them have a strong demand for the zgc feature for their business. > To satisfy that requirement, we took the action to backport this feature in our jdk11 release. > But we think this work should not be kept private. Other jdk11 vendors may run into the same problem. > It's appreciated if this work could be incorporated in the upstream jdk11 repo and further improved. > > We have backported the following zgc related patches to jdk11u: > https://bugs.openjdk.java.net/browse/JDK-8217745 > https://bugs.openjdk.java.net/browse/JDK-8224187 > https://bugs.openjdk.java.net/browse/JDK-8214527 > https://bugs.openjdk.java.net/browse/JDK-8224675 > > Basic tests such as jtreg and jcstress look good. Specjbb2015 test with zgc gives us the anticipated max & critical scores. > I can provide more details and propose a webrev for the backport. But before that I would like to hear your comments & suggestions.
> > Thanks, > Felix > From maoliang.ml at alibaba-inc.com Wed Jan 15 12:53:20 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Wed, 15 Jan 2020 20:53:20 +0800 Subject: Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: <9d21b384-790b-c2e2-d801-0025c9257656@oracle.com> References: <5b24e235-5466-15a1-78a6-6f63bfa1878e@oracle.com> <43090624-d8be-8600-a55e-1e10b1920135@oracle.com> <359fbef8-6735-4958-b76f-56430f1a4108.maoliang.ml@alibaba-inc.com> <36b31e9a-ee86-50d9-8042-bc79e6756777@oracle.com> <08e025f0a9520b12d06df8157d63d73b4e7e11a4.camel@oracle.com> <4fe7f936-95bb-4a4e-85ed-e5c4423f9d06.maoliang.ml@alibaba-inc.com>, <9d21b384-790b-c2e2-d801-0025c9257656@oracle.com> Message-ID: <693b04b7-d13d-4ef5-b425-febc81984dbc.maoliang.ml@alibaba-inc.com> Hi Thomas, So G1 doesn't need to shrink below Xms if SoftMaxHeapSize is below Xms, does it? Another question: whether or not we have an additional option, we had better have two criteria. The first is for urgent expansion, when GCTimeRatio is quite low and concurrent expansion with frequent GCs is more harmful, so expansion should be done immediately. It's the current default flow, as we found that 12 is a good number below which applications can obviously incur timeout errors. The second is to keep the GCTimeRatio and memory footprint in a balanced state, so any adjustments are better done concurrently. The original number 99 fits well here. If we have only one option, "GCTimeRatio", we might not be able to achieve both. Maybe we can have a LowGCTimeRatio, below which things are supposed to be not acceptable, and a HighGCTimeRatio which is certainly healthy. Thanks, Liang ------------------------------------------------------------------ From:Thomas Schatzl Send Time:2020 Jan. 15 (Wed.) 19:44 To:"MAO, Liang" ; hotspot-gc-dev Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Hi, On 15.01.20 11:58, Liang Mao wrote: > Hi Thomas, > >>> 3.
SoftMaxHeapSizeConstraintFunc doesn't check Xms >>> The constraint function didn't make sure the SoftMaxHeapSize should >>> less than Xms. Do we need to add the checking? It will not only >>> affect G1... > >> I will check again later, but from what I remember from yesterday it >> does check it at VM start (-Xms sets both minimum and initial heap >> size). The constraint func does not check when the user changes the >> value during runtime. So code using it must still maintain this >> invariant in behavior. > > The default constraint function will be both checked in VM startup > and during runtime via jinfo. By looking into the code, ZGC seems > to allow SoftMaxHeapSize less than Xms. So do we need to create > another mail thread to discuss it? Colleagues mentioned that ZGC allows setting SoftMaxHeapSize below MinheapSize, but does not uncommit memory below it. I do not see a problem for allowing the user set SoftMaxHeapSize below MinHeapSize so it may have limited use. If jinfo prevents this too, then it seems that the code can assume that SoftMaxHeapSize is within Min/MaxHeapSize. > >>> 4. commit/uncommit parallelism >>> The concurrent uncommit will work with VMThread doing GC and GC may >>> request to expand heap if not enough empty regions. So the >>> parallelism is possible and immediate uncommit is a solution. > >> There may be others, but it actually seems easiest as blocking such a >> request seems actually harder to implement, at least it's less >> localized in the code. Completely *dropping* the request seems against >> the rule that "SoftMaxHeapSize is a hint" guideline and may have other >> unforeseen consequences too. Like I said, since G1 does not expand >> then, there will be more GCs with the small heap, increasing the >> current GCTimeRatio more than it should. Which means when ultimately >> the request comes through as G1 will certainly try again, the increase >> may be huge. 
(The increase is proportional to the difference in actual >> and requested GCTimeRatio iirc). > >> Again, if there are good reasons to do otherwise I am open to >> discussion, but it would be nice to have numbers to base decisions on. > > I'm not on the side of blocking the expand request:) > G1RegionsLargerThanCommitSizeMapper can do uncommit/commit > parallelly and G1RegionsSmallerThanCommitSizeMapper > can do uncommit/commit immediately. So I think we don't have issues > so far? :) > >>> So it is kind of *hard* limit and we need to expand immediately if >>> GCTimeRatio drops below 12. The difference in our workloads is that >>> we will keep a GCTimeRatio nearly the original value 99 to make GC in > >>I.e. you set it to 99 at startup? > > In fact we are not controlling GCTimeRatio. In a lot of applications > running in exclusive containers we set Xms same to Xmx to avoid > any heap expansion during runtime which might cause allocation > stalls and timeout. Okay. > >>> I propose if we can still use the original option >>> "-XX:+G1ElasticHeap" to keep the GCTimeRatio around 99 or a specified >>> number. The default flow will make sure the GCTimeRatio is above the >>> threshold 12 and concurrent commit/uncommit will adjust the heap to >>> keep GCTimeRatio in a proper number that the adjustment is not >>> urgent. > >> I am not completely sure what you want to achieve here or what the >> problem is. I probably need to understand more about the problem and >> potentially other solutions can be found. > >> As for a new -XX:+G1ElasticHeap option, it does not seem to make a >> difference to set this or -XX:GCTimeRatio in this case (both are single >> options). But I do not completely know the details here. > > Theoretically Java heap will not return memory in default and > ZGC/Shenandoah have options to control by "ZUncommit" and > "ShenandoahUncommit" > to info user that memory can be uncommit... So I think G1 needs > the same thing as well. 
In my opinion, here are 2 espects. The G1 uncommits unused memory by default since a long time ago. There is no flag to disable this behavior except setting -Xms == -Xmx. The policies when are also different (using Min/MaxHeapFreeRatio) compared to other collectors. However only lately (JDK12 or 13) it does so at the end of the Remark pause - earlier it only did so after full gc. The changes provided also enable shrinking of the heap during most young GCs. It may be a problem that full gcs (including "concurrent full gc") and young gcs use a different policy btw as occurred to me yesterday after sending the email. That's something to explore. > default value of GCTimeRatio is the basic line so we might > need to expand immediately to avoid frequent GCs if using > concurrent flow. But the G1ElasticHeap is an optimization > to keep the balance of GC health and memory utility so the > policy should be more conservative and we also need to do it > concurrently by not bringing any obvious pause overhead. > Changing GCTimeRatio to a higher value should improve the response time on memory needs. The changes provided by you are also going to fix the concurrent (un-)commit. Thanks, Thomas From per.liden at oracle.com Wed Jan 15 12:57:06 2020 From: per.liden at oracle.com (Per Liden) Date: Wed, 15 Jan 2020 13:57:06 +0100 Subject: RFR: 8237198+8237199+8237200: ZGC: Share heap multi-mapping code across platforms Message-ID: Hi, Please review this cleanup of the ZPhysicalMemory/ZBackingFile layer, which aims to de-duplicate some of the multi-mapping code. I've split the change into three separate patches, the main patch followed by two patches doing some renaming. 1) The ZBackingFile code was designed to allow platforms to decide if they want to use heap multi-mapping or some other (possibly HW supported) scheme. As of today, all our supported platforms do heap multi-mapping, so there's some degree of code duplication in ZBackingFile for each platform. 
This patch moves common multi-mapping code into ZPhysicalMemoryManager. If we in the future find that we want to support a platform that doesn't do multi-mapping, then we can introduce an abstraction for this again. Bug: https://bugs.openjdk.java.net/browse/JDK-8237198 Webrev: http://cr.openjdk.java.net/~pliden/8237198/webrev.0 2) Rename ZBackingFile to ZPhysicalMemoryBacking, since "File" is somewhat misleading on platforms other than Linux. Bug: https://bugs.openjdk.java.net/browse/JDK-8237199 Webrev: http://cr.openjdk.java.net/~pliden/8237199/webrev.0 3) Rename ZBackingPath to ZMountPoint, as it's a better name in light of JDK-8237199. Bug: https://bugs.openjdk.java.net/browse/JDK-8237200 Webrev: http://cr.openjdk.java.net/~pliden/8237200/webrev.0 cheers, Per From per.liden at oracle.com Wed Jan 15 13:03:39 2020 From: per.liden at oracle.com (Per Liden) Date: Wed, 15 Jan 2020 14:03:39 +0100 Subject: RFR: 8237201: ZGC: Remove unused ZRelocationSetSelector::fragmentation() Message-ID: <35727b46-b4d8-8336-b484-1119bff15468@oracle.com> ZRelocationSetSelector::fragmentation() is not used and can be removed. Bug: https://bugs.openjdk.java.net/browse/JDK-8237201 Webrev: http://cr.openjdk.java.net/~pliden/8237201/webrev.0 /Per From stuart.monteith at linaro.org Wed Jan 15 13:10:57 2020 From: stuart.monteith at linaro.org (Stuart Monteith) Date: Wed, 15 Jan 2020 13:10:57 +0000 Subject: [aarch64-port-dev ] [RFC] ZGC proposal for aarch64 jdk11u In-Reply-To: References: Message-ID: Hello Felix, I'm pleased that there is interest in ZGC on aarch64, that the performance is at expected levels and is apparently trouble free. However, I'd like to understand why this backporting is being done. If it is for running in production, then I'd expect Per, etc, to not be upset or disagree when I say that ZGC on aarch64 in JDK 13 isn't production ready. I understand it would be easier to test with an existing software stack on top of JDK 11 rather than moving onto JDK 13, etc.
However, ZGC is an experimental VM feature, and the model OpenJDK has moved to is for 6 monthly releases. Per and his team have made lots of improvements, and fixes, in ZGC since 11, so I would expect people to run and test on current release to avoid hitting already known issues, whether that be on x86 or aarch64. There might be a possibility of backporting some features, if there is no effect when they are disabled. However, it is probably better to track ZGC development through the released versions as a single backport will not be enough, I wouldn't want to burden Per's team with maintaining something that is in as much development as ZGC at this point. BR, Stuart On Wed, 15 Jan 2020 at 06:59, Yangfei (Felix) wrote: > > Hi, > > Currently, we only have zgc for the jdk11 x86 platform. Zgc for aarch64 platform was later added in jdk13 by Stuart from Linaro. > So it?s an interesting question whether zgc for aarch64 platform should be backported to jdk11. > > Dozens of our arm-based cloud customers are switching to jdk11 and some of them have a strong demand for the zgc feature for their business. > To satisfy that requirement, we took the action to backport this feature in our jdk11 release. > But we think this work should not be kept private. Other jdk11 vendors may come to the same problem. > It?s appreciated if this work could be incorporated in the upstream jdk11 repo and further improved. > > We have backported the following zgc related patches to jdk11u: > https://bugs.openjdk.java.net/browse/JDK-8217745 > https://bugs.openjdk.java.net/browse/JDK-8224187 > https://bugs.openjdk.java.net/browse/JDK-8214527 > https://bugs.openjdk.java.net/browse/JDK-8224675 > > Basic test such as jtreg, jcstress looks good. Specjbb2015 test with zgc gives us anticipated max & critical score. > I can provide more details and propose a webrev for the backport. But before that I would like to hear your comments & suggestions. 
> > Thanks, > Felix From thomas.schatzl at oracle.com Wed Jan 15 15:05:15 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 15 Jan 2020 16:05:15 +0100 Subject: RFR: 8237201: ZGC: Remove unused ZRelocationSetSelector::fragmentation() In-Reply-To: <35727b46-b4d8-8336-b484-1119bff15468@oracle.com> References: <35727b46-b4d8-8336-b484-1119bff15468@oracle.com> Message-ID: <1a626930-3be4-4976-8738-9f3d716873ce@oracle.com> Hi, On 15.01.20 14:03, Per Liden wrote: > ZRelocationSetSelector::fragmentation() is not used and can be removed. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8237201 > Webrev: http://cr.openjdk.java.net/~pliden/8237201/webrev.0 > > /Per looks good. Thomas From thomas.schatzl at oracle.com Wed Jan 15 17:57:49 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 15 Jan 2020 18:57:49 +0100 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: <693b04b7-d13d-4ef5-b425-febc81984dbc.maoliang.ml@alibaba-inc.com> References: <5b24e235-5466-15a1-78a6-6f63bfa1878e@oracle.com> <43090624-d8be-8600-a55e-1e10b1920135@oracle.com> <359fbef8-6735-4958-b76f-56430f1a4108.maoliang.ml@alibaba-inc.com> <36b31e9a-ee86-50d9-8042-bc79e6756777@oracle.com> <08e025f0a9520b12d06df8157d63d73b4e7e11a4.camel@oracle.com> <4fe7f936-95bb-4a4e-85ed-e5c4423f9d06.maoliang.ml@alibaba-inc.com> ,<9d21b384-790b-c2e2-d801-0025c9257656@oracle.com> <693b04b7-d13d-4ef5-b425-febc81984dbc.maoliang.ml@alibaba-inc.com> Message-ID: Hi, On Wed, 2020-01-15 at 20:53 +0800, Liang Mao wrote: > Hi Thomas, > > So G1 doesn't need to shrink below Xms if SoftMaxHeapSize is > below Xms, does it? > No, never shrink below MinHeapSize. > Another question is that no matter we have an additional option we > had better have 2 criterions. The first is for urgent expansion that > GCTimeRatio is quite low and concurrent expansion with frequent GCs > is more harmful and expansion should be done immediately. 
It's the > current default flow as we found that 12 is a good number below which > applications can obviously incur timeout errors. The second is to > keep the GCTimeRatio and memory footprint in a balanced state so > any adjustments are better done concurrently. The original number 99 > fits well here. If we have only one option "GCTimeRatio", we might > not be able to achieve both. Maybe we can have a LowGCTimeRatio below > which is supposed to be not acceptable and a HighGCTimeRatio which is > certainly healthy. So far the change has been about shrinking the heap concurrently, and not expansion. Let's concentrate on the issue at hand, i.e. see how heap shrinking at more places turns out. I believe there will be lots of tweaking needed for this change to not show too many regressions in other applications. Remember that the defaults should work well for a large body of applications, not just a few. There may be knobs to tune it for others. Then look at concurrent expansion and at application phase changes: how to detect them, and how to react best. Just for reference, last time we changed the sizing algorithm it took a few months to get it "right", with mostly improvements all around. Thanks, Thomas From aph at redhat.com Wed Jan 15 18:03:54 2020 From: aph at redhat.com (Andrew Haley) Date: Wed, 15 Jan 2020 18:03:54 +0000 Subject: [aarch64-port-dev ] [RFC] ZGC proposal for aarch64 jdk11u In-Reply-To: References: Message-ID: <6b716737-ba03-71c2-5488-b5654093c447@redhat.com> On 1/15/20 1:10 PM, Stuart Monteith wrote: > I'm pleased that there is interest in ZGC on aarch64, that the > performance is at expected levels and is apparently trouble free. > However, I'd like to understand why this backporting is being done. If > it is for running in production, then I'd expect Per, etc, to not be > upset or disagree when I say that ZGC on aarch64 in JDK 13 isn't > production ready.
In particular, it's perhaps odd that something which is still an experimental feature in mainline is being considered for a backport. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From shade at redhat.com Wed Jan 15 18:48:28 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 15 Jan 2020 19:48:28 +0100 Subject: RFR (T) 8237217: Incorrect G1StringDedupEntry type used in StringDedupTable destructor Message-ID: <1a60a3d0-7329-b919-cd3d-3dcddfa50b8e@redhat.com> Bug: https://bugs.openjdk.java.net/browse/JDK-8237217 Spotted this when reading the strdedup code. This is a trivial leftover from JDK-8203641. The G1StringDedupEntry symbol does not even exist, and the whole thing works because FREE_C_HEAP_ARRAY ignores that parameter. But it should be consistent with the constructor anyway. I would not bother with jdk-submit testing, as it looks pretty trivial. Fix: diff -r f7edb9ca045c src/hotspot/share/gc/shared/stringdedup/stringDedupTable.cpp --- a/src/hotspot/share/gc/shared/stringdedup/stringDedupTable.cpp Fri Jan 10 15:38:25 2020 +0100 +++ b/src/hotspot/share/gc/shared/stringdedup/stringDedupTable.cpp Wed Jan 15 19:47:47 2020 +0100 @@ -234,11 +234,11 @@ _buckets = NEW_C_HEAP_ARRAY(StringDedupEntry*, _size, mtGC); memset(_buckets, 0, _size * sizeof(StringDedupEntry*)); } StringDedupTable::~StringDedupTable() { - FREE_C_HEAP_ARRAY(G1StringDedupEntry*, _buckets); + FREE_C_HEAP_ARRAY(StringDedupEntry*, _buckets); } Testing: x86_64 fastdebug build; hotspot_gc_shenandoah -- Thanks, -Aleksey From kim.barrett at oracle.com Wed Jan 15 18:51:03 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 15 Jan 2020 13:51:03 -0500 Subject: RFR (T) 8237217: Incorrect G1StringDedupEntry type used in StringDedupTable destructor In-Reply-To: <1a60a3d0-7329-b919-cd3d-3dcddfa50b8e@redhat.com> References: <1a60a3d0-7329-b919-cd3d-3dcddfa50b8e@redhat.com> Message-ID: > On Jan
15, 2020, at 1:48 PM, Aleksey Shipilev wrote: > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8237217 > > Spotted this when reading the strdedup code. This is a trivial leftover from JDK-8203641. > G1StringDedupEntry symbol does not even exist, and the whole thing works because FREE_C_HEAP_ARRAY > ignores that parameter. But it should be consistent anyway with constructor anyway. > > I would not bother with jdk-submit testing, as it looks pretty trivial. Looks good, and trivial. > > Fix: > > diff -r f7edb9ca045c src/hotspot/share/gc/shared/stringdedup/stringDedupTable.cpp > --- a/src/hotspot/share/gc/shared/stringdedup/stringDedupTable.cpp Fri Jan 10 15:38:25 2020 +0100 > +++ b/src/hotspot/share/gc/shared/stringdedup/stringDedupTable.cpp Wed Jan 15 19:47:47 2020 +0100 > @@ -234,11 +234,11 @@ > _buckets = NEW_C_HEAP_ARRAY(StringDedupEntry*, _size, mtGC); > memset(_buckets, 0, _size * sizeof(StringDedupEntry*)); > } > > StringDedupTable::~StringDedupTable() { > - FREE_C_HEAP_ARRAY(G1StringDedupEntry*, _buckets); > + FREE_C_HEAP_ARRAY(StringDedupEntry*, _buckets); > } > > Testing: x86_64 fastdebug build; hotspot_gc_shenandoah > > -- > Thanks, > -Aleksey From zgu at redhat.com Wed Jan 15 18:51:50 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 15 Jan 2020 13:51:50 -0500 Subject: RFR (T) 8237217: Incorrect G1StringDedupEntry type used in StringDedupTable destructor In-Reply-To: <1a60a3d0-7329-b919-cd3d-3dcddfa50b8e@redhat.com> References: <1a60a3d0-7329-b919-cd3d-3dcddfa50b8e@redhat.com> Message-ID: <161f8838-c3f7-e718-d72e-8990809bed26@redhat.com> Good and trivial. -Zhengyu On 1/15/20 1:48 PM, Aleksey Shipilev wrote: > Bug: > https://bugs.openjdk.java.net/browse/JDK-8237217 > > Spotted this when reading the strdedup code. This is a trivial leftover from JDK-8203641. > G1StringDedupEntry symbol does not even exist, and the whole thing works because FREE_C_HEAP_ARRAY > ignores that parameter. But it should be consistent anyway with constructor anyway. 
> > I would not bother with jdk-submit testing, as it looks pretty trivial. > > Fix: > > diff -r f7edb9ca045c src/hotspot/share/gc/shared/stringdedup/stringDedupTable.cpp > --- a/src/hotspot/share/gc/shared/stringdedup/stringDedupTable.cpp Fri Jan 10 15:38:25 2020 +0100 > +++ b/src/hotspot/share/gc/shared/stringdedup/stringDedupTable.cpp Wed Jan 15 19:47:47 2020 +0100 > @@ -234,11 +234,11 @@ > _buckets = NEW_C_HEAP_ARRAY(StringDedupEntry*, _size, mtGC); > memset(_buckets, 0, _size * sizeof(StringDedupEntry*)); > } > > StringDedupTable::~StringDedupTable() { > - FREE_C_HEAP_ARRAY(G1StringDedupEntry*, _buckets); > + FREE_C_HEAP_ARRAY(StringDedupEntry*, _buckets); > } > > Testing: x86_64 fastdebug build; hotspot_gc_shenandoah > From shade at redhat.com Wed Jan 15 19:05:11 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 15 Jan 2020 20:05:11 +0100 Subject: RFR (T) 8237217: Incorrect G1StringDedupEntry type used in StringDedupTable destructor In-Reply-To: References: <1a60a3d0-7329-b919-cd3d-3dcddfa50b8e@redhat.com> Message-ID: <506fa331-792c-8550-a546-2ed1a8d1f279@redhat.com> On 1/15/20 7:51 PM, Kim Barrett wrote: >> On Jan 15, 2020, at 1:48 PM, Aleksey Shipilev wrote: >> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8237217 >> >> Spotted this when reading the strdedup code. This is a trivial leftover from JDK-8203641. >> G1StringDedupEntry symbol does not even exist, and the whole thing works because FREE_C_HEAP_ARRAY >> ignores that parameter. But it should be consistent anyway with constructor anyway. >> >> I would not bother with jdk-submit testing, as it looks pretty trivial. > > Looks good, and trivial. Thanks, pushed. 
-- Thanks, -Aleksey From shade at redhat.com Wed Jan 15 19:17:24 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 15 Jan 2020 20:17:24 +0100 Subject: [15] RFR 8236878: Use atomic instruction to update StringDedupTable's entries and entries_removed counters In-Reply-To: References: <97d04872-7abb-396d-7552-f85b4cf1b97b@redhat.com> Message-ID: <6471ea70-e89f-17ef-9585-20f4c16a3e23@redhat.com> On 1/14/20 6:19 PM, Zhengyu Gu wrote: >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8236878 >>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8236878/webrev.00/index.html It is odd to mix the atomic update and locked update. We can lose locked updates, which do not expect anyone else to modify the field while the lock is held. It is probably fine for _entries_removed, as it is used for statistics.
-- Thanks, -Aleksey From shade at redhat.com Wed Jan 15 19:50:57 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 15 Jan 2020 20:50:57 +0100 Subject: RFR (XS) 8237223: Shenandoah: important flags should not be ergonomic for concurrent class unloading Message-ID: Bug: https://bugs.openjdk.java.net/browse/JDK-8237223 Fix: diff -r 53b6aad22933 src/hotspot/share/gc/shenandoah/shenandoahNormalMode.cpp --- a/src/hotspot/share/gc/shenandoah/shenandoahNormalMode.cpp Wed Jan 15 20:04:51 2020 +0100 +++ b/src/hotspot/share/gc/shenandoah/shenandoahNormalMode.cpp Wed Jan 15 20:49:30 2020 +0100 @@ -34,10 +34,11 @@ void ShenandoahNormalMode::initialize_flags() const { + if (ShenandoahConcurrentRoots::can_do_concurrent_class_unloading()) { + FLAG_SET_DEFAULT(ShenandoahSuspendibleWorkers, true); + FLAG_SET_DEFAULT(VerifyBeforeExit, false); + } + SHENANDOAH_ERGO_ENABLE_FLAG(ExplicitGCInvokesConcurrent); SHENANDOAH_ERGO_ENABLE_FLAG(ShenandoahImplicitGCInvokesConcurrent); - if (ShenandoahConcurrentRoots::can_do_concurrent_class_unloading()) { - SHENANDOAH_ERGO_ENABLE_FLAG(ShenandoahSuspendibleWorkers); - SHENANDOAH_ERGO_DISABLE_FLAG(VerifyBeforeExit); - } Testing: hotspot_gc_shenandoah -- Thanks, -Aleksey From zgu at redhat.com Wed Jan 15 20:03:44 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 15 Jan 2020 15:03:44 -0500 Subject: RFR (XS) 8237223: Shenandoah: important flags should not be ergonomic for concurrent class unloading In-Reply-To: References: Message-ID: Ah, okay. Looks good to me. 
Thanks, -Zhengyu On 1/15/20 2:50 PM, Aleksey Shipilev wrote: > Bug: > https://bugs.openjdk.java.net/browse/JDK-8237223 > > Fix: > > diff -r 53b6aad22933 src/hotspot/share/gc/shenandoah/shenandoahNormalMode.cpp > --- a/src/hotspot/share/gc/shenandoah/shenandoahNormalMode.cpp Wed Jan 15 20:04:51 2020 +0100 > +++ b/src/hotspot/share/gc/shenandoah/shenandoahNormalMode.cpp Wed Jan 15 20:49:30 2020 +0100 > @@ -34,10 +34,11 @@ > > void ShenandoahNormalMode::initialize_flags() const { > + if (ShenandoahConcurrentRoots::can_do_concurrent_class_unloading()) { > + FLAG_SET_DEFAULT(ShenandoahSuspendibleWorkers, true); > + FLAG_SET_DEFAULT(VerifyBeforeExit, false); > + } > + > SHENANDOAH_ERGO_ENABLE_FLAG(ExplicitGCInvokesConcurrent); > SHENANDOAH_ERGO_ENABLE_FLAG(ShenandoahImplicitGCInvokesConcurrent); > - if (ShenandoahConcurrentRoots::can_do_concurrent_class_unloading()) { > - SHENANDOAH_ERGO_ENABLE_FLAG(ShenandoahSuspendibleWorkers); > - SHENANDOAH_ERGO_DISABLE_FLAG(VerifyBeforeExit); > - } > > Testing: hotspot_gc_shenandoah > From zgu at redhat.com Wed Jan 15 21:29:34 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 15 Jan 2020 16:29:34 -0500 Subject: [15] RFR 8236878: Use atomic instruction to update StringDedupTable's entries and entries_removed counters In-Reply-To: <6471ea70-e89f-17ef-9585-20f4c16a3e23@redhat.com> References: <97d04872-7abb-396d-7552-f85b4cf1b97b@redhat.com> <6471ea70-e89f-17ef-9585-20f4c16a3e23@redhat.com> Message-ID: <8eeccdc6-960b-591a-d1b1-42bb50f868ad@redhat.com> On 1/15/20 2:17 PM, Aleksey Shipilev wrote: > On 1/14/20 6:19 PM, Zhengyu Gu wrote: >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8236878 >>>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8236878/webrev.00/index.html > > It is odd to mix the atomic update and locked update. We can lose locked updates that do not expect > anyone to modify the field when lock is held. It is probably fine for _entries_removed, as it is > used for statistics. 
It seems riskier to do for _table->_entries: are we sure nothing in the String > dedup table relies on that being very accurate? Atomic update and locked update do not overlap. The counter updates only happen at a safepoint (before this patch) or with StringDedupTable_lock held (with this patch). Added the following patch to make it more obvious. diff -r 7ce7d01e68ec src/hotspot/share/gc/shared/stringdedup/stringDedupTable.cpp --- a/src/hotspot/share/gc/shared/stringdedup/stringDedupTable.cpp Wed Jan 15 14:37:34 2020 -0500 +++ b/src/hotspot/share/gc/shared/stringdedup/stringDedupTable.cpp Wed Jan 15 14:48:21 2020 -0500 @@ -479,6 +479,7 @@ // Delayed update to avoid contention on the table lock if (removed > 0) { + assert_locked_or_safepoint_weak(StringDedupTable_lock); Atomic::sub(&_table->_entries, removed); Atomic::add(&_entries_removed, removed); } > > Can you explain a little bit why we cannot block on StringDedupTable_lock here? Is this a reentrancy > issue? Table rehashing is part of cleanup, and it utilizes workers to perform parallel rehashing; therefore, it needs to use a lock or atomic operations to update entry counters from each worker. The concurrent string dedup cleaning task needs to take StringDedupTable_lock to prevent mutators from modifying the table, so the workers cannot acquire the lock as well; otherwise, we would deadlock. As far as I know, _table->_entries is only used to make the rehashing decision, so there is no blocking requirement. Updated webrev: http://cr.openjdk.java.net/~zgu/JDK-8236878/webrev.01/index.html Test: Reran hotspot_gc test. Thanks, -Zhengyu > From manc at google.com Thu Jan 16 00:08:46 2020 From: manc at google.com (Man Cao) Date: Wed, 15 Jan 2020 16:08:46 -0800 Subject: Work-in-progress: 8236485: Epoch synchronization protocol for G1 concurrent refinement In-Reply-To: References: Message-ID: We had an offline discussion on this. To keep the community in the loop, here is what we discussed. a.
Using the Linux membarrier syscall or equivalent on other OSes seems a cleaner solution than thread-local handshake (TLH). But we need to have a backup mechanism for OSes and older Linuxes that do not have such a syscall. b. For the blocking property of TLH, https://bugs.openjdk.java.net/browse/JDK-8230594 may help solve the problem once it is implemented. c. TLH could be issued to a subset of all threads, e.g. only to threads that have not yet reached the global epoch. This could save a lot of time for the handshake. d. Compiler threads are Java threads but they are mostly not in Java state. They could be a source of problems for the epoch synchronization protocol. e. The filter in G1EpochSynchronizer::check_and_update_frontier() may be incorrect, because it racily reads a remote thread's state, which may not observe all threads in Java state. f. Implementing asynchronous processing of the dirty card buffers could avoid a lot of TLH requests, so the speed of TLH may not be hugely concerning. g. It may be OK to slow down the native post-write barrier a bit with more frequent execution of the StoreLoad fence. We could do some benchmarking to test this. A more debatable issue is whether we should make the native post-write barrier different from the post-write barrier in Java code, so that only the native barrier has the StoreLoad fence. I will work further on these issues. -Man On Sun, Dec 22, 2019 at 8:50 AM Man Cao wrote: > Hi all, > > I have written up a description and challenges for implementing an epoch > synchronization protocol. This protocol is necessary for removing the > StoreLoad fence in G1's post-write barrier (JDK-8226731) > > Description: https://bugs.openjdk.java.net/browse/JDK-8236485 > Work-in-progress webrev: > https://cr.openjdk.java.net/~manc/8236485/webrev_wip0/ > > There are two main challenges that I'm not sure how to resolve: > - Triggering a thread-local handshake is a blocking operation that can > pass a safepoint.
> - There are native post-write barriers executed by threads in native/VM > state. > > Discussions and suggestions are highly appreciated! > > -Man > From felix.yang at huawei.com Thu Jan 16 03:01:55 2020 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Thu, 16 Jan 2020 03:01:55 +0000 Subject: [RFC] ZGC proposal for aarch64 jdk11u In-Reply-To: <38a15dc5-9cee-0f44-13ee-98f185ee72ae@oracle.com> References: <38a15dc5-9cee-0f44-13ee-98f185ee72ae@oracle.com> Message-ID: Hi, > Hi, > > Please note that backporting JDK-8224675 "Late GC barrier insertion for ZGC" > is not a great idea, since that patch introduced stability issues and the whole > approach was later superseded by JDK-8230565 "ZGC: Redesign C2 load > barrier to expand on the MachNode level". > > If you want to go down this path, I'd suggest that you either don't backport > JDK-8224675 at all, or backport everything up to JDK-8224675 + JDK-8230565. > Also note that if you include JDK-8230565 you want to be careful to also include > any followup bug fixes, like JDK-8233506. Thanks for pointing this out. It's helpful for our current work. We plan to start with the four patches and will check for other necessary ones. We noticed patches like JDK-8230565 are necessary for x86 zgc, but they're not there in jdk11. Users who want to stay with LTS versions like jdk11 will most likely run into these problems when they try zgc on the x86 platform. Is there a plan to incorporate these patches in jdk11? > In general, a lot of stability and performance improvements have gone into ZGC > since JDK 11. If at all possible, I would strongly recommend using JDK 14 > instead, where you already have aarch64 support and all other goodies. Does that mean zgc in jdk11 will not be maintained by the community?
Thanks, Felix From maoliang.ml at alibaba-inc.com Thu Jan 16 03:21:13 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Thu, 16 Jan 2020 11:21:13 +0800 Subject: Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: References: <5b24e235-5466-15a1-78a6-6f63bfa1878e@oracle.com> <43090624-d8be-8600-a55e-1e10b1920135@oracle.com> <359fbef8-6735-4958-b76f-56430f1a4108.maoliang.ml@alibaba-inc.com> <36b31e9a-ee86-50d9-8042-bc79e6756777@oracle.com> <08e025f0a9520b12d06df8157d63d73b4e7e11a4.camel@oracle.com> <4fe7f936-95bb-4a4e-85ed-e5c4423f9d06.maoliang.ml@alibaba-inc.com> , <9d21b384-790b-c2e2-d801-0025c9257656@oracle.com> <693b04b7-d13d-4ef5-b425-febc81984dbc.maoliang.ml@alibaba-inc.com>, Message-ID: Hi Thomas, Yes. We can focus on the current concurrent shrinking for now. You are right that changing the default behavior will be sensitive since you need to cover all types of applications including throughput and low-latency while our previous patch is mostly designed for low-latency. We'll figure this out later:) Thanks, Liang ------------------------------------------------------------------ From:Thomas Schatzl Send Time:2020 Jan. 16 (Thu.) 01:57 To:"MAO, Liang" ; hotspot-gc-dev Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Hi, On Wed, 2020-01-15 at 20:53 +0800, Liang Mao wrote: > Hi Thomas, > > So G1 doesn't need to shrink below Xms if SoftMaxHeapSize is > below Xms, does it? > No, never shrink below MinHeapSize. > Another question is that no matter whether we have an additional option, we > had better have 2 criteria. The first is for urgent expansion, where > GCTimeRatio is quite low and concurrent expansion with frequent GCs > is more harmful and expansion should be done immediately. It's the > current default flow as we found that 12 is a good number below which > applications can obviously incur timeout errors.
The second is to keep the GCTimeRatio and memory footprint in a balanced state, so any adjustments are better done concurrently. The original number 99 fits well here. If we have only one option "GCTimeRatio", we might not be able to achieve both. Maybe we can have a LowGCTimeRatio below which the state is supposed to be unacceptable and a HighTimeRatio which is certainly healthy. So far the change has been about shrinking the heap concurrently, and not expansion. Let's concentrate on the issue at hand, i.e. see how heap shrinking at more places turns out. I believe there will be lots of tweaking needed for this change to not show too many regressions in other applications. Remember that the defaults should work well for a large body of applications, not just a few. There may be knobs to tune it for others. Then look at concurrent expansion and at phase changes in the application: how to detect them, and how best to react. Just for reference, last time we changed the sizing algorithm it took a few months to get it "right", with mostly improvements all around. Thanks, Thomas From kim.barrett at oracle.com Thu Jan 16 05:57:17 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 16 Jan 2020 00:57:17 -0500 Subject: RFR: 8237261: Concurrent refinement activation threshold not updated for card counts Message-ID: Please review this change to the activation threshold for the primary (first) concurrent refinement thread. The special calculation used for that thread's threshold wasn't updated to account for the change from using buffer counts to using counts of the cards in the buffers by JDK-8230109. Also fixed a parameter name that wasn't updated by that same change from buffer counts to card counts.
CR: https://bugs.openjdk.java.net/browse/JDK-8237261 Webrev: https://cr.openjdk.java.net/~kbarrett/8237261/open.00/ Testing: mach5 tier1 by itself mach5 tier1-5 and some perf testing with in development change for JDK-8237143 From felix.yang at huawei.com Thu Jan 16 06:42:29 2020 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Thu, 16 Jan 2020 06:42:29 +0000 Subject: [aarch64-port-dev ] [RFC] ZGC proposal for aarch64 jdk11u In-Reply-To: <6b716737-ba03-71c2-5488-b5654093c447@redhat.com> References: <6b716737-ba03-71c2-5488-b5654093c447@redhat.com> Message-ID: Hi, > On 1/15/20 1:10 PM, Stuart Monteith wrote: > > I'm pleased that there is interest in ZGC on aarch64, that the > > performance is at expected levels and is apparently trouble free. > > However, I'd like to understand why this backporting is being done. If > > it is for running in production, then I'd expect Per, etc, to not be > > upset or disagree when I say that ZGC on aarch64 in JDK 13 isn't > > production ready. > > In particular, it's perhaps odd that something which is still an experimental > feature in mainline is being considered for a backport. So long as zgc in jdk11 is continually maintained by the community, it may not be a bad idea to enable it on one more arch provided that the risk is acceptable after code review. Otherwise, we are on our own. Users are much more conservative when it comes to migrating to a new jdk version. Also, they make their own decisions about which GC policy to use. Thanks, Felix From per.liden at oracle.com Thu Jan 16 08:22:33 2020 From: per.liden at oracle.com (Per Liden) Date: Thu, 16 Jan 2020 09:22:33 +0100 Subject: [RFC] ZGC proposal for aarch64 jdk11u In-Reply-To: References: <38a15dc5-9cee-0f44-13ee-98f185ee72ae@oracle.com> Message-ID: Hi, ZGC in JDK 11 is fairly stable as it is, so there's no super compelling reason to spend time and resources on backporting JDK-8233506 at this time.
However, backporting only JDK-8224675 would be a mistake, as it would destabilize ZGC (including the x86 port), so you would basically have to go all the way to JDK-8230565, or alternatively not backport JDK-8224675 and adjust the aarch64 port accordingly. Whatever path you take here, it would require significant work and testing, which is why I'd again recommend that you consider using JDK 14 (when it's GA) for these workloads. cheers, Per On 1/16/20 4:01 AM, Yangfei (Felix) wrote: > Hi, > >> Hi, >> >> Please note that backporting JDK-8224675 "Late GC barrier insertion for ZGC" >> is not great idea, since that patch introduced stability issues and the whole >> approach was later superseded by JDK-8230565 "ZGC: Redesign C2 load >> barrier to expand on the MachNode level". >> >> If you want to go down this path, I'd suggest that you either don't backport >> JDK-8224675 at all, or backport everything up to JDK-8224675 + JDK-8230565. >> Also note that if you include JDK-8230565 you want to be careful to also include >> any followup bug fixes, like JDK-8233506. > > Thanks for pointing this out. It's helpful for our current work. > We plan to start with the four patches and will check for other necessary ones. > We noticed patches like JDK-8230565 are necessary for x86 zgc, but it's not there in jdk11. > Users who want to stay with LTS versions like jdk11 will most likely come to the problems when they try zgc on the x86 platform. > Is there a plan to incorporate these patches in jdk11? > >> In general, a lot of stability and performance improvements have gone into ZGC >> since JDK 11. If at all possible, I would strongly recommend using JDK 14 >> instead, where you already have aarch64 support and all other goodies. > > Does that mean zgc in jdk11 will not be maintained by the community?
> > > Thanks, > Felix > From kim.barrett at oracle.com Thu Jan 16 08:51:17 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 16 Jan 2020 03:51:17 -0500 Subject: RFR: 8237143: Eliminate DirtyCardQ_cbl_mon Message-ID: <745E91C1-AE1A-4DA2-80EE-59B70897F4BF@oracle.com> Please review this change to eliminate the DirtyCardQ_cbl_mon. This is one of the two remaining super-special "access" ranked mutexes. (The other is the Shared_DirtyCardQ_lock, whose elimination is covered by JDK-8221360.) There are three main parts to this change. (1) Replace the under-a-lock FIFO queue in G1DirtyCardQueueSet with a lock-free FIFO queue. (2) Replace the use of a HotSpot monitor for signaling activation of concurrent refinement threads with a semaphore-based solution. (3) Handle pausing of buffer refinement in the middle of a buffer in order to handle a pending safepoint request. This can no longer just push the partially processed buffer back onto the queue, due to ABA problems now that the buffer is lock-free. CR: https://bugs.openjdk.java.net/browse/JDK-8237143 Webrev: https://cr.openjdk.java.net/~kbarrett/8237143/open.00/ Testing: mach5 tier1-5 Normal performance testing showed no significant change. specjbb2015 on a very big machine showed a 3.5% average critical-jOPS improvement, though not statistically significant; removing contention for that lock by many hardware threads may be a little bit noticeable. 
From shade at redhat.com Thu Jan 16 08:51:42 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 16 Jan 2020 09:51:42 +0100 Subject: [15] RFR 8236878: Use atomic instruction to update StringDedupTable's entries and entries_removed counters In-Reply-To: <8eeccdc6-960b-591a-d1b1-42bb50f868ad@redhat.com> References: <97d04872-7abb-396d-7552-f85b4cf1b97b@redhat.com> <6471ea70-e89f-17ef-9585-20f4c16a3e23@redhat.com> <8eeccdc6-960b-591a-d1b1-42bb50f868ad@redhat.com> Message-ID: <74b22231-bb01-0bc3-5707-0a1107065181@redhat.com> On 1/15/20 10:29 PM, Zhengyu Gu wrote: > Updated webrev: > http://cr.openjdk.java.net/~zgu/JDK-8236878/webrev.01/index.html OK, thanks for explaining. I guess that makes sense. This comment is outdated then: 480 // Delayed update to avoid contention on the table lock I'd suggest to rewrite it to: // Do atomic update here instead of taking StringDedupTable_lock. This allows concurrent // cleanup when multiple workers are cleaning up the table, while the mutators are blocked // on StringDedupTable_lock. ...or some such. -- Thanks, -Aleksey From per.liden at oracle.com Thu Jan 16 09:24:18 2020 From: per.liden at oracle.com (Per Liden) Date: Thu, 16 Jan 2020 10:24:18 +0100 Subject: RFR: 8237201: ZGC: Remove unused ZRelocationSetSelector::fragmentation() In-Reply-To: <1a626930-3be4-4976-8738-9f3d716873ce@oracle.com> References: <35727b46-b4d8-8336-b484-1119bff15468@oracle.com> <1a626930-3be4-4976-8738-9f3d716873ce@oracle.com> Message-ID: <2d04ae6b-c532-fd3a-48a5-62946b554fc3@oracle.com> Thanks for reviewing, Thomas! /Per On 1/15/20 4:05 PM, Thomas Schatzl wrote: > Hi, > > On 15.01.20 14:03, Per Liden wrote: >> ZRelocationSetSelector::fragmentation() is not used and can be removed. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8237201 >> Webrev: http://cr.openjdk.java.net/~pliden/8237201/webrev.0 >> >> /Per > > ? looks good. 
> > Thomas From stefan.johansson at oracle.com Thu Jan 16 10:10:18 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Thu, 16 Jan 2020 11:10:18 +0100 Subject: RFR: 8237261: Concurrent refinement activation threshold not updated for card counts In-Reply-To: References: Message-ID: <1f213c0e-5425-e06d-836e-4770bb7596f4@oracle.com> Hi Kim, On 2020-01-16 06:57, Kim Barrett wrote: > Please review this change to the activation threshold for the primary > (first) concurrent refinement thread. The special calculation used > for that thread's threshold wasn't updated to account for the change > from using buffer counts to using counts of the cards in the buffers > by JDK-8230109. > > Also fixed a parameter name that wasn't updated by that same change > from buffer counts to card counts. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8237261 > > Webrev: > https://cr.openjdk.java.net/~kbarrett/8237261/open.00/ > Looks good, Stefan > Testing: > mach5 tier1 by itself > mach5 tier1-5 and some perf testing with in development change for JDK-8237143 > From stefan.karlsson at oracle.com Thu Jan 16 10:13:53 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Thu, 16 Jan 2020 11:13:53 +0100 Subject: RFR: 8237198+8237199+8237200: ZGC: Share heap multi-mapping code across platforms In-Reply-To: References: Message-ID: <60712fdc-b2b6-4ecb-6216-a44b5c4996c2@oracle.com> Looks good. StefanK On 2020-01-15 13:57, Per Liden wrote: > Hi, > > Please review this cleanup of the ZPhysicalMemory/ZBackingFile layer, > which aims to de-duplicate some of the multi-mapping code. I've split > the change into three separate patches, the main patch followed by two > patches doing some renaming. > > > 1) The ZBackingFile code was designed to allow platforms to decide if > they want to use heap multi-mapping or some other (possibly HW > supported) scheme. 
As of today, all our supported platforms do heap > multi-mapping, so there's some degree of code duplication in > ZBackingFile for each platform. This patch moves common multi-mapping > code into ZPhysicalMemoryManager. If we in the future find that we want > to support a platform that doesn't do multi-mapping, then we can > introduce an abstraction for this again. > Bug: https://bugs.openjdk.java.net/browse/JDK-8237198 > Webrev: http://cr.openjdk.java.net/~pliden/8237198/webrev.0 > > > 2) Rename ZBackingFile to ZPhysicalMemoryBacking, since "File" is > somewhat misleading on platforms other than Linux. > Bug: https://bugs.openjdk.java.net/browse/JDK-8237199 > Webrev: http://cr.openjdk.java.net/~pliden/8237199/webrev.0 > > > 3) Rename ZBackingPath to ZMountPoint, as it's a better name in light of > JDK-8237199. > Bug: https://bugs.openjdk.java.net/browse/JDK-8237200 > Webrev: http://cr.openjdk.java.net/~pliden/8237200/webrev.0 > > > cheers, > Per From per.liden at oracle.com Thu Jan 16 10:50:12 2020 From: per.liden at oracle.com (Per Liden) Date: Thu, 16 Jan 2020 11:50:12 +0100 Subject: RFR: 8237198+8237199+8237200: ZGC: Share heap multi-mapping code across platforms In-Reply-To: <60712fdc-b2b6-4ecb-6216-a44b5c4996c2@oracle.com> References: <60712fdc-b2b6-4ecb-6216-a44b5c4996c2@oracle.com> Message-ID: <02128613-0ced-dc22-d6c1-d9b474063d76@oracle.com> Thanks Stefan! /Per On 1/16/20 11:13 AM, Stefan Karlsson wrote: > Looks good. > > StefanK > > On 2020-01-15 13:57, Per Liden wrote: >> Hi, >> >> Please review this cleanup of the ZPhysicalMemory/ZBackingFile layer, >> which aims to de-duplicate some of the multi-mapping code. I've split >> the change into three separate patches, the main patch followed by two >> patches doing some renaming. >> >> >> 1) The ZBackingFile code was designed to allow platforms to decide if >> they want to use heap multi-mapping or some other (possibly HW >> supported) scheme. 
As of today, all our supported platforms do heap >> multi-mapping, so there's some degree of code duplication in >> ZBackingFile for each platform. This patch moves common multi-mapping >> code into ZPhysicalMemoryManager. If we in the future find that we >> want to support a platform that doesn't do multi-mapping, then we can >> introduce an abstraction for this again. >> Bug: https://bugs.openjdk.java.net/browse/JDK-8237198 >> Webrev: http://cr.openjdk.java.net/~pliden/8237198/webrev.0 >> >> >> 2) Rename ZBackingFile to ZPhysicalMemoryBacking, since "File" is >> somewhat misleading on platforms other than Linux. >> Bug: https://bugs.openjdk.java.net/browse/JDK-8237199 >> Webrev: http://cr.openjdk.java.net/~pliden/8237199/webrev.0 >> >> >> 3) Rename ZBackingPath to ZMountPoint, as it's a better name in light >> of JDK-8237199. >> Bug: https://bugs.openjdk.java.net/browse/JDK-8237200 >> Webrev: http://cr.openjdk.java.net/~pliden/8237200/webrev.0 >> >> >> cheers, >> Per From thomas.schatzl at oracle.com Thu Jan 16 11:37:17 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 16 Jan 2020 12:37:17 +0100 Subject: RFR: 8237261: Concurrent refinement activation threshold not updated for card counts In-Reply-To: <1f213c0e-5425-e06d-836e-4770bb7596f4@oracle.com> References: <1f213c0e-5425-e06d-836e-4770bb7596f4@oracle.com> Message-ID: <9fb2208e-677c-db88-b8c7-641c11a24405@oracle.com> Hi, On 16.01.20 11:10, Stefan Johansson wrote: > Hi Kim, > > On 2020-01-16 06:57, Kim Barrett wrote: >> Please review this change to the activation threshold for the primary >> (first) concurrent refinement thread.? The special calculation used >> for that thread's threshold wasn't updated to account for the change >> from using buffer counts to using counts of the cards in the buffers >> by JDK-8230109. >> >> Also fixed a parameter name that wasn't updated by that same change >> from buffer counts to card counts. 
>> >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8237261 >> >> Webrev: >> https://cr.openjdk.java.net/~kbarrett/8237261/open.00/ >> > Looks good, > Stefan +1 Thomas From fweimer at redhat.com Thu Jan 16 12:06:42 2020 From: fweimer at redhat.com (Florian Weimer) Date: Thu, 16 Jan 2020 13:06:42 +0100 Subject: Work-in-progress: 8236485: Epoch synchronization protocol for G1 concurrent refinement In-Reply-To: (Man Cao's message of "Wed, 15 Jan 2020 16:08:46 -0800") References: Message-ID: <87blr3vc4t.fsf@oldenburg2.str.redhat.com> * Man Cao: > We had an offline discussion on this. To keep the community in the loop, > here is what we discussed. > > a. Using Linux membarrier syscall or equivalent on other OSes seems a > cleaner solution than thread-local handshake (TLH). But we need to have a > backup mechanism for OSes and older Linuxes that do not have such a > syscall. Can you do with a membarrier call that doesn't require registration? The usual fallback for membarrier is sending a special signal to all threads, and make sure that they have run code in a signal handler (possibly using a CPU barrier there). But of course this is rather slow. membarrier has seen some backporting activity, but as far as I can see, that hasn't been consistent across architectures. Thanks, Florian From thomas.schatzl at oracle.com Thu Jan 16 13:20:15 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 16 Jan 2020 14:20:15 +0100 Subject: [14] RFR (XS): 8235305: Corrupted oops embedded in nmethods due to parallel modification during optional evacuation Message-ID: <15678a97-f219-e0f4-c0b6-a4a2a06e6768@oracle.com> Hi all, can I get reviews for this change that fixes a bug in the abortable mixed gc algorithm where G1 might corrupt oops embedded in nmethods due to parallel modification during an optional evacuation phase? 
G1 currently collects embedded oops in nmethods twice: once in the optional roots list, and once as nmethods in the strong code roots list for a particular region. Now it can happen that this oop embedded in the code stream is unaligned, so if that oop is modified during relocation, word tearing may occur, causing follow-up crashes. The fix is to not collect oops from nmethods in the optional code root list as the strong code root list for a particular region already always contains it anyway. Thanks go to stefank, eriko and sjohanss for helping with analyzing, testing and the discussion around it. CR: https://bugs.openjdk.java.net/browse/JDK-8235305 Webrev: http://cr.openjdk.java.net/~tschatzl/8235305/webrev/ Testing: multiple runs of hs-tier1-5, multiple runs of the crashing application (24h kitchensink) with and without a VM modification and also with some G1 settings that caused crashes within 1-2 hours that reproduced the issue within 5 minutes. Currently starting perf test runs with and without this change: however since this change strictly reduces the work done at all times I am not expecting any regressions (and hence I am asking for review in advance). Thanks, Thomas From stuart.monteith at linaro.org Thu Jan 16 13:24:28 2020 From: stuart.monteith at linaro.org (Stuart Monteith) Date: Thu, 16 Jan 2020 13:24:28 +0000 Subject: RFR: 8237198+8237199+8237200: ZGC: Share heap multi-mapping code across platforms In-Reply-To: References: Message-ID: Looks good to me, thanks. Stuart On Wed, 15 Jan 2020 at 12:57, Per Liden wrote: > > Hi, > > Please review this cleanup of the ZPhysicalMemory/ZBackingFile layer, > which aims to de-duplicate some of the multi-mapping code. I've split > the change into three separate patches, the main patch followed by two > patches doing some renaming. > > > 1) The ZBackingFile code was designed to allow platforms to decide if > they want to use heap multi-mapping or some other (possibly HW > supported) scheme.
As of today, all our supported platforms do heap > multi-mapping, so there's some degree of code duplication in > ZBackingFile for each platform. This patch moves common multi-mapping > code into ZPhysicalMemoryManager. If we in the future find that we want > to support a platform that doesn't do multi-mapping, then we can > introduce an abstraction for this again. > Bug: https://bugs.openjdk.java.net/browse/JDK-8237198 > Webrev: http://cr.openjdk.java.net/~pliden/8237198/webrev.0 > > > 2) Rename ZBackingFile to ZPhysicalMemoryBacking, since "File" is > somewhat misleading on platforms other than Linux. > Bug: https://bugs.openjdk.java.net/browse/JDK-8237199 > Webrev: http://cr.openjdk.java.net/~pliden/8237199/webrev.0 > > > 3) Rename ZBackingPath to ZMountPoint, as it's a better name in light of > JDK-8237199. > Bug: https://bugs.openjdk.java.net/browse/JDK-8237200 > Webrev: http://cr.openjdk.java.net/~pliden/8237200/webrev.0 > > > cheers, > Per From per.liden at oracle.com Thu Jan 16 13:31:34 2020 From: per.liden at oracle.com (Per Liden) Date: Thu, 16 Jan 2020 14:31:34 +0100 Subject: RFR: 8237198+8237199+8237200: ZGC: Share heap multi-mapping code across platforms In-Reply-To: References: Message-ID: <06c09f7f-a07a-b6ce-41f3-fa148f3af1c7@oracle.com> Thanks for reviewing, Stuart! cheers, Per On 1/16/20 2:24 PM, Stuart Monteith wrote: > Looks good to me, thanks. > > Stuart > > On Wed, 15 Jan 2020 at 12:57, Per Liden wrote: >> >> Hi, >> >> Please review this cleanup of the ZPhysicalMemory/ZBackingFile layer, >> which aims to de-duplicate some of the multi-mapping code. I've split >> the change into three separate patches, the main patch followed by two >> patches doing some renaming. >> >> >> 1) The ZBackingFile code was designed to allow platforms to decide if >> they want to use heap multi-mapping or some other (possibly HW >> supported) scheme. 
As of today, all our supported platforms do heap >> multi-mapping, so there's some degree of code duplication in >> ZBackingFile for each platform. This patch moves common multi-mapping >> code into ZPhysicalMemoryManager. If we in the future find that we want >> to support a platform that doesn't do multi-mapping, then we can >> introduce an abstraction for this again. >> Bug: https://bugs.openjdk.java.net/browse/JDK-8237198 >> Webrev: http://cr.openjdk.java.net/~pliden/8237198/webrev.0 >> >> >> 2) Rename ZBackingFile to ZPhysicalMemoryBacking, since "File" is >> somewhat misleading on platforms other than Linux. >> Bug: https://bugs.openjdk.java.net/browse/JDK-8237199 >> Webrev: http://cr.openjdk.java.net/~pliden/8237199/webrev.0 >> >> >> 3) Rename ZBackingPath to ZMountPoint, as it's a better name in light of >> JDK-8237199. >> Bug: https://bugs.openjdk.java.net/browse/JDK-8237200 >> Webrev: http://cr.openjdk.java.net/~pliden/8237200/webrev.0 >> >> >> cheers, >> Per From erik.osterlund at oracle.com Thu Jan 16 14:44:36 2020 From: erik.osterlund at oracle.com (erik.osterlund at oracle.com) Date: Thu, 16 Jan 2020 15:44:36 +0100 Subject: RFR: 8237198+8237199+8237200: ZGC: Share heap multi-mapping code across platforms In-Reply-To: References: Message-ID: <47579bf7-e727-6233-c821-82d9fb9bdfba@oracle.com> Hi Per, I like the red stuff. +1 /Erik On 1/15/20 1:57 PM, Per Liden wrote: > Hi, > > Please review this cleanup of the ZPhysicalMemory/ZBackingFile layer, > which aims to de-duplicate some of the multi-mapping code. I've split > the change into three separate patches, the main patch followed by two > patches doing some renaming. > > > 1) The ZBackingFile code was designed to allow platforms to decide if > they want to use heap multi-mapping or some other (possibly HW > supported) scheme. As of today, all our supported platforms do heap > multi-mapping, so there's some degree of code duplication in > ZBackingFile for each platform. 
This patch moves common multi-mapping > code into ZPhysicalMemoryManager. If we in the future find that we > want to support a platform that doesn't do multi-mapping, then we can > introduce an abstraction for this again. > Bug: https://bugs.openjdk.java.net/browse/JDK-8237198 > Webrev: http://cr.openjdk.java.net/~pliden/8237198/webrev.0 > > > 2) Rename ZBackingFile to ZPhysicalMemoryBacking, since "File" is > somewhat misleading on platforms other than Linux. > Bug: https://bugs.openjdk.java.net/browse/JDK-8237199 > Webrev: http://cr.openjdk.java.net/~pliden/8237199/webrev.0 > > > 3) Rename ZBackingPath to ZMountPoint, as it's a better name in light > of JDK-8237199. > Bug: https://bugs.openjdk.java.net/browse/JDK-8237200 > Webrev: http://cr.openjdk.java.net/~pliden/8237200/webrev.0 > > > cheers, > Per From per.liden at oracle.com Thu Jan 16 15:44:50 2020 From: per.liden at oracle.com (Per Liden) Date: Thu, 16 Jan 2020 16:44:50 +0100 Subject: RFR: 8237198+8237199+8237200: ZGC: Share heap multi-mapping code across platforms In-Reply-To: <47579bf7-e727-6233-c821-82d9fb9bdfba@oracle.com> References: <47579bf7-e727-6233-c821-82d9fb9bdfba@oracle.com> Message-ID: <1a06d55a-28b9-d00e-3d78-c81d9816be97@oracle.com> Thanks Erik! /Per On 1/16/20 3:44 PM, erik.osterlund at oracle.com wrote: > Hi Per, > > I like the red stuff. +1 > > /Erik > > On 1/15/20 1:57 PM, Per Liden wrote: >> Hi, >> >> Please review this cleanup of the ZPhysicalMemory/ZBackingFile layer, >> which aims to de-duplicate some of the multi-mapping code. I've split >> the change into three separate patches, the main patch followed by two >> patches doing some renaming. >> >> >> 1) The ZBackingFile code was designed to allow platforms to decide if >> they want to use heap multi-mapping or some other (possibly HW >> supported) scheme. As of today, all our supported platforms do heap >> multi-mapping, so there's some degree of code duplication in >> ZBackingFile for each platform. 
This patch moves common multi-mapping >> code into ZPhysicalMemoryManager. If we in the future find that we >> want to support a platform that doesn't do multi-mapping, then we can >> introduce an abstraction for this again. >> Bug: https://bugs.openjdk.java.net/browse/JDK-8237198 >> Webrev: http://cr.openjdk.java.net/~pliden/8237198/webrev.0 >> >> >> 2) Rename ZBackingFile to ZPhysicalMemoryBacking, since "File" is >> somewhat misleading on platforms other than Linux. >> Bug: https://bugs.openjdk.java.net/browse/JDK-8237199 >> Webrev: http://cr.openjdk.java.net/~pliden/8237199/webrev.0 >> >> >> 3) Rename ZBackingPath to ZMountPoint, as it's a better name in light >> of JDK-8237199. >> Bug: https://bugs.openjdk.java.net/browse/JDK-8237200 >> Webrev: http://cr.openjdk.java.net/~pliden/8237200/webrev.0 >> >> >> cheers, >> Per > From leo.korinth at oracle.com Thu Jan 16 16:06:29 2020 From: leo.korinth at oracle.com (Leo Korinth) Date: Thu, 16 Jan 2020 17:06:29 +0100 Subject: RFR (M): 8235860: Obsolete the UseParallelOldGC option In-Reply-To: <48af99ac-9e7a-c112-800e-db13e3b3bbcb@oracle.com> References: <292ab94f-f2c8-b373-d5a5-46a45470540e@oracle.com> <2A4B1955-26D5-4544-B476-6E9E5E8009D4@oracle.com> <5e21e50d-a026-98ba-d03d-3f7aa1c31e21@oracle.com> <56A9296B-089E-4A00-9C43-5E3CBDF4A29B@oracle.com> <48af99ac-9e7a-c112-800e-db13e3b3bbcb@oracle.com> Message-ID: <84170868-4b43-c578-f134-bb169c4f2708@oracle.com> Hi! I believe _name and old_gen_name() in PSOldGen should be removed and the virtual name() should return the string literal directly. Change this if you want. With or without using my suggestions, your changes looks good to me. Thanks for cleaning this! 
Leo On 07/01/2020 11:55, Thomas Schatzl wrote: > Hi Kim, > > On 18.12.19 16:45, Kim Barrett wrote: >> >> >>> On Dec 18, 2019, at 4:52 AM, Thomas Schatzl >>> wrote: >>> >>> Fixed in >>> http://cr.openjdk.java.net/~tschatzl/8235860/webrev.0_to_1 (diff) >>> http://cr.openjdk.java.net/~tschatzl/8235860/webrev.1 (full) >> >> Looks good. >> > > Thanks for your review. > >>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/gc/parallel/psParallelCompact.hpp >>>> Pre-existing: It seems like the big block comment before SplitInfo >>>> should have received some updates as part of the recent shadow-region >>>> patch, but it wasn't touched. >>>> ------------------------------------------------------------------------------ >>>> >>> >>> I am filing a CR for that. >> >> The comment before PSParallelCompact in the same file might also need >> some updating. >> >> (I was a bit confused in my earlier review about where the relevant >> comments were.) >> > > ? I filed JDK-8141637 before the holidays. I added your recent comment. > > Thanks, > ? Thomas From zgu at redhat.com Thu Jan 16 19:08:52 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 16 Jan 2020 14:08:52 -0500 Subject: [14] RFR 8237369: Shenandoah: failed vmTestbase/nsk/jvmti/AttachOnDemand/attach021/TestDescription.java test Message-ID: Please review this small patch. keep_alive is only applicable during marking phase. 
Bug: https://bugs.openjdk.java.net/browse/JDK-8237369 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237369/webrev.00/ Test: hotspot_gc_shenandoah vmTestbase/nsk/jvmti/AttachOnDemand/attach021/TestDescription.java with Shenandoah GC (normal and traversal mode) Thanks, -Zhengyu From rkennke at redhat.com Thu Jan 16 19:22:30 2020 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 16 Jan 2020 20:22:30 +0100 Subject: [14] RFR 8237369: Shenandoah: failed vmTestbase/nsk/jvmti/AttachOnDemand/attach021/TestDescription.java test In-Reply-To: References: Message-ID: <7f64ce2c-721b-90be-05ac-12386b7a55e8@redhat.com> Yes, good catch! Roman > Please review this small patch. keep_alive is only applicable during > marking phase. > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8237369 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237369/webrev.00/ > > Test: > hotspot_gc_shenandoah > > vmTestbase/nsk/jvmti/AttachOnDemand/attach021/TestDescription.java > with Shenandoah GC (normal and traversal mode) > > > Thanks, > > -Zhengyu > From kim.barrett at oracle.com Thu Jan 16 19:53:45 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 16 Jan 2020 14:53:45 -0500 Subject: RFR: 8237261: Concurrent refinement activation threshold not updated for card counts In-Reply-To: <9fb2208e-677c-db88-b8c7-641c11a24405@oracle.com> References: <1f213c0e-5425-e06d-836e-4770bb7596f4@oracle.com> <9fb2208e-677c-db88-b8c7-641c11a24405@oracle.com> Message-ID: > On Jan 16, 2020, at 6:37 AM, Thomas Schatzl wrote: > > Hi, > > On 16.01.20 11:10, Stefan Johansson wrote: >> Hi Kim, >> On 2020-01-16 06:57, Kim Barrett wrote: >>> Please review this change to the activation threshold for the primary >>> (first) concurrent refinement thread. The special calculation used >>> for that thread's threshold wasn't updated to account for the change >>> from using buffer counts to using counts of the cards in the buffers >>> by JDK-8230109.
>>> >>> Also fixed a parameter name that wasn't updated by that same change >>> from buffer counts to card counts. >>> >>> CR: >>> https://bugs.openjdk.java.net/browse/JDK-8237261 >>> >>> Webrev: >>> https://cr.openjdk.java.net/~kbarrett/8237261/open.00/ >>> >> Looks good, >> Stefan > > +1 > > Thomas Thanks. From kim.barrett at oracle.com Thu Jan 16 19:53:35 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 16 Jan 2020 14:53:35 -0500 Subject: RFR: 8237261: Concurrent refinement activation threshold not updated for card counts In-Reply-To: <1f213c0e-5425-e06d-836e-4770bb7596f4@oracle.com> References: <1f213c0e-5425-e06d-836e-4770bb7596f4@oracle.com> Message-ID: > On Jan 16, 2020, at 5:10 AM, Stefan Johansson wrote: > On 2020-01-16 06:57, Kim Barrett wrote: >> Please review this change to the activation threshold for the primary >> (first) concurrent refinement thread. The special calculation used >> for that thread's threshold wasn't updated to account for the change >> from using buffer counts to using counts of the cards in the buffers >> by JDK-8230109. >> Also fixed a parameter name that wasn't updated by that same change >> from buffer counts to card counts. >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8237261 >> Webrev: >> https://cr.openjdk.java.net/~kbarrett/8237261/open.00/ > Looks good, > Stefan Thanks. From zgu at redhat.com Thu Jan 16 20:21:38 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 16 Jan 2020 15:21:38 -0500 Subject: [14] RFR 8237392: Shenandoah: Remove unreliable assertion Message-ID: Offline discussion concluded that the assertion added by JDK-8237369 is not reliable. In a piggyback reference-updating cycle, the has_forwarded_objects flag is carried into the next GC cycle, and Shenandoah resets the marking bitmap concurrently just before the new cycle. So there is a short period without a reliable marking bitmap, which could trigger a false assertion.
Bug: https://bugs.openjdk.java.net/browse/JDK-8237392 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237392/webrev.00/ Test: hotspot_gc_shenandoah Thanks, -Zhengyu From shade at redhat.com Thu Jan 16 20:29:39 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 16 Jan 2020 21:29:39 +0100 Subject: [14] RFR 8237392: Shenandoah: Remove unreliable assertion In-Reply-To: References: Message-ID: <43b72cb9-d7b9-835e-8a76-5e03d3ce4259@redhat.com> On 1/16/20 9:21 PM, Zhengyu Gu wrote: > Bug: https://bugs.openjdk.java.net/browse/JDK-8237392 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237392/webrev.00/ Looks good. -- Thanks, -Aleksey From zgu at redhat.com Thu Jan 16 23:37:48 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 16 Jan 2020 18:37:48 -0500 Subject: [15] RFR 8236878: Use atomic instruction to update StringDedupTable's entries and entries_removed counters In-Reply-To: <74b22231-bb01-0bc3-5707-0a1107065181@redhat.com> References: <97d04872-7abb-396d-7552-f85b4cf1b97b@redhat.com> <6471ea70-e89f-17ef-9585-20f4c16a3e23@redhat.com> <8eeccdc6-960b-591a-d1b1-42bb50f868ad@redhat.com> <74b22231-bb01-0bc3-5707-0a1107065181@redhat.com> Message-ID: <155fc21c-f6ca-a680-a681-c6d11482c34e@redhat.com> On 1/16/20 3:51 AM, Aleksey Shipilev wrote: > On 1/15/20 10:29 PM, Zhengyu Gu wrote: >> Updated webrev: >> http://cr.openjdk.java.net/~zgu/JDK-8236878/webrev.01/index.html > > OK, thanks for explaining. I guess that makes sense. > > This comment is outdated then: > 480 // Delayed update to avoid contention on the table lock > > I'd suggest to rewrite it to: > // Do atomic update here instead of taking StringDedupTable_lock. This allows concurrent > // cleanup when multiple workers are cleaning up the table, while the mutators are blocked > // on StringDedupTable_lock. Updated as you suggested and pushed. Thanks, -Zhengyu > > ...or some such. 
> From manc at google.com Fri Jan 17 00:53:04 2020 From: manc at google.com (Man Cao) Date: Thu, 16 Jan 2020 16:53:04 -0800 Subject: Discussion: improve humongous objects handling for G1 Message-ID: Hi all, While migrating our workload from CMS to G1, we found many production applications suffer from humongous allocations. The default threshold for humongous objects is often too small for our applications with heap sizes between 2GB-15GB. Humongous allocations caused noticeable increase in the frequency of concurrent old-gen collections, mixed collections and CPU usage. We could advise applications to increase G1HeapRegionSize. But some applications still suffer with G1HeapRegionSize=32M. We could also advise applications to refactor code to break down large objects. But it is a high cost effort that may not always be feasible. We'd like to work with the OpenJDK community together to improve G1's handling of humongous objects. Thomas Schatzl mentioned to me a few efforts/ideas on this front in an offline chat: a. Allocation into tail regions of humongous object: JDK-8172713, JDK-8031381 b. Commit additional virtual address space for humongous objects. c. Improve the region selection heuristics (e.g., first-fit, best-fit) for humongous objects. I didn't find open CRs for b. and c. Could someone give pointers? Are there any other ideas/prototypes on this front? -Man From felix.yang at huawei.com Fri Jan 17 08:00:24 2020 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Fri, 17 Jan 2020 08:00:24 +0000 Subject: [RFC] ZGC proposal for aarch64 jdk11u In-Reply-To: References: <38a15dc5-9cee-0f44-13ee-98f185ee72ae@oracle.com> Message-ID: Hi, > ZGC in JDK 11 is fairly stable as it is, so there's no super compelling reason to > spend time and resources on backporting JDK-8233506 at this time. 
However, > backporting only JDK-8224675 would be a mistake, as it would destabilize ZGC > (including the x86 port) so you would basically have to go all the way to > JDK-8230565, or alternatively don't backport > JDK-8224675 and adjust the aarch64 port accordingly. Yes, we see your point here and will look into it. > Whatever path you take here, it would require significant work and testing, > which is why I'd again recommend that you to consider using JDK > 14 (when it's GA) for these workloads. Thanks again for your helpful comments. We will consider it when people are willing to switch to higher jdk versions. Best regards, Felix From shade at redhat.com Fri Jan 17 08:54:23 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 17 Jan 2020 09:54:23 +0100 Subject: Discussion: improve humongous objects handling for G1 In-Reply-To: References: Message-ID: <30800755-ca6b-bae2-98a5-e0be08c67166@redhat.com> On 1/17/20 1:53 AM, Man Cao wrote: > a. Allocation into tail regions of humongous object: JDK-8172713, JDK-8031381 Caveat: allocations near the "grandfather" humongous object would probably enjoy lots of nepotism. > b. Commit additional virtual address space for humongous objects. Caveat: users do like us not going over -Xmx! So this thing is better to be inside the "actual" heap. > c. Improve the region selection heuristics (e.g., first-fit, best-fit) for > humongous objects. That works for solving external fragmentation (splitting the free space with a humongous alloc, for next humongous alloc to not fit), right? Not the internal fragmentation (unused tail in the region). > Are there any other ideas/prototypes on this front? In Shenandoah, we found that compacting humongous regions, at least at Full GC, makes the collector survive heavy external fragmentation, albeit at grand cost. 
G1 has the RFE open here: https://bugs.openjdk.java.net/browse/JDK-8191565 I remember J9 people telling me their GCs have "arraylets" that they spread across the regions, and that works well right up to the point you need to do a JNI GetCritical on it. For quite some time, I speculated that carving out the adjustable subset of regions for humongous allocs and doing power-of-two buddy-system allocation there would be a thing to try. But, I have not researched this thing very deeply. -- Thanks, -Aleksey From thomas.schatzl at oracle.com Fri Jan 17 08:55:52 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 17 Jan 2020 09:55:52 +0100 Subject: [14] RFR (XS): 8235305: Corrupted oops embedded in nmethods due to parallel modification during optional evacuation In-Reply-To: <15678a97-f219-e0f4-c0b6-a4a2a06e6768@oracle.com> References: <15678a97-f219-e0f4-c0b6-a4a2a06e6768@oracle.com> Message-ID: <6cb6a34c-cb6b-f904-a0af-6e1b160073e1@oracle.com> Hi, On 16.01.20 14:20, Thomas Schatzl wrote: > Hi all, > > ? can I get reviews for this change that fixes a bug in the abortable > mixed gc algorithm where G1 might corrupt oops embedded in nmethods due > to parallel modification during an optional evacuation phase? > > G1 currently collects embedded oops in nmethods twice: once in the > optional roots list, and once as nmethods in the strong code roots list > for a particular region. > > Now it can happen that this oop embedded in in the code stream is > unaligned, so if that oop is modified during relocation word tearing may > occur, causing follow-up crashes. > > The fix is to not collect oops from nmethods in the optional code root > list as the strong code root list for a particular region already always > contains it anyway. > > Thanks go to stefank, eriko and sjohanss for helping with analyzing, > testing and the discussion around it. 
> > CR: > https://bugs.openjdk.java.net/browse/JDK-8235305 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8235305/webrev/ > Testing: > multiple runs of hs-tier1-5, multiple runs of the crashing application > (24h kitchensink) with and without a VM modification and also with some > G1 settings that caused crashes within 1-2 hours that reproduced the > issue within 5 minutes. > Currently starting perf test runs with and without this change: however > since this change strictly reduces the work done at all times I am not > expecting any regressions (and hence I am asking for review in advance). > no perf differences as expected. Another hs-tier1-5 completed, and hs-tier6-8 almost done without new issues. Thanks, Thomas From stefan.johansson at oracle.com Fri Jan 17 09:06:50 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Fri, 17 Jan 2020 10:06:50 +0100 Subject: [14] RFR (XS): 8235305: Corrupted oops embedded in nmethods due to parallel modification during optional evacuation In-Reply-To: <15678a97-f219-e0f4-c0b6-a4a2a06e6768@oracle.com> References: <15678a97-f219-e0f4-c0b6-a4a2a06e6768@oracle.com> Message-ID: <1ecf88a0-68dd-276a-0a7a-f068c587168d@oracle.com> Hi Thomas, On 2020-01-16 14:20, Thomas Schatzl wrote: > Hi all, > > ? can I get reviews for this change that fixes a bug in the abortable > mixed gc algorithm where G1 might corrupt oops embedded in nmethods due > to parallel modification during an optional evacuation phase? > > G1 currently collects embedded oops in nmethods twice: once in the > optional roots list, and once as nmethods in the strong code roots list > for a particular region. > > Now it can happen that this oop embedded in in the code stream is > unaligned, so if that oop is modified during relocation word tearing may > occur, causing follow-up crashes. > > The fix is to not collect oops from nmethods in the optional code root > list as the strong code root list for a particular region already always > contains it anyway. 
> > Thanks go to stefank, eriko and sjohanss for helping with analyzing, > testing and the discussion around it. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8235305 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8235305/webrev/ Fix looks good. Just some things around the naming of the template parameter and enum after adding this. I don't have a much better idea but I don't think "barrier" is exactly what this is. I do think it would make sense to call the new value G1BarrierNMethod to be more inline with the other names. I also think it would make sense to move the comment about why this is needed to where we use it in g1OopClosures.inline.hpp. Me and StefanK talked a bit about this and if we move the comment and do the check for the barrier as a separate if-statement, it should be more obvious when this is needed. Thanks, Stefan > Testing: > multiple runs of hs-tier1-5, multiple runs of the crashing application > (24h kitchensink) with and without a VM modification and also with some > G1 settings that caused crashes within 1-2 hours that reproduced the > issue within 5 minutes. > Currently starting perf test runs with and without this change: however > since this change strictly reduces the work done at all times I am not > expecting any regressions (and hence I am asking for review in advance). > > Thanks, > ? Thomas From thomas.schatzl at oracle.com Fri Jan 17 10:00:23 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 17 Jan 2020 11:00:23 +0100 Subject: Discussion: improve humongous objects handling for G1 In-Reply-To: References: Message-ID: <4623ce42-7b6c-8b46-5915-8ff708b82f5c@oracle.com> Hi, On 17.01.20 01:53, Man Cao wrote: > Hi all, > > While migrating our workload from CMS to G1, we found many production > applications suffer from humongous allocations. > The default threshold for humongous objects is often too small for our > applications with heap sizes between 2GB-15GB. 
> Humongous allocations caused noticeable increase in the frequency of > concurrent old-gen collections, mixed collections and CPU usage. > We could advise applications to increase G1HeapRegionSize. But some > applications still suffer with G1HeapRegionSize=32M. > We could also advise applications to refactor code to break down large > objects. But it is a high cost effort that may not always be feasible. > > We'd like to work with the OpenJDK community together to improve G1's > handling of humongous objects. > Thomas Schatzl mentioned to me a few efforts/ideas on this front in an > offline chat: > a. Allocation into tail regions of humongous object: JDK-8172713, > JDK-8031381 > b. Commit additional virtual address space for humongous objects. > c. Improve the region selection heuristics (e.g., first-fit, best-fit) for > humongous objects. > > I didn't find open CRs for b. and c. Could someone give pointers? > Are there any other ideas/prototypes on this front? TLDR: we in the Oracle gc team have quite a few ideas that can decrease the issue significantly. We are happy to help with implementation of any of these. We would appreciate a sample application. Long version: The problems with humongous object allocation in G1: - internal fragmentation: the tail end of a humongous object is wasted space. - external fragmentation: sometimes you can't find enough contiguous space for a humongous object. There are quite a few CRs related to this problem in the bug tracker; I just now connected them together using a "g1-humongous" label [0]. Here's a rundown of our ideas, categorized a little (note that these CRs predate significant changes due to how G1 works now, so the ideas may need to be adapted to the current situation): - try to get rid of humongous asap, i.e. improve eager reclaim support by allowing eager reclaim with reference arrays (JDK-8048180) or non-objArrays (JDK-8073288). 
I remember the main problem with that was stale remembered set entries after removal (and SATB marking, but you could just not do eager reclaim during marking). In the applications we had at hand at that time, reference arrays tended to be not eager reclaimable most of the time, and humongous regular objects were rare. So the benefit of looking into this might be small. - allow allocation into the tail end of humongous objects (JDK-8172713); there was once an internal prototype for that, but it was abandoned because of implementation issues (it was a hack that has not been completed to a stable state, mainly because humongous object management had been full of odd quirks wrt region management. This has been fixed since. Also the example application benefitted more from eager reclaim). While the argument from Aleksey about nepotism in the other thread is valid (as far as I understand it), it depends on the implementation. The area at the tail end could be considered as a separate evacuation source, i.e. evacuated independently of the humongous object (and that would actually improve the code to clean out HeapRegion ;)). (This needs more care with single-region humongous objects but does not seem completely problematic; single-region humongous objects may nowadays not be a big issue to just move during GC). - external fragmentation can be approached in many ways: - or just ignored by letting G1 reserve a multiple of MaxHeapSize while only ever committing MaxHeapSize (JDK-8229373). The main drawback here is that it impacts the range of heaps where compressed oops can be used, and 32 bit (particularly Windows) VMs (if you still care, but the feature could be disabled as well). Compressed oops typically improve throughput significantly. Of course, as long as the total size of the reservation is below the threshold, it does not really matter. Fwiw, when using the HeterogeneousHeapRegionManager, this is already attempted (for other reasons).
- improve the region allocator to decrease the problem (JDK-8229373). The way G1 currently allocates regions is a first-fit approach which interferes a bit with destination region selection for old and survivor regions, likely creating more fragmentation than necessary. (Basically: it does not care at all, so go figure ;) ). Also during mixed gc one could explicitly prefer regions to evacuate that break long runs of free regions, weighing those regions higher (evacuating earlier). This needs to be done in conjunction with the remembered set selection at end of marking, before creating them. A long time ago, on a different regional collector, I started looking into this. - actively defragment the heap during GC. This may either be full gc (JDK-8191565) like shenandoah does, or any young gc assuming that G1 first kept remembered sets for potential candidates (JDK-8038487). - never create humongous objects - potentially implement one of the various ideas in the literature to break down large objects into smaller ones, J9's arraylets being one of them. There are other solutions like completely separate allocation of humongous objects like ZGC does, but that typically has the same problem as reserving more space (i.e. compressed oops range, but ZGC does not care at this time). I think it would help potential contributors if there were some application available where the impact of changes could be shown in some way. In the past, whenever there had been someone with that problem, these persons were happy to just increase heap region size - which is great for them, but does not fix the problem :) We would in any case help anyone taking a stab at one of these ideas (or others).
Thanks, Thomas [0] https://bugs.openjdk.java.net/browse/JDK-8237466?jql=labels%20%3D%20g1-humongous From thomas.schatzl at oracle.com Fri Jan 17 10:26:37 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 17 Jan 2020 11:26:37 +0100 Subject: RFR (M): 8235860: Obsolete the UseParallelOldGC option In-Reply-To: <84170868-4b43-c578-f134-bb169c4f2708@oracle.com> References: <292ab94f-f2c8-b373-d5a5-46a45470540e@oracle.com> <2A4B1955-26D5-4544-B476-6E9E5E8009D4@oracle.com> <5e21e50d-a026-98ba-d03d-3f7aa1c31e21@oracle.com> <56A9296B-089E-4A00-9C43-5E3CBDF4A29B@oracle.com> <48af99ac-9e7a-c112-800e-db13e3b3bbcb@oracle.com> <84170868-4b43-c578-f134-bb169c4f2708@oracle.com> Message-ID: <657d7f7d-95e5-e355-917a-2c527bac0436@oracle.com> Hi Leo, On 16.01.20 17:06, Leo Korinth wrote: > Hi! > > I believe _name and old_gen_name() in PSOldGen should be removed and the > virtual name() should return the string literal directly. Change this if > you want. > > With or without using my suggestions, your changes looks good to me. thanks for your review. 
Here are latest changes, fixing the issue: http://cr.openjdk.java.net/~tschatzl/8235860/webrev.1_to_2 (diff) http://cr.openjdk.java.net/~tschatzl/8235860/webrev.2 (full) Thanks, Thomas From leo.korinth at oracle.com Fri Jan 17 12:10:49 2020 From: leo.korinth at oracle.com (Leo Korinth) Date: Fri, 17 Jan 2020 13:10:49 +0100 Subject: RFR (M): 8235860: Obsolete the UseParallelOldGC option In-Reply-To: <657d7f7d-95e5-e355-917a-2c527bac0436@oracle.com> References: <292ab94f-f2c8-b373-d5a5-46a45470540e@oracle.com> <2A4B1955-26D5-4544-B476-6E9E5E8009D4@oracle.com> <5e21e50d-a026-98ba-d03d-3f7aa1c31e21@oracle.com> <56A9296B-089E-4A00-9C43-5E3CBDF4A29B@oracle.com> <48af99ac-9e7a-c112-800e-db13e3b3bbcb@oracle.com> <84170868-4b43-c578-f134-bb169c4f2708@oracle.com> <657d7f7d-95e5-e355-917a-2c527bac0436@oracle.com> Message-ID: <45de9071-a546-4c3b-939f-a7723f5a2cdf@oracle.com> On 17/01/2020 11:26, Thomas Schatzl wrote: > Hi Leo, > > On 16.01.20 17:06, Leo Korinth wrote: >> Hi! >> >> I believe _name and old_gen_name() in PSOldGen should be removed and >> the virtual name() should return the string literal directly. Change >> this if you want. >> >> With or without using my suggestions, your changes looks good to me. > > ? thanks for your review. > > Here are latest changes, fixing the issue: > > http://cr.openjdk.java.net/~tschatzl/8235860/webrev.1_to_2 (diff) > http://cr.openjdk.java.net/~tschatzl/8235860/webrev.2 (full) Looks good! Thanks, Leo > > Thanks, > ? 
Thomas From thomas.schatzl at oracle.com Fri Jan 17 12:59:31 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 17 Jan 2020 13:59:31 +0100 Subject: RFR (M): 8235860: Obsolete the UseParallelOldGC option In-Reply-To: <45de9071-a546-4c3b-939f-a7723f5a2cdf@oracle.com> References: <292ab94f-f2c8-b373-d5a5-46a45470540e@oracle.com> <2A4B1955-26D5-4544-B476-6E9E5E8009D4@oracle.com> <5e21e50d-a026-98ba-d03d-3f7aa1c31e21@oracle.com> <56A9296B-089E-4A00-9C43-5E3CBDF4A29B@oracle.com> <48af99ac-9e7a-c112-800e-db13e3b3bbcb@oracle.com> <84170868-4b43-c578-f134-bb169c4f2708@oracle.com> <657d7f7d-95e5-e355-917a-2c527bac0436@oracle.com> <45de9071-a546-4c3b-939f-a7723f5a2cdf@oracle.com> Message-ID: <156f1607-b336-3089-d264-ff71fd95ef6f@oracle.com> Hi, On 17.01.20 13:10, Leo Korinth wrote: > On 17/01/2020 11:26, Thomas Schatzl wrote: >> Hi Leo, >> >> On 16.01.20 17:06, Leo Korinth wrote: >>> Hi! >>> >>> I believe _name and old_gen_name() in PSOldGen should be removed and >>> the virtual name() should return the string literal directly. Change >>> this if you want. >>> >>> With or without using my suggestions, your changes looks good to me. >> >> ?? thanks for your review. >> >> Here are latest changes, fixing the issue: >> >> http://cr.openjdk.java.net/~tschatzl/8235860/webrev.1_to_2 (diff) >> http://cr.openjdk.java.net/~tschatzl/8235860/webrev.2 (full) > > Looks good! > > Thanks, > Leo thanks for your review. 
Thomas From thomas.schatzl at oracle.com Fri Jan 17 13:27:59 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 17 Jan 2020 14:27:59 +0100 Subject: [14] RFR (XS): 8235305: Corrupted oops embedded in nmethods due to parallel modification during optional evacuation In-Reply-To: <1ecf88a0-68dd-276a-0a7a-f068c587168d@oracle.com> References: <15678a97-f219-e0f4-c0b6-a4a2a06e6768@oracle.com> <1ecf88a0-68dd-276a-0a7a-f068c587168d@oracle.com> Message-ID: <78970e90-7b4c-1618-af6f-0b8e37af47f3@oracle.com> Hi Stefan, On 17.01.20 10:06, Stefan Johansson wrote: > Hi Thomas, > > On 2020-01-16 14:20, Thomas Schatzl wrote: >> Hi all, >> >> ?? can I get reviews for this change that fixes a bug in the abortable >> mixed gc algorithm where G1 might corrupt oops embedded in nmethods >> due to parallel modification during an optional evacuation phase? >> >> G1 currently collects embedded oops in nmethods twice: once in the >> optional roots list, and once as nmethods in the strong code roots >> list for a particular region. >> >> Now it can happen that this oop embedded in in the code stream is >> unaligned, so if that oop is modified during relocation word tearing >> may occur, causing follow-up crashes. >> >> The fix is to not collect oops from nmethods in the optional code root >> list as the strong code root list for a particular region already >> always contains it anyway. >> >> Thanks go to stefank, eriko and sjohanss for helping with analyzing, >> testing and the discussion around it. >> >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8235305 >> Webrev: >> http://cr.openjdk.java.net/~tschatzl/8235305/webrev/ > > Fix looks good. Thanks for your review. > Just some things around the naming of the template > parameter and enum after adding this. I don't have a much better idea > [...] 
Talked to them about this and I'm good with their suggestion: http://cr.openjdk.java.net/~tschatzl/8235305/webrev.1 (full) http://cr.openjdk.java.net/~tschatzl/8235305/webrev.0_to_1 (diff) Thanks, Thomas From stefan.karlsson at oracle.com Fri Jan 17 13:31:05 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Fri, 17 Jan 2020 14:31:05 +0100 Subject: RFR: 8237363: Remove automatic is in heap verification in OopIterateClosure Message-ID: <1cb1e7ea-45dd-6a36-1731-94fe1fe25244@oracle.com> Hi all, Please review this patch to remove the automatic "is in heap" verification from OopIterateClosure. https://cr.openjdk.java.net/~stefank/8237363/webrev.01/ https://bugs.openjdk.java.net/browse/JDK-8237363 OopIterateClosure provides some automatic verification that loaded objects are inside the heap. Closures can opt out from this by overriding should_verify_oops(). I propose that we move this verification, and the way to turn it off, and instead let the implementations of the closures decide the kind of verification that is appropriate. I want to do this to de-clutter the closure APIs a bit. I've gone through all OopIterateClosures that don't override should_verify_oops() and added calls to assert_oop_field_points_to_object_in_heap[_or_null] where the closures didn't have equivalent checks. A lot of the places didn't explicitly check that the object is within the heap but they would check for other things like: - Is the corresponding bit index within the range - Is the heap region index within range - Is the object in the reserved heap range (weaker than is_in) I've added asserts to those places. If you think I should remove some of them, please let me know. 
Tested with tier1-3 Thanks, StefanK From thomas.schatzl at oracle.com Fri Jan 17 14:11:29 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 17 Jan 2020 15:11:29 +0100 Subject: [14] RFR (S): 8237079: gc/g1/mixedgc/TestLogging.java fails with "Pause Young (Mixed) (G1 Evacuation Pause) not found" Message-ID: Hi all, can I have reviews for this small test fix to unclutter CI with unnecessary failures? So this test checks the GC cycle in the logs, and this fails because for some unknown reason (timing?) we get to-space exhaustion and ultimately a full gc which prevents the expected mixed gc. The problem (demonstrated with an even more heap-reduced test) is that with a 10m heap, 2 regions are already taken by archive regions, leaving 8 regions for allocation. Default policy allows g1 to use 4 regions of eden straight away, meaning that if the right amount of fragmentation occurs, we could end up evacuating these 4 eden regions into just a bit more than 4 destination regions, causing the evacuation failure. The fix is to limit young gen size so that this situation can not occur (verified visually that the max number of regions used is significantly smaller than before); I added another small fix to not rely on an OOME exception to trigger the mixed gcs we want to check for. As I could never locally reproduce the issue with the original VM settings, I also added a bit more logging to the runs. I would like to push this into 14 to avoid noise there too as it also occurs there. 
CR: https://bugs.openjdk.java.net/browse/JDK-8237079 Webrev: http://cr.openjdk.java.net/~tschatzl/8237079/webrev Testing: 4k passed runs with the new test, local testing Thanks, Thomas From stefan.johansson at oracle.com Fri Jan 17 14:41:13 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Fri, 17 Jan 2020 15:41:13 +0100 Subject: [14] RFR (XS): 8235305: Corrupted oops embedded in nmethods due to parallel modification during optional evacuation In-Reply-To: <78970e90-7b4c-1618-af6f-0b8e37af47f3@oracle.com> References: <15678a97-f219-e0f4-c0b6-a4a2a06e6768@oracle.com> <1ecf88a0-68dd-276a-0a7a-f068c587168d@oracle.com> <78970e90-7b4c-1618-af6f-0b8e37af47f3@oracle.com> Message-ID: <54852a22-369a-f6ae-7f4d-bffa1dc89aee@oracle.com> Hi Thomas, On 2020-01-17 14:27, Thomas Schatzl wrote: > Hi Stefan, > > On 17.01.20 10:06, Stefan Johansson wrote: >> Hi Thomas, >> >> On 2020-01-16 14:20, Thomas Schatzl wrote: >>> Hi all, >>> >>> ?? can I get reviews for this change that fixes a bug in the >>> abortable mixed gc algorithm where G1 might corrupt oops embedded in >>> nmethods due to parallel modification during an optional evacuation >>> phase? >>> >>> G1 currently collects embedded oops in nmethods twice: once in the >>> optional roots list, and once as nmethods in the strong code roots >>> list for a particular region. >>> >>> Now it can happen that this oop embedded in in the code stream is >>> unaligned, so if that oop is modified during relocation word tearing >>> may occur, causing follow-up crashes. >>> >>> The fix is to not collect oops from nmethods in the optional code >>> root list as the strong code root list for a particular region >>> already always contains it anyway. >>> >>> Thanks go to stefank, eriko and sjohanss for helping with analyzing, >>> testing and the discussion around it. >>> >>> CR: >>> https://bugs.openjdk.java.net/browse/JDK-8235305 >>> Webrev: >>> http://cr.openjdk.java.net/~tschatzl/8235305/webrev/ >> >> Fix looks good. 
> > Thanks for your review. > >> Just some things around the naming of the template parameter and enum >> after adding this. I don't have a much better idea > [...] > > Talked to them about this and I'm good with their suggestion: > > http://cr.openjdk.java.net/~tschatzl/8235305/webrev.1 (full) > http://cr.openjdk.java.net/~tschatzl/8235305/webrev.0_to_1 (diff) This looks good! Thanks, Stefan > > Thanks, > ? Thomas > From leo.korinth at oracle.com Fri Jan 17 15:07:19 2020 From: leo.korinth at oracle.com (Leo Korinth) Date: Fri, 17 Jan 2020 16:07:19 +0100 Subject: [14] RFR (S): 8237079: gc/g1/mixedgc/TestLogging.java fails with "Pause Young (Mixed) (G1 Evacuation Pause) not found" In-Reply-To: References: Message-ID: Hi Thomas, This is not a review. This code is basically the same code as is duplicated at least three times in the test code. One of the duplications you can blame me for, *sorry*. I believe it should be moved to a common library method. I also believe the last fix you did in TestG1ParallelPhases.java makes that version look cleaner than what you propose here (it does not need the last allocation loop at all). How about using the TestG1ParallelPhases.java version for all three test cases? If not, do the third version in TestOldGenCollectionUsage really work??? Thanks, Leo On 17/01/2020 15:11, Thomas Schatzl wrote: > Hi all, > > ? can I have reviews for this small test fix to unclutter CI with > unnecessary failures? > > So this attempts test checks the GC cycle, and in the logs this fails > because for some unknown reason (timing?) we get to-space exhaustion and > ultimately a full gc which prevents the expected mixed gc. > > The problem (demonstrated with an even more heap-reduced test) is that > with 10m heap, 2 regions are already taken by archive regions, leaving 8 > regions for allocation. 
Default policy allows g1 to use 4 regions of > eden straight away, meaning that if the right amount of fragmentation > occurs, we could expand these 4 eden regions in just a bit more than 4 > destination regions, causing the evacuation failure. > > The fix is to limit young gen size so that this situation can not occur > (verified that max number of regions used is significantly smaller than > before visually); I added another small fix to not rely on OOME > exception to trigger the mixed gcs we want to check for. > > As I could never locally reproduce the issue with original VM settings, > I also added a bit more logging to the runs. > > I would like to push this into 14 to avoid noise there too as it also > occurs there. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8237079 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8237079/webrev > Testing: > 4k passed runs with the new test, local testing > > Thanks, > ? Thomas From zgu at redhat.com Fri Jan 17 15:28:42 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Fri, 17 Jan 2020 10:28:42 -0500 Subject: [14] RFR 8237396: JvmtiTagMap::weak_oops_do() should not trigger barriers Message-ID: <0088d47f-9dc5-5275-7242-47d1b544cc33@redhat.com> Please review this small patch that avoids barriers in JvmtiTagMap::weak_oops_do() method. The method is used by GC and GC expects to see raw oops. Bug: https://bugs.openjdk.java.net/browse/JDK-8237396 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237396/webrev.00/ Test: hotspot_gc vmTestbase_nsk_jvmti (fastdebug and release) on x86_64 Linux Submit test in progress. Thanks, -Zhengyu From zgu at redhat.com Fri Jan 17 16:34:30 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Fri, 17 Jan 2020 11:34:30 -0500 Subject: [15] RFR 8236880: Shenandoah: Move string dedup cleanup into concurrent phase Message-ID: <837e7210-0bd7-e06f-907b-7c5fcc3c3684@redhat.com> Please review this patch that moves string deduplication cleanup task into concurrent phase. 
The cleanup task comprises two subtasks: StringDedupTable and StringDedupQueue cleanup. Concurrent StringDedupTable cleanup is very straightforward. GC takes the StringDedupTable_lock to block out mutators from modifying the table, then performs multi-threaded cleanup, just as it does at a STW pause. Concurrent StringDedupQueue cleanup is more complicated. GC takes the StringDedupQueue_lock, which only blocks queue structure changes, while mutators can still enqueue new string candidates and the dedup thread can still perform deduplication. So there are a couple of synchronizations that need to be established. 1) When a mutator enqueues a candidate, the enqueued oop should be valid before the slot can be made visible to GC threads. 2) When a GC thread updates an oop, it needs to make sure that the dedup thread does not see a partially updated oop. The implementation uses a load_acquire/release_store pair to ensure the above synchronization holds. GC threads may miss some oops just enqueued by mutators. This is not a concern, since the LRB guarantees they are in to-space. Bug: https://bugs.openjdk.java.net/browse/JDK-8236880 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8236880/webrev.00/ Test: hotspot_gc_shenandoah with -XX:+UseStringDeduplication (fastdebug and release) on x86_64 and aarch64 Linux Thanks, -Zhengyu From thomas.schatzl at oracle.com Fri Jan 17 16:55:13 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 17 Jan 2020 17:55:13 +0100 Subject: [14] RFR (S): 8237079: gc/g1/mixedgc/TestLogging.java fails with "Pause Young (Mixed) (G1 Evacuation Pause) not found" In-Reply-To: References: Message-ID: <1ddd23c37080bf0069100116e923cee86a3115b1.camel@oracle.com> Hi, On Fri, 2020-01-17 at 16:07 +0100, Leo Korinth wrote: > Hi Thomas, > > This is not a review. This code is basically the same code as is > duplicated at least three times in the test code. One of the > duplications you can blame me for, *sorry*. I believe it should be > moved > to a common library method. 
I also believe the last fix you did in > TestG1ParallelPhases.java makes that version look cleaner than what > you > propose here (it does not need the last allocation loop at all). I figured that this code looked familiar but did not know where I saw that before. I should have looked through the other tests.... let me look at the other implementations and redo this change. > > How about using the TestG1ParallelPhases.java version for all three > test cases? If not, do the third version in TestOldGenCollectionUsage > really work??? I will check it out. Thomas From manc at google.com Sat Jan 18 04:08:05 2020 From: manc at google.com (Man Cao) Date: Fri, 17 Jan 2020 20:08:05 -0800 Subject: Discussion: improve humongous objects handling for G1 In-Reply-To: <4623ce42-7b6c-8b46-5915-8ff708b82f5c@oracle.com> References: <4623ce42-7b6c-8b46-5915-8ff708b82f5c@oracle.com> Message-ID: Thanks for the in-depth responses! For a sample application, I actually have a modified BigRamTester that allocates humongous objects, and it can demonstrate some of the problems. Would JDK-8204689 be addressed soon? Then we can merge the variants of BigRamTester. A possible concern is that the "humongous BigRamTester" is not representative of the production workload's problem with humongous objects. The humongous objects in production workload are more likely short-lived, whereas they are long-lived in "humongous BigRamTester". Perhaps we can modify it further to make it the humongous objects short-lived. I will keep this topic on my radar and see if I can find more realistic benchmarks. For OOMs due to fragmentation and ideas related to full GC (JDK-8191565, JDK-8038487), I'd like to point out that the near-OOM cases are less of a concern for our production applications. Their heap sizes are sufficiently large in order to keep GC overhead low with CMS in the past. When they move to G1, they almost never trigger full GCs even with a non-trivial number of humongous allocations. 
The problem is the high frequency of concurrent cycles and mixed collections as a result of humongous allocations. Fundamentally it is also due to fragmentation, but only addressing the near-OOM cases would not solve the problem. Doing more active defragmentation could indeed help. It might be better to first fully explore the feasibilities of those crazier ideas. If one of them works, then we don't need to continuously improve G1 here and there. So far there are 3 of them. They all can get rid of humongous regions completely if I understand correctly. a. let G1 reserve a multiple of MaxHeapSize while only ever committing MaxHeapSize (JDK-8229373) I like this approach most, especially since JDK-8211425 is already implemented. I'll further think about the issue with compressed oops. b. break down large objects into smaller ones like J9's arraylets A few questions on this approach: We probably don't need to handle large non-array objects, right? They should be extremely rare. Is this approach compliant with JLS [1] and JVMS [2]? I read about them but couldn't find evidence of noncompliance. Supporting JNI GetCritical does look tricky. Another tricky issue is that we should preserve O(1) complexity for accesses by index. c. carving out the adjustable subset of regions for humongous allocs and doing power-of-two buddy-system allocation I have also thought about a quite similar idea by introducing a dynamic-sized humongous space. It might be better to support multiple dynamic-sized humongous spaces. I admit I probably have not thought this approach as deep as Aleksey has. [1] https://docs.oracle.com/javase/specs/jls/se13/html/jls-10.html [2] https://docs.oracle.com/javase/specs/jvms/se13/html/jvms-6.html#jvms-6.5.newarray -Man On Fri, Jan 17, 2020 at 2:01 AM Thomas Schatzl wrote: > Hi, > > On 17.01.20 01:53, Man Cao wrote: > > Hi all, > > > > While migrating our workload from CMS to G1, we found many production > > applications suffer from humongous allocations. 
> > The default threshold for humongous objects is often too small for our > > applications with heap sizes between 2GB-15GB. > > Humongous allocations caused noticeable increase in the frequency of > > concurrent old-gen collections, mixed collections and CPU usage. > > We could advise applications to increase G1HeapRegionSize. But some > > applications still suffer with G1HeapRegionSize=32M. > > We could also advise applications to refactor code to break down large > > objects. But it is a high cost effort that may not always be feasible. > > > > We'd like to work with the OpenJDK community together to improve G1's > > handling of humongous objects. > > Thomas Schatzl mentioned to me a few efforts/ideas on this front in an > > offline chat: > > a. Allocation into tail regions of humongous object: JDK-8172713, > > JDK-8031381 > > b. Commit additional virtual address space for humongous objects. > > c. Improve the region selection heuristics (e.g., first-fit, best-fit) > for > > humongous objects. > > > > I didn't find open CRs for b. and c. Could someone give pointers? > > Are there any other ideas/prototypes on this front? > > TLDR: we in the Oracle gc team have quite a few ideas that can decrease > the issue significantly. We are happy to help with implementation of any > of these. > We would appreciate a sample application. > > Long version: > > The problems with humongous object allocation in G1: > > - internal fragmentation: the tail end of a humongous object is wasted > space. > > - external fragmentation: sometimes you can't find enough contiguous > space for a humongous object. > > There are quite a few CRs related to this problem in the bug tracker; I > just now connected them together using a "g1-humongous" label [0]. 
> > Here's a rundown of our ideas, categorized a little (note that these CRs > predate significant changes due to how G1 works now, so the ideas may > need to be adapted to the current situation): > > - try to get rid of humongous asap, i.e. improve eager reclaim support > by allowing eager reclaim with reference arrays (JDK-8048180) or > non-objArrays (JDK-8073288). > I remember the main problem with that were stale remembered set entries > after removal (and SATB marking, but you could just not do eager reclaim > during marking). > In the applications we had at hand at that time, reference arrays tended > to be not eager reclaimable most of the time, and humongous regular > objects were rare. > So the benefit to look into this might be small. > > - allow allocation into the tail end of humongous objects (JDK-8172713); > there has once been an internal prototype for that, but it has been > abandoned because of implementation issues (it was a hack that has not > been completed to a stable state, mainly because humongous object > management had been full of odd quirks wrt to region management. This > has been fixed since. Also the example application benefitted more from > eager reclaim). > > While the argument from Aleksey about nepotism in the other thread is > valid (as far as I understand it), it depends on the implementation. The > area at the tail end could be considered as a separate evacuation > source, i.e. evacuated independently of the humongous object (and that > would actually improve the code to clean out HeapRegion ;)). > (This needs more care with single-region humongous objects but does not > seem completely problematic; single-region humongous objects may > nowadays not be a big issue to just move during GC). > > - external fragmentation can be approached in many ways: > > - or just ignored by letting G1 reserve a multiple of MaxHeapSize > while only ever committing MaxHeapSize (JDK-8229373). 
The main drawback > here is that it impacts the range of heaps where compressed oops can be > used, and 32 bit (particularly Windows) VMs (if you still care, but the > feature could be disabled as well). > Compressed oops typically improve throughput significantly. Of course, > as long as the total size of the reservation is below the threshold, it > does not really matter. > > Fwiw, when using the HeterogeneousHeapRegionManager, this is already > attempted (for other reasons). > > - improve the region allocator to decrease the problem (JDK-8229373). > The way G1 currently allocates regions is a first-fit approach which > interferes a bit with destination region selection for old and survivor > regions, likely creating more fragmentation than necessary. (Basically: > it does not care at all, so go figure ;) ). > Also during mixed gc one could explicitly prefer regions to evacuate > that break long runs of free regions, weighing those regions higher > (evacuating earlier). This needs to be done in conjunction with the > remembered set selection at end of marking, before creating them. > > Long time ago, on a different regional collector, I started looking into > this. > > - actively defragment the heap during GC. This may either be full gc > (JDK-8191565) like shenandoah does, or any young gc assuming that G1 > first kept remembered sets for potential candidates (JDK-8038487). > > - never create humongous objects > > - potentially implement one of the various ideas in the literature to > break down large objects into smaller ones, J9's arraylets being one of > them. > > There are other solutions like completely separate allocation of > humongous objects like ZGC does, but that typically has the same problem > as reserving more space (i.e. compressed oops range, but ZGC does not > care at this time). > > I think it would help potential contributors if there were some > application available where the impact of changes could be shown on in > some way. 
In the past, whenever there had been someone with that > problem, these persons were happy to just increase heap region size - > which is great for them, but does not fix the problem :) > > We would in any case help anyone taking a stab at one of these ideas (or > others). > > Thanks, > Thomas > > [0] > > https://bugs.openjdk.java.net/browse/JDK-8237466?jql=labels%20%3D%20g1-humongous > From thomas.schatzl at oracle.com Sat Jan 18 10:13:31 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Sat, 18 Jan 2020 11:13:31 +0100 Subject: Discussion: improve humongous objects handling for G1 In-Reply-To: References: <4623ce42-7b6c-8b46-5915-8ff708b82f5c@oracle.com> Message-ID: Hi, On Fri, 2020-01-17 at 20:08 -0800, Man Cao wrote: > Thanks for the in-depth responses! > > For a sample application, I actually have a modified BigRamTester > that allocates humongous objects, and it can demonstrate some of the > problems. > Would JDK-8204689 be addressed soon? Then we can merge the variants Given previous track record on that, unfortunately not. > of BigRamTester. A possible concern is that the "humongous > BigRamTester" is not representative of the production workload's > problem with humongous objects. > The humongous objects in production workload are more likely short- > lived, whereas they are long-lived in "humongous BigRamTester". For short-lived humongous objects eager reclaim can do miracles. If your objects are non-objArrays, you could check for the reason why they are not eagerly reclaimed - maybe the threshold for the amount of remembered set entries to keep these humongous objects as eligible for eager reclaim is too low, and increasing that one would just make it work. Enabling gc+humongous=debug can give more information. Note that in JDK13 we (implicitly) increased this threshold, and in JDK14 we removed the main reason why the threshold is as low as it is (calculating the number of remembered set entries). 
It is likely possible to increase this threshold by one or even two orders of magnitude now, potentially increasing its effectiveness significantly with a one-liner change. I will file a CR for that, thought of it but forgot when doing the jdk14 modification. > Perhaps we can modify it further to make it the humongous objects > short-lived. I will keep this topic on my radar and see if I can find > more realistic benchmarks. > > For OOMs due to fragmentation and ideas related to full GC (JDK- > 8191565, JDK-8038487), I'd like to point out that the near-OOM cases > are less of a concern for our production applications. Their heap > sizes are sufficiently large in order to keep GC overhead low with > CMS in the past. When they move to G1, they almost never trigger full > GCs even with a non-trivial number of humongous allocations. > The problem is the high frequency of concurrent cycles and mixed > collections as a result of humongous allocations. Fundamentally it is Which indicates that eager reclaim does not work in this application for some reason. > also due to fragmentation, but only addressing the near-OOM cases > would not solve the problem. Doing more active defragmentation could > indeed help. To me, spending the effort on combating internal fragmentation (allow allocation in tail ends) and external fragmentation by actively defragmenting seems to be at least worth comparing to other options. It could help with all problems but cases where you allocate a very large number of humongous objects and you can't keep the humongous object tails filled. This option still keeps the invariant that humongous objects need to be allocated at a region boundary. Most of the other ideas you propose below also (seem to) retain this property. > It might be better to first fully explore the feasibilities of those > crazier ideas. If one of them works, then we don't need to > continuously improve G1 here and there. So far there are 3 of them. 
> They all can get rid of humongous regions completely if I understand > correctly. > a. let G1 reserve a multiple of MaxHeapSize while only ever > committing MaxHeapSize (JDK-8229373) > I like this approach most, especially since JDK-8211425 is > already implemented. I'll further think about the issue with > compressed oops. It is simplest, but does not solve the issue with internal fragmentation which is ultimately responsible for concurrent cycle frequency. Maybe it is sufficient as "most" applications only use single or low double-digit GB heaps at the moment where the entire reservation still fits into the 32gb barrier. If the heap is already larger than the compressed oops range, then this solution would certainly be simplest for the external fragmentation issue. If you are already way beyond that barrier, you might just use ZGC though for other reasons too if you are fine with any potential throughput hit. > b. break down large objects into smaller ones like J9's arraylets > A few questions on this approach: > We probably don't need to handle large non-array objects, right? > They should be extremely rare. Arraylets do not solve that problem either. > Is this approach compliant with JLS [1] and JVMS [2]? I read > about them but couldn't find evidence of noncompliance. I do not think there is an issue but I did not specifically read the specs again. Given that J9 is spec compliant afaik when they use arraylets (with the default balanced collector), so would Hotspot. > Supporting JNI GetCritical does look tricky. Another tricky issue You could double-map like https://blog.openj9.org/2019/05/01/double-map-arraylets/ does for native access. Btw the same text also indicates that copying seems like a non-starter anyway, as, quoting from the text "One use case, SPECjbb2015 benchmark is not being able to finish RT curve...". > is that we should preserve O(1) complexity for accesses by index. 
Not sure what prevents arraylets in particular from being O(1); a particular access is slower though due to the additional indirection with the spine. Using the double-mapped array for JITted code may have the same problem with compressed oops as other solutions; particularly if you do not know the size of the processed array in advance, you need to create extra code. Which means that there is significant optimization work needed to make array access "as fast" as before in jitted code. > c. carving out the adjustable subset of regions for humongous allocs > and doing power-of-two buddy-system allocation The buddy system (as I understand it, maybe Aleksey could share more details) still suffers from internal fragmentation, potentially even more than now. > I have also thought about a quite similar idea by introducing a > dynamic-sized humongous space. It might be better to support multiple > dynamic-sized humongous spaces. I admit I probably have not thought > this approach as deep as Aleksey has. This is the approach ZGC takes, which has the associated problems with compressed oops. I do not think we can completely give up the compressed oops use case at least until alternatives are explored. > > [1] https://docs.oracle.com/javase/specs/jls/se13/html/jls-10.html > [2] > https://docs.oracle.com/javase/specs/jvms/se13/html/jvms-6.html#jvms-6.5.newarray > Thanks, Thomas From maoliang.ml at alibaba-inc.com Sun Jan 19 07:08:38 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Sun, 19 Jan 2020 15:08:38 +0800 Subject: =?UTF-8?B?RGlzY3Vzc2lvbjogaW1wcm92ZSBodW1vbmdvdXMgb2JqZWN0cyBoYW5kbGluZyBmb3IgRzE=?= Message-ID: <695ee6a6-a182-40a9-bfbf-49214d2fdaaa.maoliang.ml@alibaba-inc.com> Hi Guys, We Alibaba have experienced the same problem as Man introduced. Some applications got frequent concurrent mark cycles and high cpu usage and even some to-space exhausted failures because of large amount of humongous object allocation even with G1HeapRegionSize=32m. 
But those applications worked fine with ParNew/CMS. We are working on some enhancements for better reclamation of humongous objects. Our first intention is to reduce the frequent concurrent cycles and possible to-space exhaustion, so heap utilization and arraylets are not taken into consideration yet. Our solution is more like a ParNew/CMS flow and will treat a humongous object as young or old. 1. Humongous object allocation in the mutator will be counted into the eden size and won't directly trigger a concurrent mark cycle. That will avoid the possible to-space exhaustion while concurrent mark is working and humongous allocations are "eating" the free regions. 2. Enhance the reclamation of short-lived humongous objects by covering object arrays; current eager reclaim only supports primitive types for now. This part looks similar to JDK-8048180 and JDK-8073288 that Thomas mentioned. The evacuation flow will iterate the humongous object array as a regular object if the humongous object is "young", which can be distinguished by the "age" field in the markoop. The patch is being tested. We will share it once it proves to work fine with our applications. I don't know if any similar approach has already been tried - any advice? Thanks, Liang From thomas.schatzl at oracle.com Mon Jan 20 10:46:06 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 20 Jan 2020 11:46:06 +0100 Subject: Discussion: improve humongous objects handling for G1 In-Reply-To: References: <4623ce42-7b6c-8b46-5915-8ff708b82f5c@oracle.com> Message-ID: <48d29ef4-800d-9402-3bfb-7dab1c895a54@oracle.com> Hi, On 18.01.20 11:13, Thomas Schatzl wrote: > Hi, > > On Fri, 2020-01-17 at 20:08 -0800, Man Cao wrote: >> Thanks for the in-depth responses! >> [...] > >> of BigRamTester. A possible concern is that the "humongous >> BigRamTester" is not representative of the production workload's >> problem with humongous objects. 
>> The humongous objects in production workload are more likely short- >> lived, whereas they are long-lived in "humongous BigRamTester". > > For short-lived humongous objects eager reclaim can do miracles. If > your objects are non-objArrays, you could check for the reason why they > are not eagerly reclaimed - maybe the threshold for the amount of > remembered set entries to keep these humongous objects as eligible for > eager reclaim is too low, and increasing that one would just make it > work. Enabling gc+humongous=debug can give more information. > > Note that in JDK13 we (implicitly) increased this threshold, and in > JDK14 we removed the main reason why the threshold is as low as it is > (calculating the number of rememebered set entries). > > It is likely possible to increase this threshold by one or even two > magnitudes now, potentially increasing its effectiveness significantly > with a one-liner change. I will file a CR for that, thought of it but > forgot when doing the jdk14 modification. JDK-8237500. >> >> For OOMs due to fragmentation and ideas related to full GC (JDK- >> 8191565, JDK-8038487), I'd like to point out that the near-OOM cases >> are less of a concern for our production applications. Their heap >> sizes are sufficiently large in order to keep GC overhead low with >> CMS in the past. When they move to G1, they almost never trigger full >> GCs even with a non-trivial number of humongous allocations. >> The problem is the high frequency of concurrent cycles and mixed >> collections as a result of humongous allocations. Fundamentally it is > > Which indicates that eager reclaim does not work in this application > for some reason. Note that it would be appreciated if we all were able to discuss issues on an actual log (gc+heap=debug,gc+humongous=debug; some rough comparison of gc's performed with g1 and CMS, with some distribution of g1 gc pauses) than trying to guess what each others actual problems are. 
>> also due to fragmentation, but only addressing the near-OOM cases >> would not solve the problem. Doing more active defragmentation could >> indeed help. > > To me, spending the effort on combating internal fragmentation (allow > allocation in tail ends) and external fragmentation by actively > defragmenting seems to be at least worth comparing to other options. > > It could help with all problems but cases where you allocate a very > large number of humongous objects and you can't keep the humongous object > tails filled. This option still keeps the invariant that humongous > objects need to be allocated at a region boundary. > > Most of the other ideas you propose below also (seem to) retain this > property. After some more thought, all these solutions actually seem to do so. Even the arraylets would suffer from the same internal fragmentation for the last arrayoid as they do now since they seem to stay humongous to avoid constant copying and remapping. There is a remark in a tech paper about arraylets (https://www.ibm.com/developerworks/websphere/techjournal/1108_sciampacone/1108_sciampacone.html) that indicates that the balanced collector does not seem to move the arrayoids either. ([...] Additionally, the balanced collector never needs to move an arraylet leaf once it has been allocated. The cost of relocating an array is limited to the cost of relocating the spine, so large arrays do not contribute to higher defragmentation times. [...]). Thanks, Thomas From roy.sunny.zhang007 at gmail.com Mon Jan 20 10:54:37 2020 From: roy.sunny.zhang007 at gmail.com (Roy Zhang) Date: Mon, 20 Jan 2020 18:54:37 +0800 Subject: Abnormal high sys time in G1 GC In-Reply-To: References: Message-ID: Sent to hotspot-gc-dev mail list as well :) Thank you for your help in advance!!! Thanks, Roy On Mon, Jan 20, 2020 at 6:22 PM Roy Zhang wrote: > Dear JVM experts, > > Recently we found a GC spike (long STW minor GC), and sys time is high when > GC time is high.
Normally sys time is near 0 seconds and minor GC is > less than 500ms. > > From > http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2017-October/020630.html > and https://blog.gceasy.io/2016/12/11/sys-time-greater-than-user-time/, > high sys time could be caused by an operating system problem/VM related > problem/memory constraint/disk IO pressure/Transparent Huge Pages. > > I checked them one by one and didn't find any clue; could you please > provide suggestions? Thanks in advance! > > 1. operating system problem > -- We have enough CPU/memory/disk (48 cpu cores + 373G RAM with 160G heap, > disk is enough), and there is no error in /var/log/dmesg > 2. memory constraint > -- We have enough available memory. available memory (free -m) is 263G > 3. disk IO pressure > -- No issue found in the disk info from the prometheus node exporter. > Granularity is 15s, and I can't find counterparts of the avgqu-sz & util metrics > (disk IO util and saturation metrics) which are part of iostat. Could it be > caused by the big granularity? > 4. VM related problem > -- We are using a physical machine > 5. Transparent Huge Pages. > It is madvise. It could be a problem, but we didn't have this issue > previously. It has been running for nearly 20 weeks.
> > *cat /sys/kernel/mm/transparent_hugepage/enabled* > *always [madvise] never* > > *JDK version:* > OpenJDK Runtime Environment, 1.8.0_222-b10 > > *Java Opts:* > -javaagent:/server/jmx_prometheus_javaagent-0.12.0.jar=xxxx:/server/config.yaml > > -server > -Dcom.sun.management.jmxremote > -Dcom.sun.management.jmxremote.port=xxxx > -Dcom.sun.management.jmxremote.rmi.port=xxxx > -Dcom.sun.management.jmxremote.local.only=false > -Dcom.sun.management.jmxremote.authenticate=false > -Dcom.sun.management.jmxremote.ssl=false > -Xloggc:/server/xxxx.log > -XX:+PrintGCDateStamps > -XX:AutoBoxCacheMax=1000000 > -XX:+UseG1GC > -XX:MaxGCPauseMillis=500 > -XX:+UnlockExperimentalVMOptions > -XX:G1NewSizePercent=50 > -XX:InitiatingHeapOccupancyPercent=70 > -XX:+ParallelRefProcEnabled > -XX:+ExplicitGCInvokesConcurrent > -XX:+UseStringDeduplication > -XX:+PrintGCDetails > -XX:+PrintGCTimeStamps > -Xms160g > -Xmx160g > -XX:+HeapDumpOnOutOfMemoryError > > *Snippet of GC log:* > > 2020-01-20T07:27:03.166+0000: 2756.665: [GC pause (G1 Evacuation Pause) > (young), *6.2899024 secs*] > [Parallel Time: 6255.0 ms, GC Workers: 33] > [GC Worker Start (ms): Min: 2756664.9, Avg: 2756665.5, Max: > 2756666.1, Diff: 1.2] > [Ext Root Scanning (ms): Min: 0.0, Avg: 0.5, Max: 5.3, Diff: 5.3, > Sum: 16.8] > [Update RS (ms): Min: 0.0, Avg: 0.8, Max: 1.1, Diff: 1.1, Sum: 25.6] > [Processed Buffers: Min: 0, Avg: 1.6, Max: 4, Diff: 4, Sum: 53] > [Scan RS (ms): Min: 142.0, Avg: 145.3, Max: 146.4, Diff: 4.4, Sum: > 4794.1] > [Code Root Scanning (ms): Min: 0.0, Avg: 0.3, Max: 3.5, Diff: 3.5, > Sum: 8.8] > * [Object Copy (ms): Min: 6100.1, Avg: 6101.8, Max: 6106.5, Diff: > 6.4, Sum: 201358.4]* > [Termination (ms): Min: 0.1, Avg: 5.2, Max: 6.7, Diff: 6.6, Sum: > 172.9] > [Termination Attempts: Min: 1, Avg: 1353.0, Max: 1476, Diff: > 1475, Sum: 44650] > [GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4, Sum: > 7.0] > [GC Worker Total (ms): Min: 6253.4, Avg: 6254.1, Max: 6254.7, Diff: > 1.2, Sum:
206383.7] > [GC Worker End (ms): Min: 2762919.4, Avg: 2762919.6, Max: 2762919.8, > Diff: 0.4] > [Code Root Fixup: 0.6 ms] > [Code Root Purge: 0.0 ms] > [String Dedup Fixup: 0.7 ms, GC Workers: 33] > [Queue Fixup (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.4] > [Table Fixup (ms): Min: 0.0, Avg: 0.1, Max: 0.6, Diff: 0.6, Sum: 2.0] > [Clear CT: 4.0 ms] > [Other: 29.6 ms] > [Choose CSet: 0.1 ms] > [Ref Proc: 10.3 ms] > [Ref Enq: 0.6 ms] > [Redirty Cards: 11.3 ms] > [Humongous Register: 0.2 ms] > [Humongous Reclaim: 0.0 ms] > [Free CSet: 6.5 ms] > [Eden: 72576.0M(72576.0M)->0.0B(80896.0M) Survivors: 9344.0M->1024.0M > Heap: 83520.0M(160.0G)->11046.9M(160.0G)] > * [Times: user=27.19 sys=162.28, real=6.30 secs] * > > 2020-01-20T06:59:23.382+0000: 1096.881: [GC pause (G1 Evacuation Pause) > (young) (initial-mark), *4.1248088 secs*] > [Parallel Time: 4098.0 ms, GC Workers: 33] > [GC Worker Start (ms): Min: 1096882.1, Avg: 1096882.8, Max: > 1096883.2, Diff: 1.2] > [Ext Root Scanning (ms): Min: 4.0, Avg: 4.8, Max: 6.1, Diff: 2.0, > Sum: 159.7] > [Update RS (ms): Min: 0.0, Avg: 0.3, Max: 1.1, Diff: 1.1, Sum: 9.5] > [Processed Buffers: Min: 0, Avg: 1.3, Max: 6, Diff: 6, Sum: 43] > * [Scan RS (ms): Min: 2001.2, Avg: 2012.2, Max: 2013.4, Diff: 12.2, > Sum: 66401.0]* > [Code Root Scanning (ms): Min: 0.0, Avg: 0.6, Max: 10.7, Diff: 10.7, > Sum: 18.5] > * [Object Copy (ms): Min: 2039.3, Avg: 2049.2, Max: 2079.5, Diff: > 40.2, Sum: 67623.1]* > [Termination (ms): Min: 0.0, Avg: 29.6, Max: 39.7, Diff: 39.7, Sum: > 978.0] > [Termination Attempts: Min: 1, Avg: 6587.0, Max: 8068, Diff: > 8067, Sum: 217372] > [GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.5, Diff: 0.4, Sum: > 7.9] > [GC Worker Total (ms): Min: 4096.3, Avg: 4096.9, Max: 4097.7, Diff: > 1.4, Sum: 135197.8] > [GC Worker End (ms): Min: 1100979.5, Avg: 1100979.7, Max: 1100979.9, > Diff: 0.4] > [Code Root Fixup: 0.6 ms] > [Code Root Purge: 0.2 ms] > [String Dedup Fixup: 1.0 ms, GC Workers: 33] > [Queue Fixup (ms): 
Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] > [Table Fixup (ms): Min: 0.0, Avg: 0.0, Max: 0.7, Diff: 0.7, Sum: 1.4] > [Clear CT: 3.4 ms] > [Other: 21.7 ms] > [Choose CSet: 0.0 ms] > [Ref Proc: 9.1 ms] > [Ref Enq: 0.9 ms] > [Redirty Cards: 4.3 ms] > [Humongous Register: 0.2 ms] > [Humongous Reclaim: 0.0 ms] > [Free CSet: 5.3 ms] > [Eden: 81184.0M(81184.0M)->0.0B(72576.0M) Survivors: 736.0M->9344.0M > Heap: 83508.0M(160.0G)->10944.0M(160.0G)] > > * [Times: user=68.40 sys=9.11, real=4.13 secs] * > > Thanks, > Roy > From thomas.schatzl at oracle.com Mon Jan 20 11:11:18 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 20 Jan 2020 12:11:18 +0100 Subject: Discussion: improve humongous objects handling for G1 In-Reply-To: <695ee6a6-a182-40a9-bfbf-49214d2fdaaa.maoliang.ml@alibaba-inc.com> References: <695ee6a6-a182-40a9-bfbf-49214d2fdaaa.maoliang.ml@alibaba-inc.com> Message-ID: <50da5a35-3a7c-566e-67d1-0659f1e068c2@oracle.com> Hi Liang, On 19.01.20 08:08, Liang Mao wrote: > Hi Guys, > > We Alibaba have experienced the same problem as Man introduced. > Some applications got frequent concurrent mark cycles and high > cpu usage and even some to-space exhausted failures because of > large amount of humongous object allocation even with > G1HeapRegionSize=32m. But those applications worked fine > with ParNew/CMS. We are working on some enhancements for better Can you provide logs? (with gc+heap=debug,gc+humongous=debug) > reclamation of humongous objects. Our first intention is to reduce > the frequent concurrent cycles and possible to-space exhausted so > the heap utility or arraylets are not taken into consideration yet. > > Our solution is more like a ParNew/CMS flow and will treat a > humongous object as young or old. > 1. Humongous object allocation in mutator will be considered into > eden size and won't directly trigger concurrent mark cycle. 
That > will avoid the possible to-space exhausted while concurrent mark > is working and humongous allocations are "eating" the free regions. (I am trying to imagine situations here where this would be a problem since I do not have a log) That helps if G1 is already trying to do a marking cycle if the space is tight and already eating into the reserve that has explicitly been set aside for this case (G1ReservePercent - did you try increasing that for a workaround?). It does make young collections much more frequent than necessary otherwise. Particularly if these humongous regions are eager-reclaimable. In these cases the humongous allocations would be "free", while with that policy they would cause a young gc. The other issue, if these humongous allocations cause too many concurrent cycles could be managed by looking into canceling the concurrent marking if that concurrent start gc freed lots and lots of humongous objects, e.g. getting way below the mark threshold again. I did not think this through though, of course at some point you do need to start the concurrent mark. Some (or most) of that heap pressure might have been caused by the internal fragmentation, so allowing allocation into the tail ends would very likely decrease that pressure too. This would likely be the first thing I would be looking into if the logs indicate that. > 2. Enhance the reclamation of short-live humongous object by > covering object array that current eager reclaim only supports > primitive type for now. This part looks same to JDK-8048180 and > JDK-8073288 Thomas mentioned. The evacuation flow will iterate > the humongous object array as a regular object if the humongous > object is "young" which can be distinguished by the "age" field > in markoop. > > The patch is being tested. We will share it once it proves to > work fine with our applications. I don't know if any similar > approach has been already tried and any advices? 
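The two points of the quoted proposal can be modelled in a few lines. Everything here (class name, method names, the age cap, the sizes) is a hypothetical sketch of the described policy, not the actual patch:

```java
// Point 1: humongous allocations are debited against the eden allowance, so a
// burst of humongous allocation triggers an ordinary young GC rather than a
// concurrent mark cycle. Point 2: a humongous object array is scanned like a
// young object only while its age (taken from the mark word) is below a small
// cap. All names and values are illustrative assumptions.
public class HumongousPolicySketch {
    static boolean shouldTriggerYoungGC(long edenUsed, long humongousAllocated, long edenCapacity) {
        // Count humongous bytes as if they were eden allocations.
        return edenUsed + humongousAllocated >= edenCapacity;
    }

    static boolean treatAsYoung(int age, int maxHumongousAge) {
        return age < maxHumongousAge;
    }

    public static void main(String[] args) {
        long mb = 1024 * 1024;
        System.out.println(shouldTriggerYoungGC(500 * mb, 100 * mb, 1024 * mb)); // false
        System.out.println(shouldTriggerYoungGC(900 * mb, 200 * mb, 1024 * mb)); // true
        System.out.println(treatAsYoung(0, 5)); // freshly allocated: iterate directly
        System.out.println(treatAsYoung(7, 5)); // old enough: fall back to remset/eager reclaim
    }
}
```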
The problem with treating humongous reference arrays as young is that this heuristic significantly increases the garbage collection time if that object survives the collection. I.e. the collector needs to iterate over all young objects, and while you do save the time to copy the object by in-place aging, scanning the references tends to take more time than copying. In that "different regional collector" I referenced in the other email exactly this had been implemented with the above issues. That collector also had configurable regions down to 64k (well, basically even less, but anything below that was just for experimentation, and 64k had been very debatable too), so the humongous object problem had been a lot larger. It might not be the case with G1's "giant" humongous objects. Treating them as old like they are now within G1 allows you to be a lot more selective about what you take in for garbage collection. Now the policy isn't particularly smart (just take humongous objects of a particular type with less than a low, fixed threshold of remembered set entries), but that could be improved. I.e. G1 has a measure of how long scanning a remembered set entry approximately takes, so that could be made dependent on available time. Thanks, Thomas From stefan.karlsson at oracle.com Mon Jan 20 11:28:03 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Mon, 20 Jan 2020 12:28:03 +0100 Subject: [14] RFR (XS): 8235305: Corrupted oops embedded in nmethods due to parallel modification during optional evacuation In-Reply-To: <78970e90-7b4c-1618-af6f-0b8e37af47f3@oracle.com> References: <15678a97-f219-e0f4-c0b6-a4a2a06e6768@oracle.com> <1ecf88a0-68dd-276a-0a7a-f068c587168d@oracle.com> <78970e90-7b4c-1618-af6f-0b8e37af47f3@oracle.com> Message-ID: <10988744-5a87-5c70-eaae-cbad447dc2b3@oracle.com> Looks good. 
StefanK On 2020-01-17 14:27, Thomas Schatzl wrote: > Hi Stefan, > > On 17.01.20 10:06, Stefan Johansson wrote: >> Hi Thomas, >> >> On 2020-01-16 14:20, Thomas Schatzl wrote: >>> Hi all, >>> >>> ?? can I get reviews for this change that fixes a bug in the >>> abortable mixed gc algorithm where G1 might corrupt oops embedded in >>> nmethods due to parallel modification during an optional evacuation >>> phase? >>> >>> G1 currently collects embedded oops in nmethods twice: once in the >>> optional roots list, and once as nmethods in the strong code roots >>> list for a particular region. >>> >>> Now it can happen that this oop embedded in in the code stream is >>> unaligned, so if that oop is modified during relocation word tearing >>> may occur, causing follow-up crashes. >>> >>> The fix is to not collect oops from nmethods in the optional code >>> root list as the strong code root list for a particular region >>> already always contains it anyway. >>> >>> Thanks go to stefank, eriko and sjohanss for helping with analyzing, >>> testing and the discussion around it. >>> >>> CR: >>> https://bugs.openjdk.java.net/browse/JDK-8235305 >>> Webrev: >>> http://cr.openjdk.java.net/~tschatzl/8235305/webrev/ >> >> Fix looks good. > > Thanks for your review. > >> Just some things around the naming of the template parameter and enum >> after adding this. I don't have a much better idea > [...] > > Talked to them about this and I'm good with their suggestion: > > http://cr.openjdk.java.net/~tschatzl/8235305/webrev.1 (full) > http://cr.openjdk.java.net/~tschatzl/8235305/webrev.0_to_1 (diff) > > Thanks, > ? 
Thomas > From thomas.schatzl at oracle.com Mon Jan 20 11:57:03 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 20 Jan 2020 12:57:03 +0100 Subject: [14] RFR (XS): 8235305: Corrupted oops embedded in nmethods due to parallel modification during optional evacuation In-Reply-To: <10988744-5a87-5c70-eaae-cbad447dc2b3@oracle.com> References: <15678a97-f219-e0f4-c0b6-a4a2a06e6768@oracle.com> <1ecf88a0-68dd-276a-0a7a-f068c587168d@oracle.com> <78970e90-7b4c-1618-af6f-0b8e37af47f3@oracle.com> <10988744-5a87-5c70-eaae-cbad447dc2b3@oracle.com> Message-ID: <5337c7ba-5872-70c7-1082-abe149a02645@oracle.com> Hi Stefan + Stefan, thanks for your reviews :) Thomas On 20.01.20 12:28, Stefan Karlsson wrote: > Looks good. > > StefanK > > On 2020-01-17 14:27, Thomas Schatzl wrote: >> Hi Stefan, >> >> On 17.01.20 10:06, Stefan Johansson wrote: >>> Hi Thomas, >>> >>> On 2020-01-16 14:20, Thomas Schatzl wrote: >>>> Hi all, >>>> >>>> ?? can I get reviews for this change that fixes a bug in the >>>> abortable mixed gc algorithm where G1 might corrupt oops embedded in >>>> nmethods due to parallel modification during an optional evacuation >>>> phase? >>>> >>>> G1 currently collects embedded oops in nmethods twice: once in the >>>> optional roots list, and once as nmethods in the strong code roots >>>> list for a particular region. >>>> >>>> Now it can happen that this oop embedded in in the code stream is >>>> unaligned, so if that oop is modified during relocation word tearing >>>> may occur, causing follow-up crashes. >>>> >>>> The fix is to not collect oops from nmethods in the optional code >>>> root list as the strong code root list for a particular region >>>> already always contains it anyway. >>>> >>>> Thanks go to stefank, eriko and sjohanss for helping with analyzing, >>>> testing and the discussion around it. 
>>>> >>>> CR: >>>> https://bugs.openjdk.java.net/browse/JDK-8235305 >>>> Webrev: >>>> http://cr.openjdk.java.net/~tschatzl/8235305/webrev/ >>> >>> Fix looks good. >> >> Thanks for your review. >> >>> Just some things around the naming of the template parameter and enum >>> after adding this. I don't have a much better idea >> [...] >> >> Talked to them about this and I'm good with their suggestion: >> >> http://cr.openjdk.java.net/~tschatzl/8235305/webrev.1 (full) >> http://cr.openjdk.java.net/~tschatzl/8235305/webrev.0_to_1 (diff) >> >> Thanks, >> ?? Thomas >> From stefan.karlsson at oracle.com Mon Jan 20 13:03:14 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Mon, 20 Jan 2020 14:03:14 +0100 Subject: [14] RFR 8237396: JvmtiTagMap::weak_oops_do() should not trigger barriers In-Reply-To: <0088d47f-9dc5-5275-7242-47d1b544cc33@redhat.com> References: <0088d47f-9dc5-5275-7242-47d1b544cc33@redhat.com> Message-ID: <96981f56-0406-af11-d184-62c819d90cab@oracle.com> Hi Zhengyu, On 2020-01-17 16:28, Zhengyu Gu wrote: > Please review this small patch that avoids barriers in > JvmtiTagMap::weak_oops_do() method. > > The method is used by GC and GC expects to see raw oops. For the record, ZGC doesn't require to see raw oops here. The unnecessary load barriers will simply pre-clean the oops before the closures are applied. > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8237396 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237396/webrev.00/ Looks good. I've tested with ZGC as well. Thanks, StefanK > > Test: > ? hotspot_gc > ? vmTestbase_nsk_jvmti > ? (fastdebug and release) on x86_64 Linux > > ? Submit test in progress. 
> > Thanks, > > -Zhengyu > From zgu at redhat.com Mon Jan 20 13:15:55 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 20 Jan 2020 08:15:55 -0500 Subject: [14] RFR 8237396: JvmtiTagMap::weak_oops_do() should not trigger barriers In-Reply-To: <96981f56-0406-af11-d184-62c819d90cab@oracle.com> References: <0088d47f-9dc5-5275-7242-47d1b544cc33@redhat.com> <96981f56-0406-af11-d184-62c819d90cab@oracle.com> Message-ID: <27f41387-5219-23fa-5b52-bd0b1f689438@redhat.com> Thanks, Stefan. -Zhengyu On 1/20/20 8:03 AM, Stefan Karlsson wrote: > Hi Zhengyu, > > On 2020-01-17 16:28, Zhengyu Gu wrote: >> Please review this small patch that avoids barriers in >> JvmtiTagMap::weak_oops_do() method. >> >> The method is used by GC and GC expects to see raw oops. > > For the record, ZGC doesn't require to see raw oops here. The > unnecessary load barriers will simply pre-clean the oops before the > closures are applied. > >> >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8237396 >> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237396/webrev.00/ > > Looks good. > > I've tested with ZGC as well. > > Thanks, > StefanK > >> >> Test: >> ?? hotspot_gc >> ?? vmTestbase_nsk_jvmti >> ?? (fastdebug and release) on x86_64 Linux >> >> ?? Submit test in progress. >> >> Thanks, >> >> -Zhengyu >> > From rkennke at redhat.com Mon Jan 20 13:43:42 2020 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 20 Jan 2020 14:43:42 +0100 Subject: [14] RFR 8237396: JvmtiTagMap::weak_oops_do() should not trigger barriers In-Reply-To: <0088d47f-9dc5-5275-7242-47d1b544cc33@redhat.com> References: <0088d47f-9dc5-5275-7242-47d1b544cc33@redhat.com> Message-ID: <1a6a6fa8-164a-8bba-29fa-9a271ac0e9b3@redhat.com> Hi Zhengyu, The change looks good to me. I was worried about other GCs, and Stefan has confirmed that it's ok there. Thanks, Roman > Please review this small patch that avoids barriers in > JvmtiTagMap::weak_oops_do() method. > > The method is used by GC and GC expects to see raw oops. 
> > > Bug: https://bugs.openjdk.java.net/browse/JDK-8237396 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237396/webrev.00/ > > Test: > ? hotspot_gc > ? vmTestbase_nsk_jvmti > ? (fastdebug and release) on x86_64 Linux > > ? Submit test in progress. > > Thanks, > > -Zhengyu > From zgu at redhat.com Mon Jan 20 13:56:10 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 20 Jan 2020 08:56:10 -0500 Subject: [14] RFR 8237396: JvmtiTagMap::weak_oops_do() should not trigger barriers In-Reply-To: <1a6a6fa8-164a-8bba-29fa-9a271ac0e9b3@redhat.com> References: <0088d47f-9dc5-5275-7242-47d1b544cc33@redhat.com> <1a6a6fa8-164a-8bba-29fa-9a271ac0e9b3@redhat.com> Message-ID: Thanks, Roman. -Zhengyu On 1/20/20 8:43 AM, Roman Kennke wrote: > Hi Zhengyu, > > The change looks good to me. I was worried about other GCs, and Stefan > has confirmed that it's ok there. > > Thanks, > Roman > > >> Please review this small patch that avoids barriers in >> JvmtiTagMap::weak_oops_do() method. >> >> The method is used by GC and GC expects to see raw oops. >> >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8237396 >> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237396/webrev.00/ >> >> Test: >> ? hotspot_gc >> ? vmTestbase_nsk_jvmti >> ? (fastdebug and release) on x86_64 Linux >> >> ? Submit test in progress. >> >> Thanks, >> >> -Zhengyu >> > From rkennke at redhat.com Mon Jan 20 15:20:13 2020 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 20 Jan 2020 16:20:13 +0100 Subject: RFR: 8237543: Shenandoah: More asserts around code roots Message-ID: We are still observing occasional corrupted code roots in Traversal GC. The assert always happens in code roots, and always at init-traversal. There are two ways this seems likely to happen: either when new code is generated, or during the previous GC cycle. We should plant some verifications there to ensure we fail earlier. 
Bug: https://bugs.openjdk.java.net/browse/JDK-8237543 Webrev: http://cr.openjdk.java.net/~rkennke/JDK-8237543/webrev/ Testing: hotspot_gc_shenandoah (fastdebug+release) ok Can I please get a review? Thanks, Roman From zgu at redhat.com Mon Jan 20 15:36:06 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 20 Jan 2020 10:36:06 -0500 Subject: RFR: 8237543: Shenandoah: More asserts around code roots In-Reply-To: References: Message-ID: Okay, looks good to me. -Zhengyu On 1/20/20 10:20 AM, Roman Kennke wrote: > We are still observing occasional corrupted code roots in Traversal GC. > The assert always happens in code roots, and always at init-traversal. > There are two ways this seems likely to happen: either when new code is > generated, or during the previous GC cycle. We should plant some > verifications there to ensure we fail earlier. > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8237543 > Webrev: > http://cr.openjdk.java.net/~rkennke/JDK-8237543/webrev/ > > Testing: hotspot_gc_shenandoah (fastdebug+release) ok > > Can I please get a review? > > Thanks, > Roman > From maoliang.ml at alibaba-inc.com Tue Jan 21 06:25:51 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Tue, 21 Jan 2020 14:25:51 +0800 Subject: Discussion: improve humongous objects handling for G1 Message-ID: <3b35ff1d-b717-42c8-bd1f-28a54a5a0ec4.maoliang.ml@alibaba-inc.com> Hi Thomas, In fact we saw this issue with 8u. One issue I forgot to mention is that when CPU usage is quite high, nearly 100%, the concurrent mark will get very slow, so to-space exhaustion happened. BTW, are there any improvements for this point in JDK11 or higher versions? I didn't notice any so far. Increasing the reserve percent could alleviate the problem but does not seem a complete solution. Cancelling the concurrent mark cycle in the initial-mark pause seems a delicate optimization which can cover some issues if a lot of humongous regions have been reclaimed in this pause.
It can avoid the unnecessary cm cycle and also trigger cm earlier if needed. We will take this into consideration. Thanks for the great idea :) If there is a short-lived humongous object array which also references other short-lived objects, the situation could be worse. If we increase G1HeapRegionSize, some humongous objects become normal objects, the behavior is more like CMS, and everything goes fine. I don't think we have to disallow humongous objects behaving as normal ones. A newly allocated humongous object array can probably reference objects in the young generation, and scanning the object array via the remset couldn't be better than directly iterating the array during evacuation because of possible prefetching. We can have an alternative max survivor age for humongous objects, maybe 5 or 8 at most; otherwise let eager reclaim handle it. A tradeoff can be made to balance the pause time and the reclamation possibility of short-lived objects. So the enhanced solution can be: 1. Cancelling concurrent mark if not necessary. 2. Increasing the reclamation possibility of short-lived humongous objects. An important reason for this issue is that Java developers easily ask why CMS can handle the application without a significant CPU usage increase (caused by concurrent mark) but G1 cannot. Personally I believe G1 can do anything at least as well as CMS :) This proposal aims at the throughput gap compared to CMS. If it works together with the barrier optimization proposed by Man and Google, imho the gap could be reduced noticeably. Thanks, Liang ------------------------------------------------------------------ From:Thomas Schatzl Send Time:2020 Jan. 20 (Mon.) 19:11 To:"MAO, Liang" ; Man Cao ; hotspot-gc-dev Subject:Re: Discussion: improve humongous objects handling for G1 Hi Liang, On 19.01.20 08:08, Liang Mao wrote: > Hi Guys, > > We Alibaba have experienced the same problem as Man introduced.
> Some applications got frequent concurrent mark cycles and high > cpu usage and even some to-space exhausted failures because of > large amount of humongous object allocation even with > G1HeapRegionSize=32m. But those applications worked fine > with ParNew/CMS. We are working on some enhancements for better Can you provide logs? (with gc+heap=debug,gc+humongous=debug) > reclamation of humongous objects. Our first intention is to reduce > the frequent concurrent cycles and possible to-space exhausted so > the heap utility or arraylets are not taken into consideration yet. > > Our solution is more like a ParNew/CMS flow and will treat a > humongous object as young or old. > 1. Humongous object allocation in mutator will be considered into > eden size and won't directly trigger concurrent mark cycle. That > will avoid the possible to-space exhausted while concurrent mark > is working and humongous allocations are "eating" the free regions. (I am trying to imagine situations here where this would be a problem since I do not have a log) That helps if G1 is already trying to do a marking cycle if the space is tight and already eating into the reserve that has explicitly been set aside for this case (G1ReservePercent - did you try increasing that for a workaround?). It does make young collections much more frequent than necessary otherwise. Particularly if these humongous regions are eager-reclaimable. In these cases the humongous allocations would be "free", while with that policy they would cause a young gc. The other issue, if these humongous allocations cause too many concurrent cycles could be managed by looking into canceling the concurrent marking if that concurrent start gc freed lots and lots of humongous objects, e.g. getting way below the mark threshold again. I did not think this through though, of course at some point you do need to start the concurrent mark. 
Some (or most) of that heap pressure might have been caused by the internal fragmentation, so allowing allocation into the tail ends would very likely decrease that pressure too. This would likely be the first thing I would be looking into if the logs indicate that. > 2. Enhance the reclamation of short-live humongous object by > covering object array that current eager reclaim only supports > primitive type for now. This part looks same to JDK-8048180 and > JDK-8073288 Thomas mentioned. The evacuation flow will iterate > the humongous object array as a regular object if the humongous > object is "young" which can be distinguished by the "age" field > in markoop. > > The patch is being tested. We will share it once it proves to > work fine with our applications. I don't know if any similar > approach has been already tried and any advices? The problem with treating humongous reference arrays as young is that this heuristic significantly increases the garbage collection time if that object survives the collection. I.e. the collector needs to iterate over all young objects, and while you do save the time to copy the object by in-place aging, scanning the references tends to take more time than copying. In that "different regional collector" I referenced in the other email exactly this had been implemented with the above issues. That collector also had configurable regions down to 64k (well, basically even less, but anything below that was just for experimentation, and 64k had been very debatable too), so the humongous object problem had been a lot larger. It might not be the case with G1's "giant" humongous objects. Treating them as old like they are now within G1 allows you to be a lot more selective about what you take in for garbage collection. Now the policy isn't particularly smart (just take humongous objects of a particular type with less than a low, fixed threshold of remembered set entries), but that could be improved. I.e. 
G1 has a measure of how long scanning a remembered set entry approximately takes, so that could be made dependent on available time. Thanks, Thomas From kim.barrett at oracle.com Tue Jan 21 08:31:20 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 21 Jan 2020 03:31:20 -0500 Subject: RFR: 8233822: VM_G1CollectForAllocation should always check for upgrade to full Message-ID: <5389B188-BA91-412F-A12E-0DB5A96FF0A9@oracle.com> Please review this G1 change to always check whether a full collection should be performed after a non-full collection pause, e.g. the collection needs to be "upgraded" to a full collection. There are various conditions which can lead to needing to do that, and as the CR suggests, we need to be consistent about checking for and performing such an upgrade. This is accomplished by moving most of do_collection_pause_at_safepoint into a helper function and changing that existing function to call the helper, then check for and, if needed, perform a needed upgrade to a full collection. Callers of that function are updated to remove explict conditional upgrading, where present. This also addresses the surprisingly placed call in a G1-specific block of code in gc/shared (see also JDK-8237567). CR: https://bugs.openjdk.java.net/browse/JDK-8233822 Webrev: https://cr.openjdk.java.net/~kbarrett/8233822/open.00/ Testing: mach5 tier1-5 Locally (linux-x64) ran modified InfiniteList.java test (allocate small rather than arrays) and verified some upgrades occurred as expected. From per.liden at oracle.com Tue Jan 21 10:18:26 2020 From: per.liden at oracle.com (Per Liden) Date: Tue, 21 Jan 2020 11:18:26 +0100 Subject: RFR: 8234440: ZGC: Print relocation information on info level Message-ID: <36df9c0c-9b35-0d6f-df7f-8b9b781818cb@oracle.com> When using -Xlog:gc*, I now and then find that I miss basic relocation information, since it's currently printed at the debug level on the relocation set selector. 
I think we should leave the current logging as is, since that's still useful when debugging the relocation set selector itself. However, I think we should propagate some of the high level information and print it on the info level. Here's an example of what the output looks like with this patch: [...] [68.926s][info][gc,reloc ] GC(6) Small Pages: 529 / 1058M(93%), Empty: 350M(31%), Compacting: 450M(40%)->20M(2%) [68.926s][info][gc,reloc ] GC(6) Medium Pages: 2 / 64M(6%), Empty: 0M(0%), Compacting: 64M(6%)->32M(3%) [68.926s][info][gc,reloc ] GC(6) Large Pages: 2 / 12M(1%), Empty: 6M(1%), Compacting: 0M(0%)->0M(0%) [68.926s][info][gc,reloc ] GC(6) Relocation: Successful [...] Bug: https://bugs.openjdk.java.net/browse/JDK-8234440 Webrev: http://cr.openjdk.java.net/~pliden/8234440/webrev.0 /Per From thomas.schatzl at oracle.com Tue Jan 21 10:19:52 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 21 Jan 2020 11:19:52 +0100 Subject: Discussion: improve humongous objects handling for G1 In-Reply-To: <3b35ff1d-b717-42c8-bd1f-28a54a5a0ec4.maoliang.ml@alibaba-inc.com> References: <3b35ff1d-b717-42c8-bd1f-28a54a5a0ec4.maoliang.ml@alibaba-inc.com> Message-ID: Hi, On 21.01.20 07:25, Liang Mao wrote: > Hi Thomas, > > In fact we saw this issue with 8u. One issue I forgot to tell is that when > CPU usage is quite high which is nearly 100% the concurrent mark will > get very slow so the to-space exhuasted happened. BTW, is there any > improvements for this point in JDK11 or higher versions? I didn't notice so far. JDK13 has some implicit increases in the thresholds to take more humongous candidate regions. Not a lot though. > Increasing reserve percent could alleviate the problem but seems not a completed > solution. It would be nicer if g1 automatically adjusted this reserve based on actual allocation of course. ;) Which is another option btw - there are many ways to avoid the evacuation failure situation. 
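The G1ReservePercent workaround mentioned above comes down to simple sizing arithmetic; a rough model for illustration only (G1's real accounting works in whole regions and interacts with other heuristics):

```java
// Headroom G1 keeps free to absorb evacuation: increasing G1ReservePercent
// trades usable heap for a larger buffer against to-space exhaustion.
// This is a simplification, not G1's exact accounting.
public class ReserveSketch {
    static long reserveBytes(long heapBytes, int reservePercent) {
        return heapBytes * reservePercent / 100;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        long regionSize = 32L * 1024 * 1024;          // -XX:G1HeapRegionSize=32m
        long reserve = reserveBytes(160 * gb, 10);    // 160g heap, default G1ReservePercent=10
        System.out.println(reserve / gb);             // 16 (GB held back)
        System.out.println(reserve / regionSize);     // 512 regions
    }
}
```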
> Cancelling concurrent mark cycle in initial-mark pause seems a delicate > optimization which can cover some issues if a lot of humongous regions have been > reclaimed in this pause. It can avoid the unnecessary cm cycle and also trigger cm > earlier if needed. > We will take this into consideration. Thanks for the great idea:) > > If there is a short-lived humongous object array which also references other > short-lived objects the situation could be worse. If we increase the > G1HeapRegionSize, some humongous objects become normal objects and the behavior > is more like CMS then everything goes fine. I don't think we have to not allow humongous > objects to behave as normal ones. A newly allocated humongous object array can probably > reference objects in young generation and scanning the object array by remset > couldn't be better than directly iterating the array in evacuation because of possible > prefetch. We can have an alternative max survivor age for humongous objects, maybe 5 or 8 If I read this paragraph correctly you argue that keeping a large humongous objArray in young is okay because a) if you increase the heap region size, it has a high chance that it would be below the thresholds anyway, so you would scan it anyway b) scanning a humongous objArray with a few references is not much different performance wise than targeted scanning of the corresponding cards in the remembered set because of hardware. Regarding a) Since I have yet to see logs, I can't tell what the typical size of these arrays are (and I have not seen a "typical" humongous object distribution graph for these applications). However region sizes are kind of proportional with heap size which kind of corresponds to the hardware that you need to use. I.e. you likely won't see G1 using 100 threads on 200m heap with 32m regions with current ergonomics. 
Even then this limits objArrays to 16M (at 32m region size), which limits the time spent scanning the object (and if ergonomics select 32m regions, the heap and the machine are probably quite big anyway). From what you and Man were telling, you seem to have a significant amount of humongous objects of unknown type that are much(?) larger than that. Regarding b) that has been wrong years ago when I did experiments on that (even the "limit age on humongous obj arrays" workaround - you can easily go as low as a max tenuring threshold of 1 to catch almost all of the relevant ones), and very likely still is. Let me do some over-the-thumb calculations: Assuming that we have 32M objects (random number, i.e. ~8m references), with, say 1k references (which is more than a handful), the remembered set would make you scan only 1.5% max (1000*512 bytes/card) of the object. I seriously doubt that prefetching or some magic hardware will make that amount of additional work disappear. From a performance POV, with 20 GB/s bandwidth available, (which I am not sure you will reach during GC for whatever reasons; random number), you are spending 1.5ms (if I calculated correctly) cpu time just for finding out that the 32M object is completely full of null-s in the worst case. That's also the minimum amount of time you need per such object. Keeping it outside of young gen, particularly since it has been allocated just recently and won't have a lot of remembered set entries, would likely be much cheaper than that (as mentioned, G1 has a good measure of how long scanning a card will take so we could take this number). Only if G1 is going to scan it almost completely anyway (which we agree on is unlikely to be the case as it has "just" been allocated), then keeping it outside is disadvantageous. Note that its allocation could still be counted against the eden allowance in some situations. This could be seen as a way to slow down the mutator while it is busy trying to complete the marking. 
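The over-the-thumb calculation above can be reproduced in a few lines. This is a hedged sketch in plain Python using only the assumed values from the discussion (a 32M humongous objArray, 512-byte cards, ~1k remembered set entries, 20 GB/s scan bandwidth); none of these are measured constants.

```python
# Back-of-the-envelope check of the card-scanning argument above.
# All inputs are the hypothetical values from the discussion.
OBJ_BYTES = 32 * 1024 * 1024          # assumed 32M humongous objArray
CARD_BYTES = 512                      # G1 card size
REMSET_CARDS = 1000                   # assumed ~1k remembered set entries
BANDWIDTH_BYTES_PER_S = 20 * 1024**3  # assumed 20 GB/s scan bandwidth

# Fraction of the object touched when scanning only the cards in the
# remembered set, instead of the whole object.
scanned_fraction = REMSET_CARDS * CARD_BYTES / OBJ_BYTES

# Minimum CPU time to sweep the whole object if it stays in young gen.
full_scan_ms = OBJ_BYTES / BANDWIDTH_BYTES_PER_S * 1000

print(f"remset-driven scan covers {scanned_fraction:.1%} of the object")
print(f"full scan needs at least {full_scan_ms:.2f} ms")
```

Run as-is it prints a scan fraction of about 1.5% and a minimum full-scan time of about 1.56 ms, matching the figures quoted in the discussion.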
I am however not sure if it helps a lot assuming that changes to perform eager reclaim on objArrays won't work during marking btw. There would be need for a different kind of enforcing such an allocation penalty. Without more thinking and measurements I would not know when and how to account that, and what has to happen with existing mechanisms to absorb allocation spikes (i.e. G1ReservePercent). I just assume that you probably do not want both. Also something to consider. > at most otherwise let eager reclaim do it. A tradeoff can be made to balance the > pause time and reclamation possibility of short-lived objects. > > So the enhanced solution can be > 1. Cancelling concurrent mark if not necessary. > 2. Increase the reclamation possibility of short-lived humongous objects. These are valid possibilities to improve the overall situation without fixing actual fragmentation issues ;) > An important reason for this issue is that Java developers easily > challenge that CMS can handle the application without significant CPU usage increase > (caused by concurrent mark) > but why G1 cannot. Personally I believe G1 can do anything not worse > than CMS:) > This proposal aims for the throughput gap compared to CMS. If it works > with the barrier optimization which is proposed by Man and Google, imho the gap could be > obviously reduced. Thanks, Thomas From shade at redhat.com Tue Jan 21 10:21:01 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 21 Jan 2020 11:21:01 +0100 Subject: RFR (S) 8237570: Shenandoah: cleanup uses of allocation/free threshold in static heuristics Message-ID: RFE: https://bugs.openjdk.java.net/browse/JDK-8237570 Fix: https://cr.openjdk.java.net/~shade/8237570/webrev.01/ As noted by Justinas in the separate thread, ShAllocThresh has no effect on "static" heuristics, so it should not be adjusted. Also, it should use ShMinFreeThresh, as other heuristics use. This makes ShFreeThresh unused, and it is removed for clarity. 
Testing: hotspot_gc_shenandoah -- Thanks, -Aleksey From rkennke at redhat.com Tue Jan 21 10:43:35 2020 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 21 Jan 2020 11:43:35 +0100 Subject: RFR (S) 8237570: Shenandoah: cleanup uses of allocation/free threshold in static heuristics In-Reply-To: References: Message-ID: <51cf1b77-e89a-653c-abd3-152813ea936d@redhat.com> Makes sense. Good! Thanks, Roman > RFE: > https://bugs.openjdk.java.net/browse/JDK-8237570 > > Fix: > https://cr.openjdk.java.net/~shade/8237570/webrev.01/ > > As noted by Justinas in the separate thread, ShAllocThresh has no effect on "static" heuristics, so > it should not be adjusted. Also, it should use ShMinFreeThresh, as other heuristics use. This makes > ShFreeThresh unused, and it is removed for clarity. > > Testing: hotspot_gc_shenandoah > From per.liden at oracle.com Tue Jan 21 12:56:07 2020 From: per.liden at oracle.com (Per Liden) Date: Tue, 21 Jan 2020 13:56:07 +0100 Subject: RFR: 8234440: ZGC: Print relocation information on info level In-Reply-To: <36df9c0c-9b35-0d6f-df7f-8b9b781818cb@oracle.com> References: <36df9c0c-9b35-0d6f-df7f-8b9b781818cb@oracle.com> Message-ID: <9700c26c-8ee9-f58d-1c1c-a1fa6a53b24b@oracle.com> I got some off-line comments from Stefan, updated webrev: Diff: http://cr.openjdk.java.net/~pliden/8234440/webrev.1-diff Full: http://cr.openjdk.java.net/~pliden/8234440/webrev.1 /Per On 1/21/20 11:18 AM, Per Liden wrote: > When using -Xlog:gc*, I now and then find that I miss basic relocation > information, since it's currently printed at the debug level on the > relocation set selector. I think we should leave the current logging as > is, since that's still useful when debugging the relocation set selector > itself. However, I think we should propagate some of the high level > information and print it on the info level. > > Here's an example of what the output looks like with this patch: > > [...] > [68.926s][info][gc,reloc 
] GC(6) Small Pages: 529 / 1058M(93%), > Empty: 350M(31%), Compacting: 450M(40%)->20M(2%) > [68.926s][info][gc,reloc ] GC(6) Medium Pages: 2 / 64M(6%), Empty: > 0M(0%), Compacting: 64M(6%)->32M(3%) > [68.926s][info][gc,reloc ] GC(6) Large Pages: 2 / 12M(1%), Empty: > 6M(1%), Compacting: 0M(0%)->0M(0%) > [68.926s][info][gc,reloc ] GC(6) Relocation: Successful > [...] > > Bug: https://bugs.openjdk.java.net/browse/JDK-8234440 > Webrev: http://cr.openjdk.java.net/~pliden/8234440/webrev.0 > > /Per From stefan.karlsson at oracle.com Tue Jan 21 13:37:19 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Tue, 21 Jan 2020 14:37:19 +0100 Subject: RFR: 8234440: ZGC: Print relocation information on info level In-Reply-To: <9700c26c-8ee9-f58d-1c1c-a1fa6a53b24b@oracle.com> References: <36df9c0c-9b35-0d6f-df7f-8b9b781818cb@oracle.com> <9700c26c-8ee9-f58d-1c1c-a1fa6a53b24b@oracle.com> Message-ID: <11776dca-f130-46db-505d-8604d69a0cff@oracle.com> Looks good. StefanK On 2020-01-21 13:56, Per Liden wrote: > I got some off-line comments from Stefan, updated webrev: > > Diff: http://cr.openjdk.java.net/~pliden/8234440/webrev.1-diff > Full: http://cr.openjdk.java.net/~pliden/8234440/webrev.1 > > /Per > > On 1/21/20 11:18 AM, Per Liden wrote: >> When using -Xlog:gc*, I now and then find that I miss basic relocation >> information, since it's currently printed at the debug level on the >> relocation set selector. I think we should leave the current logging >> as is, since that's still useful when debugging the relocation set >> selector itself. However, I think we should propagate some of the high >> level information and print it on the info level. >> >> Here's an example of what the output looks like with this patch: >> >> [...] >> [68.926s][info][gc,reloc ] GC(6) Small Pages: 529 / 1058M(93%), >> Empty: 350M(31%), Compacting: 450M(40%)->20M(2%) >> [68.926s][info][gc,reloc 
] GC(6) Medium Pages: 2 / 64M(6%), Empty: >> 0M(0%), Compacting: 64M(6%)->32M(3%) >> [68.926s][info][gc,reloc ] GC(6) Large Pages: 2 / 12M(1%), Empty: >> 6M(1%), Compacting: 0M(0%)->0M(0%) >> [68.926s][info][gc,reloc ] GC(6) Relocation: Successful >> [...] >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8234440 >> Webrev: http://cr.openjdk.java.net/~pliden/8234440/webrev.0 >> >> /Per From per.liden at oracle.com Tue Jan 21 13:58:08 2020 From: per.liden at oracle.com (Per Liden) Date: Tue, 21 Jan 2020 14:58:08 +0100 Subject: RFR: 8234440: ZGC: Print relocation information on info level In-Reply-To: <11776dca-f130-46db-505d-8604d69a0cff@oracle.com> References: <36df9c0c-9b35-0d6f-df7f-8b9b781818cb@oracle.com> <9700c26c-8ee9-f58d-1c1c-a1fa6a53b24b@oracle.com> <11776dca-f130-46db-505d-8604d69a0cff@oracle.com> Message-ID: <5e36116a-377e-7d8c-391a-fc59f66bba6f@oracle.com> Thanks Stefan! /Per On 1/21/20 2:37 PM, Stefan Karlsson wrote: > Looks good. > > StefanK > > On 2020-01-21 13:56, Per Liden wrote: >> I got some off-line comments from Stefan, updated webrev: >> >> Diff: http://cr.openjdk.java.net/~pliden/8234440/webrev.1-diff >> Full: http://cr.openjdk.java.net/~pliden/8234440/webrev.1 >> >> /Per >> >> On 1/21/20 11:18 AM, Per Liden wrote: >>> When using -Xlog:gc*, I now and then find that I miss basic >>> relocation information, since it's currently printed at the debug >>> level on the relocation set selector. I think we should leave the >>> current logging as is, since that's still useful when debugging the >>> relocation set selector itself. However, I think we should propagate >>> some of the high level information and print it on the info level. >>> >>> Here's an example of what the output looks like with this patch: >>> >>> [...] >>> [68.926s][info][gc,reloc ] GC(6) Small Pages: 529 / 1058M(93%), >>> Empty: 350M(31%), Compacting: 450M(40%)->20M(2%) >>> [68.926s][info][gc,reloc 
] GC(6) Medium Pages: 2 / 64M(6%), Empty: >>> 0M(0%), Compacting: 64M(6%)->32M(3%) >>> [68.926s][info][gc,reloc ] GC(6) Large Pages: 2 / 12M(1%), Empty: >>> 6M(1%), Compacting: 0M(0%)->0M(0%) >>> [68.926s][info][gc,reloc ] GC(6) Relocation: Successful >>> [...] >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8234440 >>> Webrev: http://cr.openjdk.java.net/~pliden/8234440/webrev.0 >>> >>> /Per From maoliang.ml at alibaba-inc.com Tue Jan 21 14:26:42 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Tue, 21 Jan 2020 22:26:42 +0800 Subject: Re: Discussion: improve humongous objects handling for G1 In-Reply-To: References: <3b35ff1d-b717-42c8-bd1f-28a54a5a0ec4.maoliang.ml@alibaba-inc.com>, Message-ID: <54f40191-82ff-43af-aaa5-5821efc59bed.maoliang.ml@alibaba-inc.com> Hi Thomas, Thank you for pointing out my mistake for comparing iterating object array with card scanning that I missed the detail that card scanning doesn't need to scan the whole object array. I didn't provide gc log because I haven't sufficient statistics data about humongous distribution or the object arrays. The solution is just straightforward because increasing G1HeapRegionSize fixes the problem so I want to do the same to G1HeapRegionSize=32m. In my earlier memory of tuning some typical applications, humongous objects occupy more than half of the used heap after young GC with default G1HeapRegionSize. I guess perhaps half of our applications may encounter the issue with default setting. So currently we use the G1HeapRegionSize as approximately 1/500 of Xmx. I know that iterating a humongous object array in young GC might significantly degrade the pause-time-oriented philosophy. But if the pause time is already in expectation with CMS such behavior isn't doing anything worse but avoids the GC turbulence by concurrent mark. Besides the obvious penalty to pause time, do you have any other concerns? 
> Note that its allocation could still be counted against the eden > allowance in some situations. This could be seen as a way to slow down > the mutator while it is busy trying to complete the marking. > I am however not sure if it helps a lot assuming that changes to perform > eager reclaim on objArrays won't work during marking btw. There would be > need for a different kind of enforcing such an allocation penalty. I'm sorry I didn't get these 2 paragraphs. Could you please explain more? Thanks, Liang ------------------------------------------------------------------ From:Thomas Schatzl Send Time:2020 Jan. 21 (Tue.) 18:20 To:"MAO, Liang" ; Man Cao ; hotspot-gc-dev Subject:Re: Discussion: improve humongous objects handling for G1 > [...] From shade at redhat.com Tue Jan 21 18:29:36 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 21 Jan 2020 19:29:36 +0100 Subject: RFR (S) 8237586: Shenandoah: provide option to disable periodic GC Message-ID: RFE: https://bugs.openjdk.java.net/browse/JDK-8237586 Webrev: https://cr.openjdk.java.net/~shade/8237586/webrev.01/ The VM option is unsigned, which leaves us with "0" as special value. It also matches the behavior of G1PeriodicGCInterval and GuaranteedSafepointInterval. Testing: hotspot_gc_shenandoah (includes new testcases) -- Thanks, -Aleksey From zgu at redhat.com Tue Jan 21 19:18:09 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 21 Jan 2020 14:18:09 -0500 Subject: RFR (S) 8237586: Shenandoah: provide option to disable periodic GC In-Reply-To: References: Message-ID: Fix looks good to me. 
Please update copyright years. Thanks, -Zhengyu On 1/21/20 1:29 PM, Aleksey Shipilev wrote: > RFE: > https://bugs.openjdk.java.net/browse/JDK-8237586 > > Webrev: > https://cr.openjdk.java.net/~shade/8237586/webrev.01/ > > The VM option is unsigned, which leaves us with "0" as special value. It also matches the behavior > of G1PeriodicGCInterval and GuaranteedSafepointInterval. > > Testing: hotspot_gc_shenandoah (includes new testcases) > From manc at google.com Wed Jan 22 05:05:34 2020 From: manc at google.com (Man Cao) Date: Tue, 21 Jan 2020 21:05:34 -0800 Subject: Discussion: improve humongous objects handling for G1 In-Reply-To: <54f40191-82ff-43af-aaa5-5821efc59bed.maoliang.ml@alibaba-inc.com> References: <3b35ff1d-b717-42c8-bd1f-28a54a5a0ec4.maoliang.ml@alibaba-inc.com> <54f40191-82ff-43af-aaa5-5821efc59bed.maoliang.ml@alibaba-inc.com> Message-ID: Hi all, Thanks for the great discussion from Thomas and Liang! Regarding GC logs, histograms of humongous allocations, and a more concrete example, I guess we are in the same boat here. We only advised users to increase G1HeapRegionSize, which would work around many cases of the problem. We have not yet closely studied patterns of the problematic humongous allocations. I will do such a study and follow up with some statistics and GC logs when I get my hands on them. >> maybe the threshold for the amount of >> remembered set entries to keep these humongous objects as eligible for >> eager reclaim is too low, and increasing that one would just make it work. > JDK-8237500 Thanks for this. I will definitely try tuning this if the humongous objects are non-objArrays. > You could double-map like > https://blog.openj9.org/2019/05/01/double-map-arraylets/ does for > native access. > There is some remark in some tech paper about arraylets > ( https://www.ibm.com/developerworks/websphere/techjournal/1108_sciampacone/1108_sciampacone.html > that indicates that the balanced collector seems to not move the > arrayoids too. 
Thanks for digging into the details of arraylets. I didn't do much research on it. > Btw the same text also indicates that copying seems like a non-starter > anyway, as, quoting from the text "One use case, SPECjbb2015 benchmark > is not being able to finish RT curve...". > Not sure what prevents arraylets in particular from being O(1); a > particular access is slower though due to the additional indirection > with the spine. > ... > Which means that there is significant optimization work needed to make > array access "as fast" as before in jitted code These two issues: (1) copying for JNI Critical (2) slowing down typical jitted code for array accesses do sound like performance deal-breakers, particularly if they are only required for G1+arraylets but not other collectors. There are some use cases of JNI Critical on arrays that are solely for performance reasons, and we'd rather not slow them down. > It could help with all problems but cases where you allocate a very > large number of humongous objects and you can't keep the humongous object > tails filled. This option still keeps the invariant that humongous > objects need to be allocated at a region boundary. > > Most of the other ideas you propose below also (seem to) retain this > property. Agreed. It seems that JDK-8172713 would help most ideas anyway. > Maybe it is sufficient as "most" applications only use single or low > double-digit GB heaps at the moment where the entire reservation still > fits into the 32gb barrier. I also had the same thought. Most of our important workloads have heap sizes less than 20GB. If the "reserve multiple MaxHeapSize" approach could work with compressed oops for <16GB heap, then it is quite acceptable. That said, now I do agree that I should first study the patterns of humongous allocations and look into improvement on eager reclamation. For the approach from Liang/Alibaba, I'm optimistic that it could solve many problems migrating from ParNew+CMS to G1. 
Because it handles humongous allocations in a similar way as ParNew+CMS does, plus G1 has the advantage of not copying humongous objects, the pause duration/frequency would probably not degrade compared to ParNew/CMS. I also agree with Thomas that it may increase pause duration compared to current G1 due to extra scanning, and allocation spikes might affect other aspects of G1. I noticed in the description for JDK-8027959: "a) logically keep LOBs in young gen, doing in-place aging", which sounds like the GC team have explored this approach for eager reclamation before? It might be the best of both worlds if we could make eager reclamation of humongous objArrays work without putting them in young gen, and further improve eager reclamation in general. -Man From erik.osterlund at oracle.com Wed Jan 22 09:04:33 2020 From: erik.osterlund at oracle.com (erik.osterlund at oracle.com) Date: Wed, 22 Jan 2020 10:04:33 +0100 Subject: RFR: 8234440: ZGC: Print relocation information on info level In-Reply-To: <9700c26c-8ee9-f58d-1c1c-a1fa6a53b24b@oracle.com> References: <36df9c0c-9b35-0d6f-df7f-8b9b781818cb@oracle.com> <9700c26c-8ee9-f58d-1c1c-a1fa6a53b24b@oracle.com> Message-ID: <5f1ca9a1-0130-cdb1-30e6-4bc88aa3a684@oracle.com> Hi Per, Looks good. /Erik On 1/21/20 1:56 PM, Per Liden wrote: > I got some off-line comments from Stefan, updated webrev: > > Diff: http://cr.openjdk.java.net/~pliden/8234440/webrev.1-diff > Full: http://cr.openjdk.java.net/~pliden/8234440/webrev.1 > > /Per > > On 1/21/20 11:18 AM, Per Liden wrote: >> When using -Xlog:gc*, I now and then find that I miss basic >> relocation information, since it's currently printed at the debug >> level on the relocation set selector. I think we should leave the >> current logging as is, since that's still useful when debugging the >> relocation set selector itself. However, I think we should propagate >> some of the high level information and print it on the info level. 
>> >> Here's an example of what the output looks like with this patch: >> >> [...] >> [68.926s][info][gc,reloc ] GC(6) Small Pages: 529 / 1058M(93%), >> Empty: 350M(31%), Compacting: 450M(40%)->20M(2%) >> [68.926s][info][gc,reloc ] GC(6) Medium Pages: 2 / 64M(6%), Empty: >> 0M(0%), Compacting: 64M(6%)->32M(3%) >> [68.926s][info][gc,reloc ] GC(6) Large Pages: 2 / 12M(1%), Empty: >> 6M(1%), Compacting: 0M(0%)->0M(0%) >> [68.926s][info][gc,reloc ] GC(6) Relocation: Successful >> [...] >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8234440 >> Webrev: http://cr.openjdk.java.net/~pliden/8234440/webrev.0 >> >> /Per From thomas.schatzl at oracle.com Wed Jan 22 09:40:04 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 22 Jan 2020 10:40:04 +0100 Subject: Discussion: improve humongous objects handling for G1 In-Reply-To: <54f40191-82ff-43af-aaa5-5821efc59bed.maoliang.ml@alibaba-inc.com> References: <3b35ff1d-b717-42c8-bd1f-28a54a5a0ec4.maoliang.ml@alibaba-inc.com> <54f40191-82ff-43af-aaa5-5821efc59bed.maoliang.ml@alibaba-inc.com> Message-ID: Hi Liang, On 21.01.20 15:26, Liang Mao wrote: > Hi Thomas, > > Thank you for pointing out my mistake for comparing iterating object array > with card scanning that I missed the detail that card scanning doesn't need > to scan the whole object array. > > I didn't provide gc log because I haven't sufficient statistics data about > humongous distribution or the object arrays. The solution is just > straightforward > because increasing G1HeapRegionSize fixes the problem so I want to do > the same to > G1HeapRegionSize=32m. In my earlier memory of tuning some > typical applications, > humongous objects occupy more than half of the used heap after young GC with > default G1HeapRegionSize. I guess perhaps half of our applications may > encounter > the issue with default setting. So currently we use the G1HeapRegionSize as > approximately 1/500 of Xmx. 
> I know that iterating a humongous object array in young GC might significantly > degrade the pause-time-oriented philosophy. But if the pause time is > already in > expectation with CMS such behavior isn't doing anything worse but avoids > the GC > turbulence by concurrent mark. Besides the obvious penalty to pause time, do > you have any other concerns? Ultimately, no, but given that there are options that seem all-around better or there are things (e.g. humongous object tail allocations) to do first I would not spend time on that at this time :) > >> Note that its allocation could still be counted against the eden >> allowance in some situations. This could be seen as a way to slow down >> the mutator while it is busy trying to complete the marking. > >> I am however not sure if it helps a lot assuming that changes to perform >> eager reclaim on objArrays won't work during marking btw. There would be >> need for a different kind of enforcing such an allocation penalty. > > I'm sorry I didn't get these 2 paragraphs. Could you please explain more? The first sentence starts talking about a hybrid approach: keep the object in old gen, but still account it against the allowed eden allocation. The problem is how to account this: you do not really want to account the full regions against it, because that would cause more gcs than CMS as we don't/can't allocate into the tail of humongous objects at all. Counting fractions of regions would leave you with the decision to be conservative and make an eden allocation region unused. E.g. the eden allocation budget is 50 regions, and you allocate 50 humongous objects half a region each. Should that exhaust the budget (in case regions are counted fully)? This would mean that G1 would do collections at double the rate of CMS. An option would be to make them count at their real size (for purposes of determining when eden is "exhausted"), but that might leave you with a fraction of a region for the last region. 
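The 50-region example above can be made concrete with a toy calculation. This sketch uses only the hypothetical numbers from the text (a 50-region eden budget and 50 humongous objects of half a region each); it is an illustration of the two accounting policies, not G1 code.

```python
# Toy comparison of the two eden-accounting policies discussed above.
EDEN_BUDGET_REGIONS = 50
allocations_regions = [0.5] * 50  # 50 humongous objects, half a region each

# Policy 1: charge every humongous allocation a full region.
full_region_charge = len(allocations_regions)

# Policy 2: charge each allocation its actual size.
actual_size_charge = sum(allocations_regions)

# Policy 1 exhausts the budget (a GC at roughly double the CMS rate),
# while policy 2 leaves half of the budget unused.
print(full_region_charge >= EDEN_BUDGET_REGIONS)
print(actual_size_charge >= EDEN_BUDGET_REGIONS)
```

With these numbers, policy 1 charges 50 regions and triggers a collection, while policy 2 charges only 25, which is exactly the tension between over-collecting and under-accounting described above.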
The other problem is of course that, while it's accounted for its actual size, it still takes twice the space in the heap (i.e. the to-space exhaustion issue). This is why the mention of "in some situations" - if you are eating into the reserve already, it's probably better to account them in full anyway. Note that otoh if you keep the object in young gen logically too, you could imagine allocating into its tail. However keeping humongous (objArray) objects in young gen logically has the other bad properties we talked about earlier. You could even vary the strategy between objArrays and non-objArrays. So this needs some more thought, and the policy written down, and tested on "real" applications. :) The last sentence in that paragraph refers to somehow slowing down the mutator when eating into the reserve to complete the marking in time. The second paragraph is about questioning the slowing down mechanism a bit: even during this situation, when you are marking, and slowing down the mutator, additional gcs do not help you a lot wrt eager reclaim of humongous objArray objects, as the most likely (initial) implementation of that would not do eager reclaim during marking (it's doable, you need to keep the satb invariant). Yeah, these two paragraphs compressed my thoughts maybe a bit too much. ;) Thanks, Thomas From per.liden at oracle.com Wed Jan 22 09:54:17 2020 From: per.liden at oracle.com (Per Liden) Date: Wed, 22 Jan 2020 10:54:17 +0100 Subject: RFR: 8234440: ZGC: Print relocation information on info level In-Reply-To: <5f1ca9a1-0130-cdb1-30e6-4bc88aa3a684@oracle.com> References: <36df9c0c-9b35-0d6f-df7f-8b9b781818cb@oracle.com> <9700c26c-8ee9-f58d-1c1c-a1fa6a53b24b@oracle.com> <5f1ca9a1-0130-cdb1-30e6-4bc88aa3a684@oracle.com> Message-ID: <8ea2cd7c-04b5-dd60-3525-c531d343c63f@oracle.com> Thanks Erik! /Per On 1/22/20 10:04 AM, erik.osterlund at oracle.com wrote: > Hi Per, > > Looks good. 
> > /Erik > > On 1/21/20 1:56 PM, Per Liden wrote: >> I got some off-line comments from Stefan, updated webrev: >> >> Diff: http://cr.openjdk.java.net/~pliden/8234440/webrev.1-diff >> Full: http://cr.openjdk.java.net/~pliden/8234440/webrev.1 >> >> /Per >> >> On 1/21/20 11:18 AM, Per Liden wrote: >>> When using -Xlog:gc*, I now and then find that I miss basic >>> relocation information, since it's currently printed at the debug >>> level on the relocation set selector. I think we should leave the >>> current logging as is, since that's still useful when debugging the >>> relocation set selector itself. However, I think we should propagate >>> some of the high level information and print it on the info level. >>> >>> Here's an example of what the output looks like with this patch: >>> >>> [...] >>> [68.926s][info][gc,reloc    ] GC(6) Small Pages: 529 / 1058M(93%), >>> Empty: 350M(31%), Compacting: 450M(40%)->20M(2%) >>> [68.926s][info][gc,reloc    ] GC(6) Medium Pages: 2 / 64M(6%), Empty: >>> 0M(0%), Compacting: 64M(6%)->32M(3%) >>> [68.926s][info][gc,reloc    ] GC(6) Large Pages: 2 / 12M(1%), Empty: >>> 6M(1%), Compacting: 0M(0%)->0M(0%) >>> [68.926s][info][gc,reloc    ] GC(6) Relocation: Successful >>> [...] >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8234440 >>> Webrev: http://cr.openjdk.java.net/~pliden/8234440/webrev.0 >>> >>> /Per > From per.liden at oracle.com Wed Jan 22 09:57:25 2020 From: per.liden at oracle.com (Per Liden) Date: Wed, 22 Jan 2020 10:57:25 +0100 Subject: RFR: 8237363: Remove automatic is in heap verification in OopIterateClosure In-Reply-To: <1cb1e7ea-45dd-6a36-1731-94fe1fe25244@oracle.com> References: <1cb1e7ea-45dd-6a36-1731-94fe1fe25244@oracle.com> Message-ID: <03e96d75-4449-460a-fbc9-a6e1d5f7639c@oracle.com> Looks good to me. /Per On 1/17/20 2:31 PM, Stefan Karlsson wrote: > Hi all, > > Please review this patch to remove the automatic "is in heap" > verification from OopIterateClosure.
> > https://cr.openjdk.java.net/~stefank/8237363/webrev.01/ > > https://bugs.openjdk.java.net/browse/JDK-8237363 > > > > OopIterateClosure provides some automatic verification that loaded > > objects are inside the heap. Closures can opt out from this by > > overriding should_verify_oops(). > > > > I propose that we move this verification, and the way to turn it off, > > and instead let the implementations of the closures decide the kind of > > verification that is appropriate. I want to do this to de-clutter the > > closure APIs a bit. > > > > I've gone through all OopIterateClosures that don't override > > should_verify_oops() and added calls to > > assert_oop_field_points_to_object_in_heap[_or_null] where the closures > > didn't have equivalent checks. > > > > A lot of the places didn't explicitly check that the object is within > > the heap but they would check for other things like: > > - Is the corresponding bit index within the range > > - Is the heap region index within range > > - Is the object in the reserved heap range (weaker than is_in) > > > > I've added asserts to those places. If you think I should remove some of > > them, please let me know. > > > > Tested with tier1-3 > > > > Thanks, > > StefanK From maoliang.ml at alibaba-inc.com Wed Jan 22 10:02:03 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Wed, 22 Jan 2020 18:02:03 +0800 Subject: Re: Discussion: improve humongous objects handling for G1 In-Reply-To: References: <3b35ff1d-b717-42c8-bd1f-28a54a5a0ec4.maoliang.ml@alibaba-inc.com> <54f40191-82ff-43af-aaa5-5821efc59bed.maoliang.ml@alibaba-inc.com>, Message-ID: <45600e6d-08c9-4035-bcc5-78db69bfe8b0.maoliang.ml@alibaba-inc.com> Hi Thomas, Thanks for your rich explanation. I saw those problems but didn't think so much:) My previous approach definitely would increase the GC frequency so after you provided the idea of canceling the cm cycle I thought it would be a better one.
Thanks, Liang ------------------------------------------------------------------ From:Thomas Schatzl Send Time:2020 Jan. 22 (Wed.) 17:40 To:"MAO, Liang" ; Man Cao ; hotspot-gc-dev Subject:Re: Discussion: improve humongous objects handling for G1 Hi Liang, On 21.01.20 15:26, Liang Mao wrote: > Hi Thomas, > > Thank you for pointing out my mistake for comparing iterating object array > with card scanning that I missed the detail that card scanning doesn't need > to scan the whole object array. > > I didn't provide gc log because I haven't sufficient statistics data about > humongous distribution or the object arrays. The solution is just > straightforward > because increasing G1HeapRegionSize fixes the problem so I want to do > the same to > G1HeapRegionSize=32m. In my earlier memory of tuning some > typical applications, > humongous objects occupy more than half of the used heap after young GC with > default G1HeapRegionSize. I guess perhaps half of our applications may > encounter > the issue with default setting. So currently we use the G1HeapRegionSize as > approximately 1/500 of Xmx. > I know that iterating humongous object array in young GC might significantly > degrade the pause time oriented philosophy. But if the pause time is > already in > expectation with CMS such behavior isn't doing anything worse but avoid > the GC > turbulence by concurrent mark. Besides the obvious penalty to pause time, do > you have any other concerns? Ultimately, no, but given that there are options that seem all-around better or there are things (e.g. humongous object tail allocations) to do first I would not spend time on that at this time :) > >> Note that its allocation could still be counted against the eden >> allowance in some situations. This could be seen as a way to slow down >> the mutator while it is busy trying to complete the marking.
> >> I am however not sure if it helps a lot assuming that changes to perform >> eager reclaim on objArrays won't work during marking btw. There would be >> need for a different kind of enforcing such an allocation penalty. > > I'm sorry I didn't get these 2 paragraphs. Could you please explain more? The first sentence starts talking about a hybrid approach: keep the object in old gen, but still account it against the allowed eden allocation. The problem is how to account this: you do not really want to account the full regions against it, because that would cause more gcs than CMS as we don't/can't allocate into tail of humongous objects at all. Counting fractions of regions would leave you with the decision to be conservative and make an eden allocation region unused. E.g. eden allocation budget is 50 regions, you allocate 50 humongous objects half a region each. Should that exhaust the budget (in case regions are counted fully)? This would mean that G1 would do collections at double the rate of CMS. An option would be to make them count at their real size (for purposes of determining when eden is "exhausted"), but that might leave you with a fraction of a region for the last region. The other problem is of course, while it's accounted for its actual size, it still takes twice space in the heap (i.e. the to-space exhaustion issue). This is why the mention of "in some situations" - if you are eating into the reserve already, it's probably better to account them in full anyway. Note that otoh if you keep the object in young gen logically too, you could imagine allocating into its tail. However keeping humongous (objArray) objects in young gen logically has the other bad properties we talked about earlier. You could even vary the strategy between objArray and non-objArrays. So this needs some more thought, and the policy written down, and tested on "real" applications. 
:) The last sentence in that paragraph refers to somehow slow down the mutator when eating in the reserve to complete the marking in time. The second paragraph is about questioning the slowing down mechanism a bit: even during this situation, when you are marking, and slowing down the mutator, additional gcs do not help you a lot wrt to eager reclaim of humongous objArray objects, as the most likely (initial) implementation of that would not do eager reclaim during marking (it's doable, you need to keep the satb invariant). Yeah, these two paragraphs compressed my thoughts maybe a bit too much. ;) Thanks, Thomas From thomas.schatzl at oracle.com Wed Jan 22 10:05:19 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 22 Jan 2020 11:05:19 +0100 Subject: Discussion: improve humongous objects handling for G1 In-Reply-To: References: <3b35ff1d-b717-42c8-bd1f-28a54a5a0ec4.maoliang.ml@alibaba-inc.com> <54f40191-82ff-43af-aaa5-5821efc59bed.maoliang.ml@alibaba-inc.com> Message-ID: Hi Man, On 22.01.20 06:05, Man Cao wrote: > Hi all, > > Thanks for the great discussion from Thomas and Liang! > > Regarding to GC logs, histogram of humongous allocations, and a more > concrete example, I guess we are in the same boat here. We only advised > users to increase G1HeapRegionSize, which would work around many cases > of the problem. We have not yet closely studied patterns of the > problematic humongous allocations. I will do such a study and follow up > with some statistics and GC logs when I get my hands on them. > [...] > > > Btw the same text also indicates that copying seems like a non-starter > > anyway, as, quoting from the text "One use case, SPECjbb2015 benchmark > > is not being able to finish RT curve...". > > Not sure what prevents arraylets in particular from being O(1); a > > particular access is slower though due to the additional indirection > > with the spine. > > ...
> > Which means that there is significant optimization work needed to make > > array access "as fast" as before in jitted code > These two issues: > (1) copying for JNI Critical > (2) slowing down typical jitted code for array accesses > do sound like performance deal-breakers, particularly if they are only > required for G1+arraylets but not other collectors. There are some use > cases of JNI Critical on arrays that are solely for performance reasons, > and we'd rather not slow them down +1 > > > It could help with all problems but cases where you allocate a very > > large number of humongous objects and you can't keep the humongous object > > tails filled. This option still keeps the invariant that humongous > > objects need to be allocated at a region boundary. > > > > Most of the other ideas you propose below also (seem to) retain this > > property. > Agreed. It seems that JDK-8172713 would help most ideas anyway. Yeah :) > > Maybe it is sufficient as "most" applications only use single or low > > double-digit GB heaps at the moment where the entire reservation still > > fits into the 32gb barrier. > I also had the same thought. Most of our important workloads have heap > sizes less than 20GB. > If the "reserve multiple MaxHeapSize" approach could work with > compressed oops for <16GB heap, then it is quite acceptable. I agree, it could help in a lot of cases while not (apparently) costing much. On 32 bit systems this is simply disabled (ie. -Xmx == reservation size). I think even a small over-reservation would help in a lot of cases for external fragmentation. One could think of "cheating" a little with the actual memory usage/commit size by only committing what the humongous object actually needs if you wanted. This would complicate size accounting quite a bit though (and increase commit/uncommit calls), so JDK-8172713 seems favorable at least.
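The compressed oops constraint discussed above boils down to simple arithmetic (a hypothetical helper, not a VM API; it assumes the whole reservation must stay below the 32 GB boundary that compressed oops with 8-byte object alignment can address):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical check for the "reserve a multiple of MaxHeapSize" idea:
// the entire reservation, not just -Xmx, has to fit below the boundary.
bool reservation_fits_compressed_oops(uint64_t max_heap_bytes,
                                      uint64_t reservation_factor) {
  const uint64_t kCompressedOopLimit = 32ull << 30; // 32 GB
  return max_heap_bytes * reservation_factor <= kCompressedOopLimit;
}
```

For example, a 2x over-reservation still allows compressed oops up to a 16 GB -Xmx, which matches the "<16GB heap" case mentioned above, while a 20 GB heap with the same factor would lose compressed oops.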
> That said, now I do agree that I should first study the patterns of > humongous allocation and look into improvement on eager reclamation. > > For the approach from Liang/Alibaba, I'm optimistic that it could solve > many problems migrating from ParNew+CMS to G1. Because it handles > humongous allocations in a similar way as ParNew+CMS does, plus G1 has > the advantage of not copying humongous objects, the pause > duration/frequency would probably not degrade compared to ParNew/CMS. > I also agree with Thomas that it may increase pause duration compared to > current G1 due to extra scanning, and allocation spikes might affect > other aspects of G1. I noticed in the description for JDK-8027959: "a) > logically keep LOBs in young gen, doing in-place aging", which sounds > like the GC team have explored this approach for eager reclamation > before? Yes, with the issue described before (with objArray humongous objects - non-objArrays are not an issue from a scanning POV, but they still are from an accounting one). It might be the best of both worlds if we could make eager > reclamation of humongous objArrays work without putting them in young > gen, and further improve eager reclamation in general. > Considering only the "avoid humongous object fragmentation" area, yes :) Thanks, Thomas From stefan.karlsson at oracle.com Wed Jan 22 10:20:36 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Wed, 22 Jan 2020 11:20:36 +0100 Subject: RFR: 8237363: Remove automatic is in heap verification in OopIterateClosure In-Reply-To: <03e96d75-4449-460a-fbc9-a6e1d5f7639c@oracle.com> References: <1cb1e7ea-45dd-6a36-1731-94fe1fe25244@oracle.com> <03e96d75-4449-460a-fbc9-a6e1d5f7639c@oracle.com> Message-ID: <6ddc429b-a10d-7ec8-0fbb-1e82f7d15dfa@oracle.com> Thanks, Per. StefanK On 2020-01-22 10:57, Per Liden wrote: > Looks good to me.
> > /Per > > On 1/17/20 2:31 PM, Stefan Karlsson wrote: >> Hi all, >> >> Please review this patch to remove the automatic "is in heap" >> verification from OopIterateClosure. >> >> https://cr.openjdk.java.net/~stefank/8237363/webrev.01/ >> https://bugs.openjdk.java.net/browse/JDK-8237363 >> >> OopIterateClosure provides some automatic verification that loaded >> objects are inside the heap. Closures can opt out from this by >> overriding should_verify_oops(). >> >> I propose that we move this verification, and the way to turn it off, >> and instead let the implementations of the closures decide the kind of >> verification that is appropriate. I want to do this to de-clutter the >> closure APIs a bit. >> >> I've gone through all OopIterateClosures that don't override >> should_verify_oops() and added calls to >> assert_oop_field_points_to_object_in_heap[_or_null] where the closures >> didn't have equivalent checks. >> >> A lot of the places didn't explicitly check that the object is within >> the heap but they would check for other things like: >> - Is the corresponding bit index within the range >> - Is the heap region index within range >> - Is the object in the reserved heap range (weaker than is_in) >> >> I've added asserts to those places. If you think I should remove some >> of them, please let me know. >> >> Tested with tier1-3 >> >> Thanks, >> StefanK From thomas.schatzl at oracle.com Wed Jan 22 10:46:26 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 22 Jan 2020 11:46:26 +0100 Subject: [14] RFR (S): 8237079: gc/g1/mixedgc/TestLogging.java fails with "Pause Young (Mixed) (G1 Evacuation Pause) not found" In-Reply-To: References: Message-ID: <39d5a2fd-0a5d-97dd-3ecd-106dbdb8434f@oracle.com> Hi all, On 17.01.20 16:07, Leo Korinth wrote: > Hi Thomas, > > This is not a review. I took it as one anyway ;) > This code is basically the same code as is > duplicated at least three times in the test code.
One of the > duplications you can blame me for, *sorry*. I believe it should be moved > to a common library method. I also believe the last fix you did in > TestG1ParallelPhases.java makes that version look cleaner than what you > propose here (it does not need the last allocation loop at all). > > How about using the TestG1ParallelPhases.java version for all three test > cases? If not, do the third version in TestOldGenCollectionUsage really > work??? > Here's a webrev incorporating these suggestions to unify the code: http://cr.openjdk.java.net/~tschatzl/8237079/webrev.1 (full) There is no point to provide a diff webrev here as the whole change has been redone. - factor out and use a MixedGCProvoker class in all three of those tests. - some changes in the various tests to align their options a bit more and stabilize them - TestOldCollectionUsage assumed that there were no previous old gen allocations; actually it failed if there were. Since we can't guarantee that, loosened the condition to require update of the mixed gc usage only. - for gc/g1/mixedgc/TestLogging.java removed the need to match the whole log message including the "G1 Evacuation Pause" gc cause message. It did not seem to be the point of the test to check that the mixed gc has been caused "naturally" by eden exhaustion or via whitebox.
Thanks, Thomas From aph at redhat.com Wed Jan 22 10:52:52 2020 From: aph at redhat.com (Andrew Haley) Date: Wed, 22 Jan 2020 10:52:52 +0000 Subject: RFR(S): 8229422: Taskqueue: Outdated selection of weak memory model platforms In-Reply-To: <23d3db9d-6603-c10c-8240-62cd82f4bae9@oracle.com> References: <9d9819fe-560f-13f0-1907-794e063ee687@oracle.com> <9dbfd063-ea45-10e0-b541-7e84d662581c@redhat.com> <88f97b92-df9e-140c-a972-44982ae3f79b@redhat.com> <23d3db9d-6603-c10c-8240-62cd82f4bae9@oracle.com> Message-ID: <12ca6024-bec5-8bd1-57d8-5a880fd5ad96@redhat.com> On 1/15/20 1:00 AM, David Holmes wrote: > On 15/01/2020 2:15 am, Andrew Haley wrote: >> On 1/14/20 3:52 PM, Doerr, Martin wrote: >> >>> good catch. I think you're right. A multi-copy-atomic, but weak >>> architecture (e.g. aarch64) needs an instruction which orders both >>> volatile loads. >> >> Good, I thought so. >> >> Given that TSO machines define OrderAccess::acquire() as no more than >> a compiler barrier, I believe that we could do something like >> >> #ifdef CPU_MULTI_COPY_ATOMIC >> OrderAccess::acquire(); >> #else >> OrderAccess::fence(); >> #endif > > "acquire" isn't used to order loads it is used to pair with a "release" > associated with the store of the variable now being loaded. > > If this is the code referred to: > > Age oldAge = _age.get(); > // Architectures with weak memory model require a barrier here > // to guarantee that bottom is not older than age, > // which is crucial for the correctness of the algorithm. > #ifndef CPU_MULTI_COPY_ATOMIC > OrderAccess::fence(); > #endif > uint localBot = Atomic::load_acquire(&_bottom); > > then I think there is an assumption (perhaps incorrect) that the > load_acquire will prevent reordering as well as performing the necessary > "acquire" semantics. It depends on how _age is written to. 
As far as I can see there is no ordering between setting _bottom and setting _age, void set_empty() { _bottom = 0; _age.set(0); } so it looks like any kind of fence on the reader side is pointless anyway. In that case, I don't know why we're doing any of this if it doesn't matter what order the reader threads see updates to _age and _bottom. It's all rather baffling. _bottom is declared volatile, as is _age, so I guess there must be some ordering requirements, but no fences on the writing side to enforce it. What actually are the ordering requirements between _bottom and _age? -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From thomas.schatzl at oracle.com Wed Jan 22 11:02:51 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 22 Jan 2020 12:02:51 +0100 Subject: RFR: 8237363: Remove automatic is in heap verification in OopIterateClosure In-Reply-To: <1cb1e7ea-45dd-6a36-1731-94fe1fe25244@oracle.com> References: <1cb1e7ea-45dd-6a36-1731-94fe1fe25244@oracle.com> Message-ID: <20587626-5998-d756-5c9b-893ce42f40cd@oracle.com> Hi, On 17.01.20 14:31, Stefan Karlsson wrote: > Hi all, > > Please review this patch to remove the automatic "is in heap" > verification from OopIterateClosure. > > https://cr.openjdk.java.net/~stefank/8237363/webrev.01/ > https://bugs.openjdk.java.net/browse/JDK-8237363 > > OopIterateClosure provides some automatic verification that loaded > objects are inside the heap. Closures can opt out from this by > overriding should_verify_oops(). > > I propose that we move this verification, and the way to turn it off, > and instead let the implementations of the closures decide the kind of > verification that is appropriate. I want to do this to de-clutter the > closure APIs a bit. > While the change is correct, I am not really convinced it is a good idea to trade verification in one place to the same verification in many places.
The closure API does not seem to be particularly "cluttered up" by this particular API to me. It is true that other code typically has many other asserts that would fail anyway, but it would be an additional safety net when writing new closures. This is not a hard no for this change, but is there something else you are planning to do in this area where this code would be in the way? > I've gone through all OopIterateClosures that don't override > should_verify_oops() and added calls to > assert_oop_field_points_to_object_in_heap[_or_null] where the closures > didn't have equivalent checks. > > A lot of the places didn't explicitly check that the object is within > the heap but they would check for other things like: > - Is the corresponding bit index within the range > - Is the heap region index within range > - Is the object in the reserved heap range (weaker than is_in) > > I've added asserts to those places. If you think I should remove some of > them, please let me know. > > Tested with tier1-3 > Thanks, Thomas From maoliang.ml at alibaba-inc.com Wed Jan 22 11:57:06 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Wed, 22 Jan 2020 19:57:06 +0800 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Message-ID: <90aa2259-afce-44af-abb2-31700caea4a0.maoliang.ml@alibaba-inc.com> Hi Thomas, I just uploaded the new patch for SoftMaxHeapSize. The shrink works immediately. The concurrent uncommit will be in a different patch. http://cr.openjdk.java.net/~luchsh/8236073.webrev.2/ Thanks, Liang ------------------------------------------------------------------ From:MAO, Liang Send Time:2020 Jan. 16 (Thu.) 11:21 To:Thomas Schatzl ; hotspot-gc-dev Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Hi Thomas, Yes. We can focus on the current concurrent shrinking for now.
You are right that changing the default behavior will be sensitive since you need to cover all types of applications including throughput and low-latency while our previous patch is mostly designed for low-latency. We'll figure this out later:) Thanks, Liang ------------------------------------------------------------------ From:Thomas Schatzl Send Time:2020 Jan. 16 (Thu.) 01:57 To:"MAO, Liang" ; hotspot-gc-dev Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Hi, On Wed, 2020-01-15 at 20:53 +0800, Liang Mao wrote: > Hi Thomas, > > So G1 doesn't need to shrink below Xms if SoftMaxHeapSize is > below Xms, does it? > No, never shrink below MinHeapSize. > Another question is that no matter we have an additional option we > had better have 2 criteria. The first is for urgent expansion that > GCTimeRatio is quite low and concurrent expansion with frequent GCs > is more harmful and expansion should be done immediately. It's the > current default flow as we found that 12 is a good number below which > applications can obviously incur timeout errors. The second is to > keep the GCTimeRatio and memory footprint in a balanced state so > any adjustments are better to be concurrent. The original number 99 > fits well here. If we have only one option "GCTimeRatio", we might > not be able to achieve both. Maybe we can have a LowGCTimeRatio below > which is supposed to be not acceptable and a HighTimeRatio which is > certainly healthy. So far the change has been about shrinking the heap concurrently, and not expansion. Let's concentrate on the issue at hand, i.e. see how heap shrinking at more places turns out. I believe there will be lots of tweaking needed for this change to not show too many regressions in other applications. Remember that the defaults should work well for a large body of applications, not just a few. There may be knobs to tune it for others.
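The two-threshold idea quoted above could be sketched roughly like this (a purely hypothetical illustration: LowGCTimeRatio and HighGCTimeRatio are not existing HotSpot flags, and the classification is made up to show the shape of such a policy):

```cpp
#include <cassert>

// Classify the measured mutator-to-GC time ratio against two hypothetical
// thresholds: below the low threshold GC overhead is unacceptable and the
// heap should be expanded immediately; between the thresholds the heap is
// out of balance and can be adjusted concurrently; above the high
// threshold the footprint may even be reduced.
enum class SizingAction { UrgentExpand, ConcurrentAdjust, SteadyOrShrink };

SizingAction classify_gc_time_ratio(double measured_ratio,
                                    double low_ratio /* e.g. 12 */,
                                    double high_ratio /* e.g. 99 */) {
  if (measured_ratio < low_ratio) {
    return SizingAction::UrgentExpand;
  }
  if (measured_ratio < high_ratio) {
    return SizingAction::ConcurrentAdjust;
  }
  return SizingAction::SteadyOrShrink;
}
```

The example thresholds 12 and 99 are taken from the numbers mentioned in the quoted mail; whether a real policy would use exactly this split is one of the open tuning questions discussed here.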
Then look concurrent expansion, at application phase changes in the application, how to detect, and how to react best. Just for reference, last time we changed the sizing algorithm it took a few months to get it "right", with mostly improvements all around. Thanks, Thomas From david.holmes at oracle.com Wed Jan 22 11:59:21 2020 From: david.holmes at oracle.com (David Holmes) Date: Wed, 22 Jan 2020 21:59:21 +1000 Subject: RFR(S): 8229422: Taskqueue: Outdated selection of weak memory model platforms In-Reply-To: <12ca6024-bec5-8bd1-57d8-5a880fd5ad96@redhat.com> References: <9d9819fe-560f-13f0-1907-794e063ee687@oracle.com> <9dbfd063-ea45-10e0-b541-7e84d662581c@redhat.com> <88f97b92-df9e-140c-a972-44982ae3f79b@redhat.com> <23d3db9d-6603-c10c-8240-62cd82f4bae9@oracle.com> <12ca6024-bec5-8bd1-57d8-5a880fd5ad96@redhat.com> Message-ID: <6ef9347b-8139-0eb7-7150-8dce4b5e4dc6@oracle.com> On 22/01/2020 8:52 pm, Andrew Haley wrote: > On 1/15/20 1:00 AM, David Holmes wrote: >> On 15/01/2020 2:15 am, Andrew Haley wrote: >>> On 1/14/20 3:52 PM, Doerr, Martin wrote: >>> >>>> good catch. I think you're right. A multi-copy-atomic, but weak >>>> architecture (e.g. aarch64) needs an instruction which orders both >>>> volatile loads. >>> >>> Good, I thought so. >>> >>> Given that TSO machines define OrderAccess::acquire() as no more than >>> a compiler barrier, I believe that we could do something like >>> >>> #ifdef CPU_MULTI_COPY_ATOMIC >>> OrderAccess::acquire(); >>> #else >>> OrderAccess::fence(); >>> #endif >> >> "acquire" isn't used to order loads it is used to pair with a "release" >> associated with the store of the variable now being loaded. >> >> If this is the code referred to: >> >> Age oldAge = _age.get(); >> // Architectures with weak memory model require a barrier here >> // to guarantee that bottom is not older than age, >> // which is crucial for the correctness of the algorithm. 
>> #ifndef CPU_MULTI_COPY_ATOMIC >> OrderAccess::fence(); >> #endif >> uint localBot = Atomic::load_acquire(&_bottom); >> >> then I think there is an assumption (perhaps incorrect) that the >> load_acquire will prevent reordering as well as performing the necessary >> "acquire" semantics. > > It depends on how _age is written to. > > As far as I can see there is no ordering between setting _bottom and setting > _age, > > void set_empty() { > _bottom = 0; > _age.set(0); > } > > so it looks like any kind of fence on the reader side is pointless anyway. In > that case, I don't know why we're doing any of this if it doesn't matter > what order the reader threads see updates to _age and _bottom. > > It's all rather baffling. _bottom is declared volatile, as is _age, so I > guess there must be some ordering requirements, but no fences on the > writing side to enforce it. > > What actually are the ordering requirements between _bottom and _age? I'm assuming the ordering requirement is to preserve the order as expressed in the code. There is likely an assumption that by declaring both as volatile that the the compiler will not reorder them; and that the load_acquire will prevent the hardware from reordering them. I'm not sure if either of those assumptions are actually valid. But that doesn't explain the complete lack of barriers in set_empty. The GC folk will need to chime in on the detailed semantic requirements of this algorithm. 
David From stefan.karlsson at oracle.com Wed Jan 22 13:16:53 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Wed, 22 Jan 2020 14:16:53 +0100 Subject: RFR: 8237363: Remove automatic is in heap verification in OopIterateClosure In-Reply-To: <20587626-5998-d756-5c9b-893ce42f40cd@oracle.com> References: <1cb1e7ea-45dd-6a36-1731-94fe1fe25244@oracle.com> <20587626-5998-d756-5c9b-893ce42f40cd@oracle.com> Message-ID: <246ce191-0f61-1bd5-caec-71299bfcebef@oracle.com> On 2020-01-22 12:02, Thomas Schatzl wrote: > Hi, > > On 17.01.20 14:31, Stefan Karlsson wrote: >> Hi all, >> >> Please review this patch to remove the automatic "is in heap" >> verification from OopIterateClosure. >> >> https://cr.openjdk.java.net/~stefank/8237363/webrev.01/ >> https://bugs.openjdk.java.net/browse/JDK-8237363 >> >> OopIterateClosure provides some automatic verification that loaded >> objects are inside the heap. Closures can opt out from this by >> overriding should_verify_oops(). >> >> I propose that we move this verification, and the way to turn it off, >> and instead let the implementations of the closures decide the kind of >> verification that is appropriate. I want to do this to de-clutter the >> closure APIs a bit. >> > > While the change is correct, I am not really convinced it is a good idea > to trade verification in one place to the same verification in many place. An alternative would be to simply remove the verification altogether. As I said, we almost always check the result of the object address. > > The closure API does not seem to be particularly "cluttered up" by this > particular API to me. It's a slippery slope. Previously, we had a lot of GC specific functions in these interfaces. I've been cleaning this over the years, and this is one of the last non-essential parts of that interface that implementors need to consider. With my removal people don't have to think about this anymore. 
It is true that other code typically has many > other asserts that would fail anyway, but it would be an additional > safety net when writing new closures. It's a safety net that works for G1, but almost always incorrectly trips the assert with ZGC. > > This is not a hard no for this change, but is there something else you > are planning to do in this area where this code would be in the way? No. StefanK > >> I've gone through all OopIterateClosures that don't override >> should_verify_oops() and added calls to >> assert_oop_field_points_to_object_in_heap[_or_null] where the closures >> didn't have equivalent checks. >> >> A lot of the places didn't explicitly check that the object is within >> the heap but they would check for other things like: >> - Is the corresponding bit index within the range >> - Is the heap region index within range >> - Is the object in the reserved heap range (weaker than is_in) >> >> I've added asserts to those places. If you think I should remove some >> of them, please let me know. >> >> Tested with tier1-3 >> > > Thanks, > Thomas > From stefan.karlsson at oracle.com Wed Jan 22 14:02:32 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Wed, 22 Jan 2020 15:02:32 +0100 Subject: RFR: 8237645: Remove OopsInGenClosure::par_do_barrier Message-ID: <34dcda2a-ef2b-65e0-b6f2-fae553d95983@oracle.com> Hi all, Please review this patch to remove some dead code after the CMS removal.
https://cr.openjdk.java.net/~stefank/8237645/webrev.01/ https://bugs.openjdk.java.net/browse/JDK-8237645 Thanks, StefanK From rkennke at redhat.com Wed Jan 22 14:33:17 2020 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 22 Jan 2020 15:33:17 +0100 Subject: [15] RFR 8236880: Shenandoah: Move string dedup cleanup into concurrent phase In-Reply-To: <837e7210-0bd7-e06f-907b-7c5fcc3c3684@redhat.com> References: <837e7210-0bd7-e06f-907b-7c5fcc3c3684@redhat.com> Message-ID: Hi Zhengyu, Would it be possible to use scoped lockers instead in: src/hotspot/share/gc/shenandoah/shenandoahRootProcessor.cpp The rest looks ok to me. Thanks, Roman > Please review this patch that moves string deduplication cleanup task > into concurrent phase. > > The cleanup task comprises two subtasks: StringDedupTable and > StringDedupQueue cleanup. > > Concurrent StringDedupTable cleanup is very straightforward. GC takes > StringDedupTable_lock to block out mutators from modifying the table, > then performs multi-thread cleanup, just as it does at STW pause. > > Concurrent StringDedupQueue cleanup is more complicated. GC takes > StringDedupQueue_lock, only blocks queue structure changes, while > mutators can still enqueue new string candidates and dedup thread can > still perform deduplication. So there are a couple of synchronizations > that need to be established. > > 1) When mutator enqueues a candidate, the enqueued oop should be valid > before the slot can be made visible to GC threads. > > 2) When GC thread updates oop, it needs to make sure that dedup thread > does not see partially updated oop. > > The implementation uses load_acquire/release_store pair to ensure above > synchronization held. > > GC threads may miss some just enqueued oops by mutators. This is not a > concern, since LRB guarantees they are in to-space. > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8236880 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8236880/webrev.00/ > > > Test: >
hotspot_gc_shenandoah with -XX:+UseStringDeduplication > (fastdebug and release) on x86_64 and aarch64 Linux > > Thanks, > > -Zhengyu > From zgu at redhat.com Wed Jan 22 14:37:12 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 22 Jan 2020 09:37:12 -0500 Subject: [15] RFR 8236880: Shenandoah: Move string dedup cleanup into concurrent phase In-Reply-To: References: <837e7210-0bd7-e06f-907b-7c5fcc3c3684@redhat.com> Message-ID: <06325fc4-3ae6-c25c-d293-47f58962417d@redhat.com> Hi Roman, Thanks for the review. On 1/22/20 9:33 AM, Roman Kennke wrote: > Hi Zhengyu, > > Would it be possible to use scoped lockers instead in: > > src/hotspot/share/gc/shenandoah/shenandoahRootProcessor.cpp > They are conditional and somewhat already scoped, e.g. lock in constructor and unlock in destructor. -Zhengyu > The rest looks ok to me. > > Thanks, > Roman > >> Please review this patch that moves string deduplication cleanup task >> into concurrent phase. >> >> The cleanup task composites two subtasks: StringDedupTable and >> StringDedupQueue cleanup. >> >> Concurrent StringDedupTable cleanup is very straightforward. GC takes >> StringDedupTable_lock to block out mutators from modifying the table, >> then performs multi-thread cleanup, just as it does at STW pause. >> >> Concurrent StringDedupQueue cleanup is more complicated. GC takes >> StringDedupQueue_lock, only blocks queue structure changes, while >> mutators can still enqueue new string candidates and dedup thread can >> still perform deduplication. So there are a couple of synchronizations >> need to be established. >> >> 1) When mutator enqueues a candidate, the enqueued oop should be valid >> before the slot can be made visible to GC threads. >> >> 2) When GC thread updates oop, it needs to make sure that dedup thread >> does not see partially updated oop. >> >> The implementation uses load_acquire/release_store pair to ensure above >> synchronization held. >> >> GC threads may miss some just enqueued oops by mutators.
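[Editorial aside: the load_acquire/release_store publication pattern described in the quoted RFR can be sketched as a toy model. Names and the counter-based publish scheme below are invented for illustration; this is not the Shenandoah StringDedupQueue code.]

```cpp
#include <atomic>
#include <cassert>

// Toy publish pattern: a mutator release-stores so a slot only becomes
// visible to GC/dedup threads once the entry in it is fully valid.
struct ToyDedupQueue {
  static const int N = 16;
  std::atomic<void*> _slots[N];
  std::atomic<int> _top{0};

  bool enqueue(void* oop) {        // mutator side
    int t = _top.load(std::memory_order_relaxed);
    if (t >= N) return false;      // toy queue is full
    _slots[t].store(oop, std::memory_order_relaxed);
    // release: the slot contents are written before the count is bumped
    _top.store(t + 1, std::memory_order_release);
    return true;
  }

  void* load_slot(int i) const {   // GC/dedup thread side
    // acquire: pairs with the release store above, so if we observe the
    // new count we also observe the fully written slot
    int t = _top.load(std::memory_order_acquire);
    return (i < t) ? _slots[i].load(std::memory_order_relaxed) : nullptr;
  }
};
```

A reader that races with enqueue may miss the newest entry (it sees the old count), which mirrors the RFR's note that GC threads may miss just-enqueued oops.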
This is not a >> concern, since LRB guarantees they are in to-space. >> >> >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8236880 >> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8236880/webrev.00/ >> >> >> Test: >> ? hotspot_gc_shenandoah with -XX:+UseStringDeduplication >> ? (fastdebug and release) on x86_64 and aarch64 Linux >> >> Thanks, >> >> -Zhengyu >> > From martin.doerr at sap.com Wed Jan 22 15:01:01 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 22 Jan 2020 15:01:01 +0000 Subject: RFR(S): 8229422: Taskqueue: Outdated selection of weak memory model platforms In-Reply-To: <6ef9347b-8139-0eb7-7150-8dce4b5e4dc6@oracle.com> References: <9d9819fe-560f-13f0-1907-794e063ee687@oracle.com> <9dbfd063-ea45-10e0-b541-7e84d662581c@redhat.com> <88f97b92-df9e-140c-a972-44982ae3f79b@redhat.com> <23d3db9d-6603-c10c-8240-62cd82f4bae9@oracle.com> <12ca6024-bec5-8bd1-57d8-5a880fd5ad96@redhat.com> <6ef9347b-8139-0eb7-7150-8dce4b5e4dc6@oracle.com> Message-ID: Hi Andrew and David, the scenario for which these barriers are needed is not so trivial: Thread1: set bottom (push) Thread2: read age, read bottom, set age (pop_global) Thread3: read age, read bottom (pop_global) The requirement is that Thread3 must never read an older bottom value than Thread2 after Thread3 has seen the age value from Thread2. The age is updated by cmpxchg in pop_global which already implies strict ordering so there's no extra release barrier. I'd rather choose OrderAccess::loadload in pop_global, but I don't think it's a big deal. I'll be glad if somebody from GC can double-check if I remembered this stuff correctly. Best regards, Martin > -----Original Message----- > From: David Holmes > Sent: Mittwoch, 22. 
Januar 2020 12:59 > To: Andrew Haley ; Doerr, Martin > ; Derek White ; hotspot- > gc-dev at openjdk.java.net; Kim Barrett > Subject: Re: RFR(S): 8229422: Taskqueue: Outdated selection of weak > memory model platforms > > On 22/01/2020 8:52 pm, Andrew Haley wrote: > > On 1/15/20 1:00 AM, David Holmes wrote: > >> On 15/01/2020 2:15 am, Andrew Haley wrote: > >>> On 1/14/20 3:52 PM, Doerr, Martin wrote: > >>> > >>>> good catch. I think you're right. A multi-copy-atomic, but weak > >>>> architecture (e.g. aarch64) needs an instruction which orders both > >>>> volatile loads. > >>> > >>> Good, I thought so. > >>> > >>> Given that TSO machines define OrderAccess::acquire() as no more than > >>> a compiler barrier, I believe that we could do something like > >>> > >>> #ifdef CPU_MULTI_COPY_ATOMIC > >>> OrderAccess::acquire(); > >>> #else > >>> OrderAccess::fence(); > >>> #endif > >> > >> "acquire" isn't used to order loads it is used to pair with a "release" > >> associated with the store of the variable now being loaded. > >> > >> If this is the code referred to: > >> > >> Age oldAge = _age.get(); > >> // Architectures with weak memory model require a barrier here > >> // to guarantee that bottom is not older than age, > >> // which is crucial for the correctness of the algorithm. > >> #ifndef CPU_MULTI_COPY_ATOMIC > >> OrderAccess::fence(); > >> #endif > >> uint localBot = Atomic::load_acquire(&_bottom); > >> > >> then I think there is an assumption (perhaps incorrect) that the > >> load_acquire will prevent reordering as well as performing the necessary > >> "acquire" semantics. > > > > It depends on how _age is written to. > > > > As far as I can see there is no ordering between setting _bottom and > setting > > _age, > > > > void set_empty() { > > _bottom = 0; > > _age.set(0); > > } > > > > so it looks like any kind of fence on the reader side is pointless anyway. 
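[Editorial aside: the reader-side sequence under discussion can be modeled in isolation with C++11 atomics. This is a deliberately simplified, single-producer toy, not the HotSpot TaskQueue code; the names, the plain unsigned "age", and the unconditional full fence standing in for the CPU_MULTI_COPY_ATOMIC case are all illustrative only.]

```cpp
#include <atomic>
#include <cassert>

struct ToyTaskQueue {
  static const int N = 8;
  int _elems[N];
  std::atomic<unsigned> _bottom{0};
  std::atomic<unsigned> _age{0};   // stands in for the (top, tag) pair

  void push(int e) {               // owner thread only
    unsigned b = _bottom.load(std::memory_order_relaxed);
    _elems[b % N] = e;
    // release: the element is visible before the new bottom
    _bottom.store(b + 1, std::memory_order_release);
  }

  bool pop_global(int& e) {        // stealing threads
    unsigned age = _age.load(std::memory_order_acquire);
    // The barrier under discussion: ensure bottom is not read "older"
    // than age on weakly ordered, non-multi-copy-atomic CPUs.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    unsigned b = _bottom.load(std::memory_order_acquire);
    if (b <= age) return false;    // queue looks empty
    e = _elems[age % N];
    // cmpxchg on age publishes the claim; failure means a racing steal won
    return _age.compare_exchange_strong(age, age + 1);
  }
};
```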
In > > that case, I don't know why we're doing any of this if it doesn't matter > > what order the reader threads see updates to _age and _bottom. > > > > It's all rather baffling. _bottom is declared volatile, as is _age, so I > > guess there must be some ordering requirements, but no fences on the > > writing side to enforce it. > > > > What actually are the ordering requirements between _bottom and _age? > > I'm assuming the ordering requirement is to preserve the order as > expressed in the code. There is likely an assumption that by declaring > both as volatile that the the compiler will not reorder them; and that > the load_acquire will prevent the hardware from reordering them. I'm not > sure if either of those assumptions are actually valid. > > But that doesn't explain the complete lack of barriers in set_empty. > > The GC folk will need to chime in on the detailed semantic requirements > of this algorithm. > > David From rkennke at redhat.com Wed Jan 22 15:57:30 2020 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 22 Jan 2020 16:57:30 +0100 Subject: [15] RFR 8236880: Shenandoah: Move string dedup cleanup into concurrent phase In-Reply-To: <06325fc4-3ae6-c25c-d293-47f58962417d@redhat.com> References: <837e7210-0bd7-e06f-907b-7c5fcc3c3684@redhat.com> <06325fc4-3ae6-c25c-d293-47f58962417d@redhat.com> Message-ID: Hi Zhengyu, >> Hi Zhengyu, >> >> Would it be possible to use scoped lockers instead in: >> >> src/hotspot/share/gc/shenandoah/shenandoahRootProcessor.cpp >> > > They are conditional and somewhat already scoped, e.g. lock in > constructor and unlock i destructor. Hmmhmm. Ok then. Roman > > -Zhengyu > >> The rest looks ok to me. >> >> Thanks, >> Roman >> >>> Please review this patch that moves string deduplication cleanup task >>> into concurrent phase. >>> >>> The cleanup task composites two subtasks: StringDedupTable and >>> StringDedupQueue cleanup. >>> >>> Concurrent StringDedupTable cleanup is very straightforward. 
GC takes >>> StringDedupTable_lock to block out mutators from modifying the table, >>> then performs multi-thread cleanup, just as it does at STW pause. >>> >>> Concurrent StringDedupQueue cleanup is more complicated. GC takes >>> StringDedupQueue_lock, only blocks queue structure changes, while >>> mutators can still enqueue new string candidates and dedup thread can >>> still perform deduplication. So there are a couple of synchronizations >>> need to be established. >>> >>> 1) When mutator enqueues a candidate, the enqueued oop should be valid >>> before the slot can be made visible to GC threads. >>> >>> 2) When GC thread updates oop, it needs to make sure that dedup thread >>> does not see partially updated oop. >>> >>> The implementation uses load_acquire/release_store pair to ensure above >>> synchronization held. >>> >>> GC threads may miss some just enqueued oops by mutators. This is not a >>> concern, since LRB guarantees they are in to-space. >>> >>> >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8236880 >>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8236880/webrev.00/ >>> >>> >>> Test: >>> ?? hotspot_gc_shenandoah with -XX:+UseStringDeduplication >>> ?? (fastdebug and release) on x86_64 and aarch64 Linux >>> >>> Thanks, >>> >>> -Zhengyu >>> >> > From leo.korinth at oracle.com Wed Jan 22 16:02:06 2020 From: leo.korinth at oracle.com (Leo Korinth) Date: Wed, 22 Jan 2020 17:02:06 +0100 Subject: [14] RFR (S): 8237079: gc/g1/mixedgc/TestLogging.java fails with "Pause Young (Mixed) (G1 Evacuation Pause) not found" In-Reply-To: <39d5a2fd-0a5d-97dd-3ecd-106dbdb8434f@oracle.com> References: <39d5a2fd-0a5d-97dd-3ecd-106dbdb8434f@oracle.com> Message-ID: <0f272a73-d817-150e-ed68-687737445580@oracle.com> On 22/01/2020 11:46, Thomas Schatzl wrote: > Hi all, > > On 17.01.20 16:07, Leo Korinth wrote: >> Hi Thomas, >> >> This is not a review. 
This code is basically the same code as is > > I took it as one anyway ;) > >> duplicated at least three times in the test code. One of the >> duplications you can blame me for, *sorry*. I believe it should be >> moved to a common library method. I also believe the last fix you did >> in TestG1ParallelPhases.java makes that version look cleaner than what >> you propose here (it does not need the last allocation loop at all). >> >> How about using the TestG1ParallelPhases.java version for all three >> test cases? If not, do the third version in TestOldGenCollectionUsage >> really work??? >> > > Here's a webrev incorporating these suggestions to unify the code: > > http://cr.openjdk.java.net/~tschatzl/8237079/webrev.1 (full) > > There is no point in providing a diff webrev here as the whole change has > been redone. > > - factor out and use a MixedGCProvoker class in all three of those tests. > - some changes in the various tests to align their options a bit more > and stabilize them > - TestOldCollectionUsage assumed that there were no previous old gen > allocations, actually it failed if there were. Since we can't guarantee > that, loosened the condition to require update of the mixed gc usage only. > - for gc/g1/mixedgc/Testlogging.java removed the need to match the > whole log message including the "G1 Evacuation Pause" gc cause message. > It did not seem to be the point of the test to check that the mixed gc has > been caused "naturally" by eden exhaustion or via whitebox. I think this looks *really* good. Also, thanks for taking the extra time to make it shared and reusable code. /Leo > > Thanks, >
Thomas > From thomas.schatzl at oracle.com Wed Jan 22 16:12:39 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 22 Jan 2020 17:12:39 +0100 Subject: RFR: 8237143: Eliminate DirtyCardQ_cbl_mon In-Reply-To: <745E91C1-AE1A-4DA2-80EE-59B70897F4BF@oracle.com> References: <745E91C1-AE1A-4DA2-80EE-59B70897F4BF@oracle.com> Message-ID: Hi Kim, On 16.01.20 09:51, Kim Barrett wrote: > Please review this change to eliminate the DirtyCardQ_cbl_mon. This > is one of the two remaining super-special "access" ranked mutexes. > (The other is the Shared_DirtyCardQ_lock, whose elimination is covered > by JDK-8221360.) > > There are three main parts to this change. > > (1) Replace the under-a-lock FIFO queue in G1DirtyCardQueueSet with a > lock-free FIFO queue. > > (2) Replace the use of a HotSpot monitor for signaling activation of > concurrent refinement threads with a semaphore-based solution. > > (3) Handle pausing of buffer refinement in the middle of a buffer in > order to handle a pending safepoint request. This can no longer just > push the partially processed buffer back onto the queue, due to ABA > problems now that the buffer is lock-free. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8237143 > > Webrev: > https://cr.openjdk.java.net/~kbarrett/8237143/open.00/ > > Testing: > mach5 tier1-5 > Normal performance testing showed no significant change. > specjbb2015 on a very big machine showed a 3.5% average critical-jOPS > improvement, though not statistically significant; removing contention > for that lock by many hardware threads may be a little bit noticeable. > > initial comments only, and so far only about comments :( The code itself looks good to me, but I want to look over it again. - noticed at least in maybe_deactivate, the description of the method and parameter and return value documentation is in the cpp file at the definition. I would really prefer this information in the hpp file for easier reference. 
- documentation about the inner workings of the class should imho be put in the hpp file too (in the class documentation). Same as documentation about member variables in seemingly random places now. This makes the code unusually hard to read and reference for me - i.e. you can easily flip to the definition (hpp file) using a click (or accelerator key) when on the identifier to see how stuff is supposed to be used (or what it is used for) than literally grepping through the cpp file (and these blocks of documentation even reference each other). Also, IDEs show comments on methods in the hpp as pop-ups too. E.g. the documentation of _notifier and _should_notify in g1ConcurrentRefineThread.hpp:51, or the description of G1DirtCardQueueSet::_concurrency in g1DirtyCardQueue.cpp:117 (followed by description of (Non)ConcurrentVerifier, or the huge description of _completed_buffers_{head,tail} and others in G1DirtyCardQueue.cpp:163, and the description of refinement processing in g1DirtyCardQueue.cpp:301. I mean, if a user tries to get an overview on how the class works or what some members do I would argue that you would first reference the hpp file - and the only new comment there right now is some reasoning for the padding of the new members without any comment about the purpose of these. Maybe some of the comments should be split into more generic parts, and very specific implementation details (which can stay in the cpp file where they are needed - these seem sufficient now?). - just a random note: I wasn't really happy with the name of G1ConcurrentRefineThread::_should_notify, but several attempts at renaming failed for me. It seems rather generic. - some comments on the long comment about the FIFO queue handling. // _completed_buffers_{head,tail} and _num_cards provide a lock-free // FIFO of buffers, linked through their next() fields. 
Not sure about whether _num_cards provides anything about the FIFO (it's not even the number of buffers), it seems to be solely counting the cards held in the FIFO. Which is fine to mention, but not necessarily here. Also I would put that description of _completed_buffers to the variables in the hpp files. // The key idea to make this work is that pop (get_completed_buffer) // never returns an element of the queue if it is the only accessible // element, If I understand this correctly, maybe "if there is only one buffer in the FIFO" is easier to understand than "only accessible element". (or define "accessible element"). // e.g. its "next" value is NULL. It is expected that there s/e.g./i.e.? The code seems to unnecessarily use the NULL_buffer constant. Maybe use it here too. Overall I am not sure about the usefulness of using NULL_buffer in the code. The NULL value in Hotspot code is generally accepted as a special value, and the name "NULL_buffer" does not seem to add any information. // will be a later push/append that will make that element available to I would prefer if the documentation would be consistent with the nomenclature in the code, i.e. use either append/enqueue/get throughout or some variants of push/pop (there is no method that starts with either push or pop anywhere). While I understand that you'd typically use push/pop on a FIFO I would prefer to either rename the methods or drop the use of push/pop in this documentation. Additional identical terminology seems confusing. At least it takes time to find out what is exactly what. What would be nice in this context would be that G1DirtyCardQueueSet is implemented as a FIFO of buffers, and that append/enqueue/get* methods are the usual operations. (in the hpp file, in the class description) // a future pop, or there will eventually be a complete transfer // (take_all_completed_buffers). // // An append operation atomically exchanges the new tail with the queue // tail. 
It then sets the "next" value of the old tail to the head of // the list being appended. (It is an invariant that the old tail's // "next" value is NULL.) Maybe put this invariant somewhere more prominent in the text, not as side note. // But if the old tail is NULL then the queue was // empty. In this case the head of the list being appended is instead // stored in the queue head (which must be NULL). I would mention the invariant that if old tail is NULL then head must be NULL too right next to the "old tail is NULL" sentence. // A push operation is just a degenerate append, where the buffer being // pushed is both the head and the tail of the list being appended. Defining a push operation does not seem to help at all, because the documentation always mentions the pair push/append anyway (and there is no explicit "push" method in the code). I would suggest to delete this paragraph. // // This means there is a period between the exchange and the old tail // update where the queue sequence is split into two parts, the list // from the queue head to the old tail, and the list being appended. If // there are concurrent push/append operations, each may introduce // another such segment. But they all eventually get resolved by their // respective updates of their old tail's "next" value. // // pop gets the queue head as the candidate result (returning NULL if // the queue head was NULL), and then gets that result node's "next" // value. If that "next" value is NULL and the queue head hasn't // changed, then there is only one element in the (accessible) list. We It would be nice to define the "accessible" list somewhere explicitly - or drop that property because it seems to be the standard "elements within the current head and tail" anyway. // can't return that element, because it may be the old tail of a // concurrent push/append. So return NULL in this case. 
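[Editorial aside: the append and the "never take the last element" pop described in this comment block can be sketched with portable atomics. This is a toy model only; the real G1DirtyCardQueueSet differs in detail and additionally protects pop with GlobalCounter critical sections against ABA, which the sketch omits.]

```cpp
#include <atomic>
#include <cassert>

struct Node { std::atomic<Node*> next{nullptr}; int payload{0}; };

// Toy lock-free FIFO of externally owned nodes.
struct ToyFifo {
  std::atomic<Node*> _head{nullptr};
  std::atomic<Node*> _tail{nullptr};

  // Append a pre-linked segment [first..last]; a single-node push is the
  // degenerate case first == last.
  void append(Node* first, Node* last) {
    last->next.store(nullptr, std::memory_order_relaxed);
    Node* old_tail = _tail.exchange(last);   // atomic swap of the tail
    if (old_tail == nullptr) {
      _head.store(first);                    // queue was empty
    } else {
      old_tail->next.store(first);           // link in the old segment
    }
  }

  Node* pop() {
    while (true) {
      Node* h = _head.load();
      if (h == nullptr) return nullptr;      // empty
      Node* next = h->next.load();
      if (next == nullptr) return nullptr;   // refuse the last element:
                                             // its next may still be
                                             // written by a racing append
      if (_head.compare_exchange_strong(h, next)) return h;
      // lost a race with another pop; retry
    }
  }
};
```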
Otherwise, // attempt to cmpxchg that "next" value into the queue head, retrying // the whole operation if that fails. This is the "usual" lock-free pop // from head of slist, with the additional restriction on taking the slist? ;) // last element. s/taking/popping (or "get"-ing) the last element. // In order to address the ABA problem for pop, a pop operation protects // its access to the head of the list with a GlobalCounter critical // section. This works with the buffer allocator's use of GlobalCounter // synchronization to prevent ABA from arising in the normal buffer // usage cycle. The paused buffer handling prevents another ABA source // (see record_paused_buffer and enqueue_previous_paused_buffers). - g1DirtyCardQueue.cpp: s/"// Unreachable"/ShouldNotReachHere(); (or just delete) Thanks, Thomas From thomas.schatzl at oracle.com Wed Jan 22 16:14:10 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 22 Jan 2020 17:14:10 +0100 Subject: [14] RFR (S): 8237079: gc/g1/mixedgc/TestLogging.java fails with "Pause Young (Mixed) (G1 Evacuation Pause) not found" In-Reply-To: <0f272a73-d817-150e-ed68-687737445580@oracle.com> References: <39d5a2fd-0a5d-97dd-3ecd-106dbdb8434f@oracle.com> <0f272a73-d817-150e-ed68-687737445580@oracle.com> Message-ID: Hi Leo, On 22.01.20 17:02, Leo Korinth wrote: > On 22/01/2020 11:46, Thomas Schatzl wrote: >> Hi all, >> >> On 17.01.20 16:07, Leo Korinth wrote: >>> Hi Thomas, >>> >>> This is not a review. This code is basically the same code as is >> [...] >>> >>> How about using the TestG1ParallelPhases.java version for all three >>> test cases? If not, do the third version in TestOldGenCollectionUsage >>> really work??? >>> >> >> Here's a webrev incorporating these suggestions to unify the code: >> >> http://cr.openjdk.java.net/~tschatzl/8237079/webrev.1 (full) > >> >> There is no point to provide a diff webrev here as the whole change >> has been redone. >> [...] > > I think this looks *really* good. 
Also, thanks for taking the extra time > to make it shared and reusable code. > > /Leo Thanks for your review. Thomas From aph at redhat.com Wed Jan 22 17:45:20 2020 From: aph at redhat.com (Andrew Haley) Date: Wed, 22 Jan 2020 17:45:20 +0000 Subject: RFR(S): 8229422: Taskqueue: Outdated selection of weak memory model platforms In-Reply-To: <6ef9347b-8139-0eb7-7150-8dce4b5e4dc6@oracle.com> References: <9d9819fe-560f-13f0-1907-794e063ee687@oracle.com> <9dbfd063-ea45-10e0-b541-7e84d662581c@redhat.com> <88f97b92-df9e-140c-a972-44982ae3f79b@redhat.com> <23d3db9d-6603-c10c-8240-62cd82f4bae9@oracle.com> <12ca6024-bec5-8bd1-57d8-5a880fd5ad96@redhat.com> <6ef9347b-8139-0eb7-7150-8dce4b5e4dc6@oracle.com> Message-ID: On 1/22/20 11:59 AM, David Holmes wrote: > I'm assuming the ordering requirement is to preserve the order as > expressed in the code. There is likely an assumption that by declaring > both as volatile that the the compiler will not reorder them; and that > the load_acquire will prevent the hardware from reordering them. I'm not > sure if either of those assumptions are actually valid. The compiler won't reorder the stores, but the hardware will. > But that doesn't explain the complete lack of barriers in set_empty. > > The GC folk will need to chime in on the detailed semantic requirements > of this algorithm. OK, but this looks like a separate problem: we can deal with it later if we need to. Thanks. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Wed Jan 22 17:47:35 2020 From: aph at redhat.com (Andrew Haley) Date: Wed, 22 Jan 2020 17:47:35 +0000 Subject: RFR(S): 8229422: Taskqueue: Outdated selection of weak memory model platforms In-Reply-To: References: <9d9819fe-560f-13f0-1907-794e063ee687@oracle.com> <9dbfd063-ea45-10e0-b541-7e84d662581c@redhat.com> <88f97b92-df9e-140c-a972-44982ae3f79b@redhat.com> <23d3db9d-6603-c10c-8240-62cd82f4bae9@oracle.com> <12ca6024-bec5-8bd1-57d8-5a880fd5ad96@redhat.com> <6ef9347b-8139-0eb7-7150-8dce4b5e4dc6@oracle.com> Message-ID: <18cb433d-2694-3e2f-2a31-c850486d12c9@redhat.com> On 1/22/20 3:01 PM, Doerr, Martin wrote: > Thread1: set bottom (push) > Thread2: read age, read bottom, set age (pop_global) > Thread3: read age, read bottom (pop_global) > > The requirement is that Thread3 must never read an older bottom value than Thread2 after Thread3 has seen the age value from Thread2. OK, so all we need here is a LoadLoad between read age and read bottom in pop_global, as David Holmes said. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From shade at redhat.com Wed Jan 22 18:21:47 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 22 Jan 2020 19:21:47 +0100 Subject: [15] RFR 8236880: Shenandoah: Move string dedup cleanup into concurrent phase In-Reply-To: <837e7210-0bd7-e06f-907b-7c5fcc3c3684@redhat.com> References: <837e7210-0bd7-e06f-907b-7c5fcc3c3684@redhat.com> Message-ID: On 1/17/20 5:34 PM, Zhengyu Gu wrote: > Bug: https://bugs.openjdk.java.net/browse/JDK-8236880 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8236880/webrev.00/ *) shenandoahHeap.cpp: does this change relate to this RFR? // When concurrent root is in progress, weak roots may contain dead oops, // they should not be used for root scanning. 
if (is_concurrent_root_in_progress()) { Otherwise looks okay. -- Thanks, -Aleksey From zgu at redhat.com Wed Jan 22 18:27:30 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 22 Jan 2020 13:27:30 -0500 Subject: [15] RFR 8236880: Shenandoah: Move string dedup cleanup into concurrent phase In-Reply-To: References: <837e7210-0bd7-e06f-907b-7c5fcc3c3684@redhat.com> Message-ID: <46f39b06-84f4-66aa-12a6-56904fb3c085@redhat.com> On 1/22/20 1:21 PM, Aleksey Shipilev wrote: > On 1/17/20 5:34 PM, Zhengyu Gu wrote: >> Bug: https://bugs.openjdk.java.net/browse/JDK-8236880 >> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8236880/webrev.00/ > > *) shenandoahHeap.cpp: does this change relate to this RFR? > > // When concurrent root is in progress, weak roots may contain dead oops, > // they should not be used for root scanning. > if (is_concurrent_root_in_progress()) { Yes. However, we may need to take another look due to JDK-8237632. Thanks, -Zhengyu > > Otherwise looks okay. > > From zgu at redhat.com Wed Jan 22 20:15:22 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 22 Jan 2020 15:15:22 -0500 Subject: [15] RFR 8234399: Shenandoah: Cleanup native load barrier Message-ID: <2933db7c-f29e-ddbb-3015-05430488a180@redhat.com> Please review this cleanup of a hack, which was added to workaround the problem manifested in JDK-8237396. With JDK-8237396 resolved, let's remove it. Bug: https://bugs.openjdk.java.net/browse/JDK-8234399 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8234399/webrev.00/ Test: hotspot_gc_shenandoah (fastdebug and release) Thanks, -Zhengyu From rkennke at redhat.com Wed Jan 22 20:45:13 2020 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 22 Jan 2020 21:45:13 +0100 Subject: [15] RFR 8234399: Shenandoah: Cleanup native load barrier In-Reply-To: <2933db7c-f29e-ddbb-3015-05430488a180@redhat.com> References: <2933db7c-f29e-ddbb-3015-05430488a180@redhat.com> Message-ID: <6c972c66-7a31-74f5-2c8c-b7b67013334e@redhat.com> Ok. Thank you! 
Roman > Please review this cleanup of a hack, which was added to workaround the > problem manifested in JDK-8237396. > > With JDK-8237396 resolved, let's remove it. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8234399 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8234399/webrev.00/ > > Test: > ? hotspot_gc_shenandoah (fastdebug and release) > > Thanks, > > -Zhengyu > From kim.barrett at oracle.com Wed Jan 22 22:21:32 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 22 Jan 2020 17:21:32 -0500 Subject: [14] RFR (S): 8237079: gc/g1/mixedgc/TestLogging.java fails with "Pause Young (Mixed) (G1 Evacuation Pause) not found" In-Reply-To: <39d5a2fd-0a5d-97dd-3ecd-106dbdb8434f@oracle.com> References: <39d5a2fd-0a5d-97dd-3ecd-106dbdb8434f@oracle.com> Message-ID: <3D0FC9F2-6D20-48A5-902C-9615A84C5FBF@oracle.com> > On Jan 22, 2020, at 5:46 AM, Thomas Schatzl wrote: > > Hi all, > > On 17.01.20 16:07, Leo Korinth wrote: >> Hi Thomas, >> This is not a review. This code is basically the same code as is > > I took it as one anyway ;) > >> duplicated at least three times in the test code. One of the duplications you can blame me for, *sorry*. I believe it should be moved to a common library method. I also believe the last fix you did in TestG1ParallelPhases.java makes that version look cleaner than what you propose here (it does not need the last allocation loop at all). >> How about using the TestG1ParallelPhases.java version for all three test cases? If not, do the third version in TestOldGenCollectionUsage really work??? > > Here's a webrev incorporating these suggestions to unify the code: > > http://cr.openjdk.java.net/~tschatzl/8237079/webrev.1 (full) Looks good. 
From stefan.johansson at oracle.com Thu Jan 23 09:58:11 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Thu, 23 Jan 2020 10:58:11 +0100 Subject: RFR: 8237645: Remove OopsInGenClosure::par_do_barrier In-Reply-To: <34dcda2a-ef2b-65e0-b6f2-fae553d95983@oracle.com> References: <34dcda2a-ef2b-65e0-b6f2-fae553d95983@oracle.com> Message-ID: <2f87f787-8c31-c7d0-2e74-ac4e838cd32b@oracle.com> On 2020-01-22 15:02, Stefan Karlsson wrote: > Hi all, > > Please review this patch to some dead code after the CMS removal. > > https://cr.openjdk.java.net/~stefank/8237645/webrev.01/ > https://bugs.openjdk.java.net/browse/JDK-8237645 Looks good, StefanJ > > Thanks, > StefanK From stefan.karlsson at oracle.com Thu Jan 23 10:01:23 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Thu, 23 Jan 2020 11:01:23 +0100 Subject: RFR: 8237645: Remove OopsInGenClosure::par_do_barrier In-Reply-To: <2f87f787-8c31-c7d0-2e74-ac4e838cd32b@oracle.com> References: <34dcda2a-ef2b-65e0-b6f2-fae553d95983@oracle.com> <2f87f787-8c31-c7d0-2e74-ac4e838cd32b@oracle.com> Message-ID: <0d1966b4-c4b7-3b2a-a9f0-fa6ffd7000c2@oracle.com> Thanks, Stefan. I'll push this with only one Reviewer. StefanK On 2020-01-23 10:58, Stefan Johansson wrote: > > On 2020-01-22 15:02, Stefan Karlsson wrote: >> Hi all, >> >> Please review this patch to some dead code after the CMS removal. >> >> https://cr.openjdk.java.net/~stefank/8237645/webrev.01/ >> https://bugs.openjdk.java.net/browse/JDK-8237645 > Looks good, > StefanJ > >> >> Thanks, >> StefanK From per.liden at oracle.com Thu Jan 23 10:02:12 2020 From: per.liden at oracle.com (Per Liden) Date: Thu, 23 Jan 2020 11:02:12 +0100 Subject: RFR: 8237758: ZGC: Move get_mempolicy() syscall wrapper to ZSyscall Message-ID: <3496f019-6790-7c91-cf5d-62779274287b@oracle.com> System call wrappers should live in ZSyscall, but the wrapper for get_mempolicy() currently lives in ZNUMA. We should move it. 
Bug: https://bugs.openjdk.java.net/browse/JDK-8237758 Webrev: http://cr.openjdk.java.net/~pliden/8237758/webrev.0 /Per From per.liden at oracle.com Thu Jan 23 10:02:07 2020 From: per.liden at oracle.com (Per Liden) Date: Thu, 23 Jan 2020 11:02:07 +0100 Subject: RFR: 8237649: ZGC: Improved NUMA support when using small pages Message-ID: The NUMA allocation support in ZGC works as expected only when using -XX:+UseLargePages. The reason is that, on Linux, small pages are allocated at commit/fallocate time and is controlled by the NUMA policy of the current thread, while large pages are allocated at page fault time and is controlled by the NUMA policy of the memory range. ZGC currently only sets up the NUMA policy for the memory range, which has no effect on small pages (since they are allocated by tmpfs rather than being anonymous mappings). We should fix this, so that the NUMA allocation support works equally well for small pages. Bug: https://bugs.openjdk.java.net/browse/JDK-8237649 Webrev: http://cr.openjdk.java.net/~pliden/8237649/webrev.0 /Per From thomas.schatzl at oracle.com Thu Jan 23 10:24:48 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 23 Jan 2020 11:24:48 +0100 Subject: RFR: 8237363: Remove automatic is in heap verification in OopIterateClosure In-Reply-To: <246ce191-0f61-1bd5-caec-71299bfcebef@oracle.com> References: <1cb1e7ea-45dd-6a36-1731-94fe1fe25244@oracle.com> <20587626-5998-d756-5c9b-893ce42f40cd@oracle.com> <246ce191-0f61-1bd5-caec-71299bfcebef@oracle.com> Message-ID: Hi, On 22.01.20 14:16, Stefan Karlsson wrote: > > > On 2020-01-22 12:02, Thomas Schatzl wrote: >> Hi, >> >> On 17.01.20 14:31, Stefan Karlsson wrote: >>> Hi all, >>> >>> Please review this patch to remove the automatic "is in heap" >>> verification from OopIterateClosure. 
>>> >>> https://cr.openjdk.java.net/~stefank/8237363/webrev.01/ >>> https://bugs.openjdk.java.net/browse/JDK-8237363 >>> >>> OopIterateClosure provides some automatic verification that loaded >>> objects are inside the heap. Closures can opt out from this by >>> overriding should_verify_oops(). >>> >>> I propose that we move this verification, and the way to turn it off, >>> and instead let the implementations of the closures decide the kind >>> of verification that is appropriate. I want to do this to de-clutter >>> the closure APIs a bit. >>> >> >> While the change is correct, I am not really convinced it is a good >> idea to trade verification in one place to the same verification in >> many place. > > An alternative would be to simply remove the verification altogether. As > I said, we almost always check the result of the object address. > Emphasis on "almost". >> >> The closure API does not seem to be particularly "cluttered up" by >> this particular API to me. > > It's a slippery slope. Previously, we had a lot of GC specific functions > in these interfaces. I've been cleaning this over the years, and this is > one of the last non-essential parts of that interface that implementors > need to consider. > > With my removal people don't have to think about this anymore. But with the change people have to think about making sure to do the verification manually. This does not seem an improvement at all. > > ?It is true that other code typically has many >> other asserts that would fail anyway, but it would be an additional >> safety net when writing new closures. > > It's a safety net that works for G1, but almost always is incorrectly > trips in the assert with ZGC. > It works for all GCs (+leak profiler) but ZGC given the webrev. This does not suggest that this is GC-specific functionality at all. The verification method also seems to only uses an innocuous CollectedHeap::is_in() call that seems something very basic to support for a GC. 
What is it in ZGC that prevents CollectedHeap::is_in() from returning the expected value? And the opt-out method has been designed for unusual cases. Thanks, Thomas From per.liden at oracle.com Thu Jan 23 13:01:38 2020 From: per.liden at oracle.com (Per Liden) Date: Thu, 23 Jan 2020 14:01:38 +0100 Subject: RFR: 8237363: Remove automatic is in heap verification in OopIterateClosure In-Reply-To: References: <1cb1e7ea-45dd-6a36-1731-94fe1fe25244@oracle.com> <20587626-5998-d756-5c9b-893ce42f40cd@oracle.com> <246ce191-0f61-1bd5-caec-71299bfcebef@oracle.com> Message-ID: <4667326e-bceb-6db9-e6c1-f078c90a2e2f@oracle.com> Hi, On 1/23/20 11:24 AM, Thomas Schatzl wrote: [...] >> It is true that other code typically has many >>> other asserts that would fail anyway, but it would be an additional >>> safety net when writing new closures. >> >> It's a safety net that works for G1, but almost always is incorrectly >> trips in the assert with ZGC. >> > > It works for all GCs (+leak profiler) but ZGC given the webrev. This > does not suggest that this is GC-specific functionality at all. The > verification method also seems to only uses an innocuous > CollectedHeap::is_in() call that seems something very basic to support > for a GC. > > What is it in ZGC that prevents CollectedHeap::is_in() from returning the > expected value? ZGC is returning the expected value. The problem here is that the verification happens _before_ the closure is applied, i.e. it asks if an oop that has not yet been fixed points into the heap. ZGC's is_in() is precise (i.e. cares about which heap view an oop points into), so an oop with a bad color is not considered to point into the heap. It's a feature, as it allows for exact verification and will catch oops with bad colors. cheers, Per > > And the opt-out method has been designed for unusual cases. > > Thanks, >
Thomas From thomas.schatzl at oracle.com Thu Jan 23 13:33:05 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 23 Jan 2020 14:33:05 +0100 Subject: RFR: 8237363: Remove automatic is in heap verification in OopIterateClosure In-Reply-To: <4667326e-bceb-6db9-e6c1-f078c90a2e2f@oracle.com> References: <1cb1e7ea-45dd-6a36-1731-94fe1fe25244@oracle.com> <20587626-5998-d756-5c9b-893ce42f40cd@oracle.com> <246ce191-0f61-1bd5-caec-71299bfcebef@oracle.com> <4667326e-bceb-6db9-e6c1-f078c90a2e2f@oracle.com> Message-ID: <97657cf1-435b-e4cf-8653-7fa29d89c310@oracle.com> Hi, On 23.01.20 14:01, Per Liden wrote: > Hi, > > On 1/23/20 11:24 AM, Thomas Schatzl wrote: > [...] >>> It is true that other code typically has many >>>> other asserts that would fail anyway, but it would be an additional >>>> safety net when writing new closures. >>> >>> It's a safety net that works for G1, but almost always is incorrectly >>> trips in the assert with ZGC. >>> >> >> It works for all GCs (+leak profiler) but ZGC given the webrev. This >> does not suggest that this is GC-specific functionality at all. The >> verification method also seems to only uses an innocuous >> CollectedHeap::is_in() call that seems something very basic to support >> for a GC. >> >> What is it in ZGC that prevents CollectedHeap::is_in() to return the >> expected value? > > ZGC is returning the expected value. The problem here is that the > verification happens _before_ the closure is applied, i.e. it asks if an > oop that has not yet been fixed points into the heap. ZGC's is_in() is > precise (i.e. cares about which heap view an oop points into), so an oop > with a bad color is not considered to point into the heap. It's a > feature, as it allows for exact verification and will catch oops with > bad colors. thanks. After some internal discussions: looks good.
Thomas From thomas.schatzl at oracle.com Thu Jan 23 16:06:29 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 23 Jan 2020 17:06:29 +0100 Subject: RFR: 8233822: VM_G1CollectForAllocation should always check for upgrade to full In-Reply-To: <5389B188-BA91-412F-A12E-0DB5A96FF0A9@oracle.com> References: <5389B188-BA91-412F-A12E-0DB5A96FF0A9@oracle.com> Message-ID: <21fc0e88-719a-5779-6855-1830b4bf325a@oracle.com> Hi Kim, On 21.01.20 09:31, Kim Barrett wrote: > Please review this G1 change to always check whether a full collection > should be performed after a non-full collection pause, e.g. the > collection needs to be "upgraded" to a full collection. There are > various conditions which can lead to needing to do that, and as the CR > suggests, we need to be consistent about checking for and performing > such an upgrade. > > This is accomplished by moving most of do_collection_pause_at_safepoint > into a helper function and changing that existing function to call the > helper, then check for and, if needed, perform a needed upgrade to a > full collection. Callers of that function are updated to remove > explicit conditional upgrading, where present. This also addresses the > surprisingly placed call in a G1-specific block of code in gc/shared > (see also JDK-8237567). > > CR: > https://bugs.openjdk.java.net/browse/JDK-8233822 > > Webrev: > https://cr.openjdk.java.net/~kbarrett/8233822/open.00/ > > Testing: > mach5 tier1-5 > Locally (linux-x64) ran modified InfiniteList.java test (allocate > small rather than arrays) and verified some upgrades occurred as > expected. Minor nit you can ignore: in g1VMOperations.cpp:129 I would have probably folded the two if's into a single one. Looks good with and without any change in this area.
Thanks for fixing this, Thomas From kim.barrett at oracle.com Thu Jan 23 19:25:58 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 23 Jan 2020 14:25:58 -0500 Subject: RFR: 8233822: VM_G1CollectForAllocation should always check for upgrade to full In-Reply-To: <21fc0e88-719a-5779-6855-1830b4bf325a@oracle.com> References: <5389B188-BA91-412F-A12E-0DB5A96FF0A9@oracle.com> <21fc0e88-719a-5779-6855-1830b4bf325a@oracle.com> Message-ID: <472779BA-E6A4-4231-8B62-9BFC381DC59D@oracle.com> > On Jan 23, 2020, at 11:06 AM, Thomas Schatzl wrote: > > Hi Kim, > > On 21.01.20 09:31, Kim Barrett wrote: >> Please review this G1 change to always check whether a full collection >> should be performed after a non-full collection pause, e.g. the >> collection needs to be "upgraded" to a full collection. There are >> various conditions which can lead to needing to do that, and as the CR >> suggests, we need to be consistent about checking for and performing >> such an upgrade. >> This is accomplished by moving most of do_collection_pause_at_safepoint >> into a helper function and changing that existing function to call the >> helper, then check for and, if needed, perform a needed upgrade to a >> full collection. Callers of that function are updated to remove >> explicit conditional upgrading, where present. This also addresses the >> surprisingly placed call in a G1-specific block of code in gc/shared >> (see also JDK-8237567). >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8233822 >> Webrev: >> https://cr.openjdk.java.net/~kbarrett/8233822/open.00/ >> Testing: >> mach5 tier1-5 >> Locally (linux-x64) ran modified InfiniteList.java test (allocate >> small rather than arrays) and verified some upgrades occurred as >> expected. > > Minor nit you can ignore: in g1VMOperations.cpp:129 I would have probably folded the two if's into a single one. Sure, I'll do that. > Looks good with and without any change in this area. > > Thanks for fixing this, > Thomas Thanks.
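As an aside for readers following the thread above: the shape of the refactoring Kim describes — a helper performs the pause, and the single public entry point checks whether an upgrade to a full collection is needed — can be sketched roughly as follows. All names and the success/failure policy here are invented for illustration; this is not the actual G1 code, for which see the webrev.

```cpp
// Hypothetical sketch of "always check for upgrade to full after a pause".
// Names and policy are invented; only the control-flow shape matters.
struct HeapSketch {
  bool incremental_pause_succeeded;  // outcome of the non-full pause
  bool full_gc_ran;

  // Helper: the former body of the pause operation.
  bool do_collection_pause_helper() {
    return incremental_pause_succeeded;
  }

  // Entry point: callers no longer perform explicit conditional upgrading;
  // the upgrade check now lives in exactly one place.
  void do_collection_pause_at_safepoint() {
    if (!do_collection_pause_helper()) {
      full_gc_ran = true;  // "upgrade" to a full collection
    }
  }
};
```

The point of centralizing the check is that every caller gets the upgrade behavior for free, instead of each caller having to remember to do it.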
From kim.barrett at oracle.com Thu Jan 23 20:10:48 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 23 Jan 2020 15:10:48 -0500 Subject: RFR: 8237143: Eliminate DirtyCardQ_cbl_mon In-Reply-To: References: <745E91C1-AE1A-4DA2-80EE-59B70897F4BF@oracle.com> Message-ID: <86BABDA8-E402-49F3-B478-ED0E70490015@oracle.com> > On Jan 22, 2020, at 11:12 AM, Thomas Schatzl wrote: > On 16.01.20 09:51, Kim Barrett wrote: >> Please review this change to eliminate the DirtyCardQ_cbl_mon. This >> is one of the two remaining super-special "access" ranked mutexes. >> (The other is the Shared_DirtyCardQ_lock, whose elimination is covered >> by JDK-8221360.) >> There are three main parts to this change. >> (1) Replace the under-a-lock FIFO queue in G1DirtyCardQueueSet with a >> lock-free FIFO queue. >> (2) Replace the use of a HotSpot monitor for signaling activation of >> concurrent refinement threads with a semaphore-based solution. >> (3) Handle pausing of buffer refinement in the middle of a buffer in >> order to handle a pending safepoint request. This can no longer just >> push the partially processed buffer back onto the queue, due to ABA >> problems now that the buffer is lock-free. >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8237143 >> Webrev: >> https://cr.openjdk.java.net/~kbarrett/8237143/open.00/ >> Testing: >> mach5 tier1-5 >> Normal performance testing showed no significant change. >> specjbb2015 on a very big machine showed a 3.5% average critical-jOPS >> improvement, though not statistically significant; removing contention >> for that lock by many hardware threads may be a little bit noticeable. > > initial comments only, and so far only about comments :( The code itself looks good to me, but I want to look over it again. After some offline discussion with Thomas, I'm doing some restructuring that makes it probably not very efficient for anyone else to do a careful review of the open.00 version.
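For readers who want a feel for the lock-free FIFO idea mentioned in the thread above: below is a deliberately tiny single-producer/single-consumer ring buffer built on C++ atomics. The queue in the actual patch is multi-producer and must cope with ABA and safepoint interactions, none of which this sketch attempts; it only illustrates the general idea of coordinating through atomic loads and stores instead of a mutex.

```cpp
#include <atomic>
#include <cstddef>

// Simplified wait-free SPSC FIFO. Capacity is N-1: one slot is left
// empty to distinguish "full" from "empty". Not the G1 implementation.
template <typename T, size_t N>
class SpscFifo {
  T buf_[N];
  std::atomic<size_t> head_{0};  // next slot to pop (consumer-owned)
  std::atomic<size_t> tail_{0};  // next slot to push (producer-owned)

public:
  bool push(const T& v) {  // producer thread only
    size_t t = tail_.load(std::memory_order_relaxed);
    size_t next = (t + 1) % N;
    if (next == head_.load(std::memory_order_acquire)) {
      return false;  // full
    }
    buf_[t] = v;
    tail_.store(next, std::memory_order_release);  // publish the element
    return true;
  }

  bool pop(T& out) {  // consumer thread only
    size_t h = head_.load(std::memory_order_relaxed);
    if (h == tail_.load(std::memory_order_acquire)) {
      return false;  // empty
    }
    out = buf_[h];
    head_.store((h + 1) % N, std::memory_order_release);  // free the slot
    return true;
  }
};
```

The acquire/release pairing is what replaces the mutex: the producer's release store of `tail_` makes the written element visible to the consumer's acquire load, and symmetrically for `head_`.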
From stuart.monteith at linaro.org Fri Jan 24 09:52:04 2020 From: stuart.monteith at linaro.org (Stuart Monteith) Date: Fri, 24 Jan 2020 09:52:04 +0000 Subject: RFR: 8237649: ZGC: Improved NUMA support when using small pages In-Reply-To: References: Message-ID: Hello Per, I notice you've left "UseNewCode" in size_t ZPhysicalMemoryBacking::commit(size_t offset, size_t length). I presume this is accidental. The rest looks ok to me. Thanks, Stuart On Thu, 23 Jan 2020 at 10:02, Per Liden wrote: > > The NUMA allocation support in ZGC works as expected only when using > -XX:+UseLargePages. The reason is that, on Linux, small pages are > allocated at commit/fallocate time and is controlled by the NUMA policy > of the current thread, while large pages are allocated at page fault > time and is controlled by the NUMA policy of the memory range. ZGC > currently only sets up the NUMA policy for the memory range, which has > no effect on small pages (since they are allocated by tmpfs rather than > being anonymous mappings). > > We should fix this, so that the NUMA allocation support works equally > well for small pages. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8237649 > Webrev: http://cr.openjdk.java.net/~pliden/8237649/webrev.0 > > /Per From per.liden at oracle.com Fri Jan 24 10:25:59 2020 From: per.liden at oracle.com (Per Liden) Date: Fri, 24 Jan 2020 11:25:59 +0100 Subject: RFR: 8237649: ZGC: Improved NUMA support when using small pages In-Reply-To: References: Message-ID: <3ad95202-b82c-9894-5408-4bbdac111827@oracle.com> Hi Stuart, On 1/24/20 10:52 AM, Stuart Monteith wrote: > Hello Per, > I notice you've left "UseNewCode" in size_t > ZPhysicalMemoryBacking::commit(size_t offset, size_t length). I > presume this is accidental. The rest looks ok to me. Oops... good catch! I'll remove that. Thanks for reviewing! 
cheers, Per > > Thanks, > Stuart > > On Thu, 23 Jan 2020 at 10:02, Per Liden wrote: >> >> The NUMA allocation support in ZGC works as expected only when using >> -XX:+UseLargePages. The reason is that, on Linux, small pages are >> allocated at commit/fallocate time and is controlled by the NUMA policy >> of the current thread, while large pages are allocated at page fault >> time and is controlled by the NUMA policy of the memory range. ZGC >> currently only sets up the NUMA policy for the memory range, which has >> no effect on small pages (since they are allocated by tmpfs rather than >> being anonymous mappings). >> >> We should fix this, so that the NUMA allocation support works equally >> well for small pages. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8237649 >> Webrev: http://cr.openjdk.java.net/~pliden/8237649/webrev.0 >> >> /Per From stefan.karlsson at oracle.com Fri Jan 24 14:29:37 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Fri, 24 Jan 2020 15:29:37 +0100 Subject: RFR: 8237821: 8237637 Broke Shenandoah Message-ID: Hi all, I broke Shenandoah release builds with 8237637. During the review I was asked to do some simplification and get rid of redundant casts. I did the same for Shenandoah, but the patch wasn't complete. So, to fix the current build I've reverted all the Shenandoah changes from 8237637: https://cr.openjdk.java.net/~stefank/8237821/webrev.01.revert/ Then I only do the simple change from casting with (HeapWord*) to using cast_from_oop: https://cr.openjdk.java.net/~stefank/8237821/webrev.01.fix/ Both of these combined give this final patch: https://cr.openjdk.java.net/~stefank/8237821/webrev.01.combined/ I've compiled Shenandoah with both fastdebug and release builds, and am currently testing this locally.
Thanks, StefanK From shade at redhat.com Fri Jan 24 14:35:45 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 24 Jan 2020 15:35:45 +0100 Subject: RFR: 8237821: 8237637 Broke Shenandoah In-Reply-To: References: Message-ID: On 1/24/20 3:29 PM, Stefan Karlsson wrote: > Both these combined gives this final patch: > https://cr.openjdk.java.net/~stefank/8237821/webrev.01.combined/ Unfortunately, those casts end up being rather ugly. Please let the Shenandoah people fix the code to make it both clear and correct. -- Thanks, -Aleksey From stefan.karlsson at oracle.com Fri Jan 24 14:37:26 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Fri, 24 Jan 2020 15:37:26 +0100 Subject: RFR: 8237821: 8237637 Broke Shenandoah In-Reply-To: References: Message-ID: <4d720013-2e06-cf7f-d941-4e41cf83934b@oracle.com> On 2020-01-24 15:35, Aleksey Shipilev wrote: > On 1/24/20 3:29 PM, Stefan Karlsson wrote: >> Both these combined gives this final patch: >> https://cr.openjdk.java.net/~stefank/8237821/webrev.01.combined/ > > Unfortunately, those casts end up being rather ugly. > Please let the Shenandoah people fix the code to make it both clear and correct. Sounds good to me. StefanK > From per.liden at oracle.com Fri Jan 24 14:49:32 2020 From: per.liden at oracle.com (Per Liden) Date: Fri, 24 Jan 2020 15:49:32 +0100 Subject: RFR: 8237825: ZGC: Replace -XX:ZPath with -XX:AllocateHeapAt Message-ID: <4142fa91-5b9c-2880-7781-7774f92e56a1@oracle.com> ZGC has the option -XX:ZPath to allow a user to explicitly specify where the backing file system is located. However, after ZPath was introduced, a new generic option -XX:AllocateHeapAt was introduced. This option is used by the other GCs and has the same meaning/purpose, and it's arguably better named. There's no good reason why ZGC shouldn't use that too, instead of -XX:ZPath.
Bug: https://bugs.openjdk.java.net/browse/JDK-8237825 Webrev: http://cr.openjdk.java.net/~pliden/8237825/webrev.0 /Per From erik.osterlund at oracle.com Fri Jan 24 14:58:01 2020 From: erik.osterlund at oracle.com (erik.osterlund at oracle.com) Date: Fri, 24 Jan 2020 15:58:01 +0100 Subject: RFR: 8237649: ZGC: Improved NUMA support when using small pages In-Reply-To: References: Message-ID: Hi Per, Looks good. Thanks, /Erik On 1/23/20 11:02 AM, Per Liden wrote: > The NUMA allocation support in ZGC works as expected only when using > -XX:+UseLargePages. The reason is that, on Linux, small pages are > allocated at commit/fallocate time and is controlled by the NUMA > policy of the current thread, while large pages are allocated at page > fault time and is controlled by the NUMA policy of the memory range. > ZGC currently only sets up the NUMA policy for the memory range, which > has no effect on small pages (since they are allocated by tmpfs rather > than being anonymous mappings). > > We should fix this, so that the NUMA allocation support works equally > well for small pages. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8237649 > Webrev: http://cr.openjdk.java.net/~pliden/8237649/webrev.0 > > /Per From erik.osterlund at oracle.com Fri Jan 24 15:01:44 2020 From: erik.osterlund at oracle.com (erik.osterlund at oracle.com) Date: Fri, 24 Jan 2020 16:01:44 +0100 Subject: RFR: 8237825: ZGC: Replace -XX:ZPath with -XX:AllocateHeapAt In-Reply-To: <4142fa91-5b9c-2880-7781-7774f92e56a1@oracle.com> References: <4142fa91-5b9c-2880-7781-7774f92e56a1@oracle.com> Message-ID: <9571eb79-286d-3035-eaf3-a15dab48b52d@oracle.com> Hi Per, Looks good! /Erik On 1/24/20 3:49 PM, Per Liden wrote: > ZGC has the option -XX:ZPath to allow a user to explicitly specify > where the backing file system is located. However, after ZPath was > introduced, a new generic option -XX:AllocateHeapAt was introduced. 
> This option is used the other GCs and have the same meaning/purpose > and it's arguably better named. There's no good reason why ZGC > shouldn't use that too, instead of -XX:ZPath. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8237825 > Webrev: http://cr.openjdk.java.net/~pliden/8237825/webrev.0 > > /Per From per.liden at oracle.com Fri Jan 24 15:03:55 2020 From: per.liden at oracle.com (Per Liden) Date: Fri, 24 Jan 2020 16:03:55 +0100 Subject: RFR: 8237649: ZGC: Improved NUMA support when using small pages In-Reply-To: References: Message-ID: Thanks Erik! /Per On 1/24/20 3:58 PM, erik.osterlund at oracle.com wrote: > Hi Per, > > Looks good. > > Thanks, > /Erik > > On 1/23/20 11:02 AM, Per Liden wrote: >> The NUMA allocation support in ZGC works as expected only when using >> -XX:+UseLargePages. The reason is that, on Linux, small pages are >> allocated at commit/fallocate time and is controlled by the NUMA >> policy of the current thread, while large pages are allocated at page >> fault time and is controlled by the NUMA policy of the memory range. >> ZGC currently only sets up the NUMA policy for the memory range, which >> has no effect on small pages (since they are allocated by tmpfs rather >> than being anonymous mappings). >> >> We should fix this, so that the NUMA allocation support works equally >> well for small pages. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8237649 >> Webrev: http://cr.openjdk.java.net/~pliden/8237649/webrev.0 >> >> /Per > From erik.osterlund at oracle.com Fri Jan 24 15:04:08 2020 From: erik.osterlund at oracle.com (erik.osterlund at oracle.com) Date: Fri, 24 Jan 2020 16:04:08 +0100 Subject: RFR: 8237758: ZGC: Move get_mempolicy() syscall wrapper to ZSyscall In-Reply-To: <3496f019-6790-7c91-cf5d-62779274287b@oracle.com> References: <3496f019-6790-7c91-cf5d-62779274287b@oracle.com> Message-ID: Hi Per, Looks good! 
/Erik On 1/23/20 11:02 AM, Per Liden wrote: > System call wrappers should live in ZSyscall, but the wrapper for > get_mempolicy() currently lives in ZNUMA. We should move it. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8237758 > Webrev: http://cr.openjdk.java.net/~pliden/8237758/webrev.0 > > /Per From per.liden at oracle.com Fri Jan 24 15:04:32 2020 From: per.liden at oracle.com (Per Liden) Date: Fri, 24 Jan 2020 16:04:32 +0100 Subject: RFR: 8237758: ZGC: Move get_mempolicy() syscall wrapper to ZSyscall In-Reply-To: References: <3496f019-6790-7c91-cf5d-62779274287b@oracle.com> Message-ID: Thanks Erik! /Per On 1/24/20 4:04 PM, erik.osterlund at oracle.com wrote: > Hi Per, > > Looks good! > > /Erik > > On 1/23/20 11:02 AM, Per Liden wrote: >> System call wrappers should live in ZSyscall, but the wrapper for >> get_mempolicy() currently lives in ZNUMA. We should move it. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8237758 >> Webrev: http://cr.openjdk.java.net/~pliden/8237758/webrev.0 >> >> /Per > From per.liden at oracle.com Fri Jan 24 15:04:14 2020 From: per.liden at oracle.com (Per Liden) Date: Fri, 24 Jan 2020 16:04:14 +0100 Subject: RFR: 8237825: ZGC: Replace -XX:ZPath with -XX:AllocateHeapAt In-Reply-To: <9571eb79-286d-3035-eaf3-a15dab48b52d@oracle.com> References: <4142fa91-5b9c-2880-7781-7774f92e56a1@oracle.com> <9571eb79-286d-3035-eaf3-a15dab48b52d@oracle.com> Message-ID: Thanks Erik! /Per On 1/24/20 4:01 PM, erik.osterlund at oracle.com wrote: > Hi Per, > > Looks good! > > /Erik > > On 1/24/20 3:49 PM, Per Liden wrote: >> ZGC has the option -XX:ZPath to allow a user to explicitly specify >> where the backing file system is located. However, after ZPath was >> introduced, a new generic option -XX:AllocateHeapAt was introduced. >> This option is used the other GCs and have the same meaning/purpose >> and it's arguably better named. There's no good reason why ZGC >> shouldn't use that too, instead of -XX:ZPath. 
>> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8237825 >> Webrev: http://cr.openjdk.java.net/~pliden/8237825/webrev.0 >> >> /Per > From shade at redhat.com Fri Jan 24 17:51:13 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 24 Jan 2020 18:51:13 +0100 Subject: RFR (S) 8237821: Shenandoah: build broken after JDK-8237637 (Remove dubious type conversions from oop) In-Reply-To: <1b170e9c-8a4d-984d-e03b-c6bccf6dc765@redhat.com> References: <1b170e9c-8a4d-984d-e03b-c6bccf6dc765@redhat.com> Message-ID: (should have copied hotspot-gc-dev@ as well) On 1/24/20 6:50 PM, Aleksey Shipilev wrote: > Bug: > https://bugs.openjdk.java.net/browse/JDK-8237821 > > Current release build in jdk/jdk is broken because of this. Instead of reverting the 8237637, let's > complete the Shenandoah parts. > > I believe it is in the spirit of the original patch to be explicit about oop/HeapWord*/void* in > Shenandoah. cset->is_in(oop) should be doing the proper cast_from_oop dance that would go through > CHECK_UNHANDLED_OOPS and friends. So we need an overload to handle potentially interior pointers > (HeapWord*/void*). And it also removes some of the template mess we have, that led to this failure. > > Fix: > https://cr.openjdk.java.net/~shade/8237821/webrev.01 > > Testing: Linux x86_64 {release,fastdebug,slowdebug} hotspot_gc_shenandoah > -- Thanks, -Aleksey From per.liden at oracle.com Mon Jan 27 13:54:21 2020 From: per.liden at oracle.com (Per Liden) Date: Mon, 27 Jan 2020 14:54:21 +0100 Subject: RFR: 8237884: ZGC: Use clamp() instead of MIN2(MAX2()) Message-ID: <8df285ce-3a79-30d5-88b2-f9ebfe213abe@oracle.com> JDK-8233702 introduced clamp(), but ZHeuristics still uses MIN2(MAX2()) in one place.
Bug: https://bugs.openjdk.java.net/browse/JDK-8237884 Webrev: http://cr.openjdk.java.net/~pliden/8237884/webrev.0 /Per From per.liden at oracle.com Mon Jan 27 13:54:16 2020 From: per.liden at oracle.com (Per Liden) Date: Mon, 27 Jan 2020 14:54:16 +0100 Subject: RFR: 8237882: ZGC: Removed ZUtils::round_{up,down}_power_of_2() declarations Message-ID: <95acba6b-dcda-6f4d-c99b-bff42b45e68e@oracle.com> JDK-8234331 removed ZUtils::round_{up,down}_power_of_2() but left the function declarations in the ZUtils class. Bug: https://bugs.openjdk.java.net/browse/JDK-8237882 Webrev: http://cr.openjdk.java.net/~pliden/8237882/webrev.0 /Per From stuart.monteith at linaro.org Mon Jan 27 15:24:08 2020 From: stuart.monteith at linaro.org (Stuart Monteith) Date: Mon, 27 Jan 2020 15:24:08 +0000 Subject: RFR: 8237884: ZGC: Use clamp() instead of MIN2(MAX2()) In-Reply-To: <8df285ce-3a79-30d5-88b2-f9ebfe213abe@oracle.com> References: <8df285ce-3a79-30d5-88b2-f9ebfe213abe@oracle.com> Message-ID: That looks trivially Ok to me. (Not a reviewer). On Mon, 27 Jan 2020 at 13:55, Per Liden wrote: > > JDK-8233702 introduced clamp(), but ZHeuristics still uses MIN2(MAX2()) > in one place. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8237884 > Webrev: http://cr.openjdk.java.net/~pliden/8237884/webrev.0 > > /Per From per.liden at oracle.com Mon Jan 27 15:25:54 2020 From: per.liden at oracle.com (Per Liden) Date: Mon, 27 Jan 2020 16:25:54 +0100 Subject: RFR: 8237884: ZGC: Use clamp() instead of MIN2(MAX2()) In-Reply-To: References: <8df285ce-3a79-30d5-88b2-f9ebfe213abe@oracle.com> Message-ID: <5e5f4be6-5ff9-5d05-5e33-c8dc0a1d18f2@oracle.com> Thanks for reviewing, Stuart. cheers, Per On 1/27/20 4:24 PM, Stuart Monteith wrote: > That looks trivially Ok to me. (Not a reviewer). > > On Mon, 27 Jan 2020 at 13:55, Per Liden wrote: >> >> JDK-8233702 introduced clamp(), but ZHeuristics still uses MIN2(MAX2()) >> in one place. 
>> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8237884 >> Webrev: http://cr.openjdk.java.net/~pliden/8237884/webrev.0 >> >> /Per From stuart.monteith at linaro.org Mon Jan 27 15:30:51 2020 From: stuart.monteith at linaro.org (Stuart Monteith) Date: Mon, 27 Jan 2020 15:30:51 +0000 Subject: RFR: 8237882: ZGC: Removed ZUtils::round_{up, down}_power_of_2() declarations In-Reply-To: <95acba6b-dcda-6f4d-c99b-bff42b45e68e@oracle.com> References: <95acba6b-dcda-6f4d-c99b-bff42b45e68e@oracle.com> Message-ID: This looks OK (not a reviewer). On Mon, 27 Jan 2020 at 13:57, Per Liden wrote: > > JDK-8234331 removed ZUtils::round_{up,down}_power_of_2() but left the > function declarations in the ZUtils class. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8237882 > Webrev: http://cr.openjdk.java.net/~pliden/8237882/webrev.0 > > /Per From per.liden at oracle.com Mon Jan 27 15:42:19 2020 From: per.liden at oracle.com (Per Liden) Date: Mon, 27 Jan 2020 16:42:19 +0100 Subject: RFR: 8237882: ZGC: Removed ZUtils::round_{up,down}_power_of_2() declarations In-Reply-To: References: <95acba6b-dcda-6f4d-c99b-bff42b45e68e@oracle.com> Message-ID: Thanks Stuart! /Per On 1/27/20 4:30 PM, Stuart Monteith wrote: > This looks OK (not a reviewer). > > On Mon, 27 Jan 2020 at 13:57, Per Liden wrote: >> >> JDK-8234331 removed ZUtils::round_{up,down}_power_of_2() but left the >> function declarations in the ZUtils class. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8237882 >> Webrev: http://cr.openjdk.java.net/~pliden/8237882/webrev.0 >> >> /Per From zgu at redhat.com Mon Jan 27 16:25:15 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 27 Jan 2020 11:25:15 -0500 Subject: [15] RFR(T) 8237874: Shenandoah: Backout JDK-8234399 Message-ID: <78491ea8-d877-04e8-b265-74f8a30997e0@redhat.com> I would like to backout JDK-8234399, as Shenandoah still triggers barriers on GC paths. 
Bug: https://bugs.openjdk.java.net/browse/JDK-8237874 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237874/webrev/ Test: hotspot_gc_shenandoah and test case in bug report. Thanks, -Zhengyu From shade at redhat.com Mon Jan 27 16:28:49 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 27 Jan 2020 17:28:49 +0100 Subject: [15] RFR(T) 8237874: Shenandoah: Backout JDK-8234399 In-Reply-To: <78491ea8-d877-04e8-b265-74f8a30997e0@redhat.com> References: <78491ea8-d877-04e8-b265-74f8a30997e0@redhat.com> Message-ID: On 1/27/20 5:25 PM, Zhengyu Gu wrote: > I would like to backout JDK-8234399, as Shenandoah still triggers > barriers on GC paths. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8237874 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237874/webrev/ Yes, please. -- Thanks, -Aleksey From kim.barrett at oracle.com Mon Jan 27 17:36:39 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 27 Jan 2020 12:36:39 -0500 Subject: RFR: 8237884: ZGC: Use clamp() instead of MIN2(MAX2()) In-Reply-To: <8df285ce-3a79-30d5-88b2-f9ebfe213abe@oracle.com> References: <8df285ce-3a79-30d5-88b2-f9ebfe213abe@oracle.com> Message-ID: <08A78C8C-782C-4D9F-92B5-E7C6BD909212@oracle.com> > On Jan 27, 2020, at 8:54 AM, Per Liden wrote: > > JDK-8233702 introduced clamp(), but ZHeuristics still uses MIN2(MAX2()) in one place. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8237884 > Webrev: http://cr.openjdk.java.net/~pliden/8237884/webrev.0 > > /Per Looks good. 
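For reference, the clamp() cleanup reviewed above is a pure spelling change: clamp(v, lo, hi) computes the same value as the nested MIN2(MAX2()) it replaces. A standalone sketch of the equivalence, using minimal stand-ins for HotSpot's helpers (the real definitions live elsewhere in the HotSpot sources):

```cpp
// Minimal stand-ins for HotSpot's MIN2/MAX2, for illustration only.
template <typename T> T MIN2(T a, T b) { return a < b ? a : b; }
template <typename T> T MAX2(T a, T b) { return a > b ? a : b; }

// clamp(v, lo, hi) pins v into [lo, hi] -- equivalent to MIN2(MAX2(v, lo), hi),
// but it states the intent directly.
template <typename T> T clamp(T v, T lo, T hi) {
  return MIN2(MAX2(v, lo), hi);
}
```

Values below the range come back as `lo`, values above it as `hi`, and in-range values pass through unchanged.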
From kim.barrett at oracle.com Mon Jan 27 17:37:58 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 27 Jan 2020 12:37:58 -0500 Subject: RFR: 8237882: ZGC: Removed ZUtils::round_{up,down}_power_of_2() declarations In-Reply-To: <95acba6b-dcda-6f4d-c99b-bff42b45e68e@oracle.com> References: <95acba6b-dcda-6f4d-c99b-bff42b45e68e@oracle.com> Message-ID: <4F32DC9F-DF27-4247-94A5-B5315235279C@oracle.com> > On Jan 27, 2020, at 8:54 AM, Per Liden wrote: > > JDK-8234331 removed ZUtils::round_{up,down}_power_of_2() but left the function declarations in the ZUtils class. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8237882 > Webrev: http://cr.openjdk.java.net/~pliden/8237882/webrev.0 > > /Per Looks good. From manc at google.com Tue Jan 28 03:13:02 2020 From: manc at google.com (Man Cao) Date: Mon, 27 Jan 2020 19:13:02 -0800 Subject: RFR (XS): 8234608: [TESTBUG] Memory leak in gc/g1/unloading/libdefine.cpp Message-ID: Hi all, Could anyone review this small patch contributed by my colleague Ian Rogers (irogers at google.com)? Bug: https://bugs.openjdk.java.net/browse/JDK-8234608 Webrev: https://cr.openjdk.java.net/~manc/8234608/webrev.00/ -Man From per.liden at oracle.com Tue Jan 28 07:10:15 2020 From: per.liden at oracle.com (Per Liden) Date: Tue, 28 Jan 2020 08:10:15 +0100 Subject: RFR: 8237882: ZGC: Removed ZUtils::round_{up,down}_power_of_2() declarations In-Reply-To: <4F32DC9F-DF27-4247-94A5-B5315235279C@oracle.com> References: <95acba6b-dcda-6f4d-c99b-bff42b45e68e@oracle.com> <4F32DC9F-DF27-4247-94A5-B5315235279C@oracle.com> Message-ID: <430790ac-227d-88ef-ee8d-7acdfea5df06@oracle.com> Thanks Kim! /Per On 1/27/20 6:37 PM, Kim Barrett wrote: >> On Jan 27, 2020, at 8:54 AM, Per Liden wrote: >> >> JDK-8234331 removed ZUtils::round_{up,down}_power_of_2() but left the function declarations in the ZUtils class. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8237882 >> Webrev: http://cr.openjdk.java.net/~pliden/8237882/webrev.0 >> >> /Per > > Looks good. 
> From per.liden at oracle.com Tue Jan 28 07:10:28 2020 From: per.liden at oracle.com (Per Liden) Date: Tue, 28 Jan 2020 08:10:28 +0100 Subject: RFR: 8237884: ZGC: Use clamp() instead of MIN2(MAX2()) In-Reply-To: <08A78C8C-782C-4D9F-92B5-E7C6BD909212@oracle.com> References: <8df285ce-3a79-30d5-88b2-f9ebfe213abe@oracle.com> <08A78C8C-782C-4D9F-92B5-E7C6BD909212@oracle.com> Message-ID: <3001ce1d-213e-3e31-efec-91ace42dc572@oracle.com> Thanks Kim! /Per On 1/27/20 6:36 PM, Kim Barrett wrote: >> On Jan 27, 2020, at 8:54 AM, Per Liden wrote: >> >> JDK-8233702 introduced clamp(), but ZHeuristics still uses MIN2(MAX2()) in one place. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8237884 >> Webrev: http://cr.openjdk.java.net/~pliden/8237884/webrev.0 >> >> /Per > > Looks good. > From ivan.walulya at oracle.com Tue Jan 28 08:05:13 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Tue, 28 Jan 2020 09:05:13 +0100 Subject: RFR: 8232689: Remove ParCompactionManager::Action enum Message-ID: <2AE55972-F775-471A-8F1A-86E552A8788D@oracle.com> Hi all, Please review the removal of unused Enum from the parallel GC. Bug: https://bugs.openjdk.java.net/browse/JDK-8232689 Webrev: http://cr.openjdk.java.net/~lkorinth/ivan/8232689/ Testing: Tier 1 - Tier 3 //Ivan From leo.korinth at oracle.com Tue Jan 28 08:10:47 2020 From: leo.korinth at oracle.com (Leo Korinth) Date: Tue, 28 Jan 2020 09:10:47 +0100 Subject: RFR: 8232689: Remove ParCompactionManager::Action enum In-Reply-To: <2AE55972-F775-471A-8F1A-86E552A8788D@oracle.com> References: <2AE55972-F775-471A-8F1A-86E552A8788D@oracle.com> Message-ID: Hi! On 28/01/2020 09:05, Ivan Walulya wrote: > Hi all, > > Please review the removal of unused Enum from the parallel GC. Thanks for fixing this! > > Bug: https://bugs.openjdk.java.net/browse/JDK-8232689 > Webrev: http://cr.openjdk.java.net/~lkorinth/ivan/8232689/ Looks good Ivan. I will sponsor this change for you. 
Thanks, Leo > Testing: Tier 1 - Tier 3 > > //Ivan > From ivan.walulya at oracle.com Tue Jan 28 08:16:52 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Tue, 28 Jan 2020 09:16:52 +0100 Subject: RFR: 8232689: Remove ParCompactionManager::Action enum In-Reply-To: References: <2AE55972-F775-471A-8F1A-86E552A8788D@oracle.com> Message-ID: Thanks Leo! //Ivan > On 28 Jan 2020, at 09:10, Leo Korinth wrote: > > Hi! > > On 28/01/2020 09:05, Ivan Walulya wrote: >> Hi all, >> Please review the removal of unused Enum from the parallel GC. > > Thanks for fixing this! > >> Bug: https://bugs.openjdk.java.net/browse/JDK-8232689 >> Webrev: http://cr.openjdk.java.net/~lkorinth/ivan/8232689/ > > Looks good Ivan. I will sponsor this change for you. > > Thanks, > Leo > > >> Testing: Tier 1 - Tier 3 >> //Ivan From kim.barrett at oracle.com Tue Jan 28 08:46:01 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 28 Jan 2020 03:46:01 -0500 Subject: RFR: 8232689: Remove ParCompactionManager::Action enum In-Reply-To: <2AE55972-F775-471A-8F1A-86E552A8788D@oracle.com> References: <2AE55972-F775-471A-8F1A-86E552A8788D@oracle.com> Message-ID: <256CF4B8-6742-497C-A5B8-E0D942465B14@oracle.com> > On Jan 28, 2020, at 3:05 AM, Ivan Walulya wrote: > > Hi all, > > Please review the removal of unused Enum from the parallel GC. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8232689 > Webrev: http://cr.openjdk.java.net/~lkorinth/ivan/8232689/ > > Testing: Tier 1 - Tier 3 > > //Ivan Looks good. From ivan.walulya at oracle.com Tue Jan 28 08:50:54 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Tue, 28 Jan 2020 09:50:54 +0100 Subject: RFR: 8232689: Remove ParCompactionManager::Action enum In-Reply-To: <256CF4B8-6742-497C-A5B8-E0D942465B14@oracle.com> References: <2AE55972-F775-471A-8F1A-86E552A8788D@oracle.com> <256CF4B8-6742-497C-A5B8-E0D942465B14@oracle.com> Message-ID: Thanks Kim! 
//Ivan > On 28 Jan 2020, at 09:46, Kim Barrett wrote: > >> On Jan 28, 2020, at 3:05 AM, Ivan Walulya wrote: >> >> Hi all, >> >> Please review the removal of unused Enum from the parallel GC. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8232689 >> Webrev: http://cr.openjdk.java.net/~lkorinth/ivan/8232689/ >> >> Testing: Tier 1 - Tier 3 >> >> //Ivan > > Looks good. > From thomas.schatzl at oracle.com Tue Jan 28 09:22:13 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 28 Jan 2020 10:22:13 +0100 Subject: RFR: 8232689: Remove ParCompactionManager::Action enum In-Reply-To: <2AE55972-F775-471A-8F1A-86E552A8788D@oracle.com> References: <2AE55972-F775-471A-8F1A-86E552A8788D@oracle.com> Message-ID: Hi, On 28.01.20 09:05, Ivan Walulya wrote: > Hi all, > > Please review the removal of unused Enum from the parallel GC. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8232689 > Webrev: http://cr.openjdk.java.net/~lkorinth/ivan/8232689/ > > Testing: Tier 1 - Tier 3 > > //Ivan > a bit late, but looks good :) Thomas From ivan.walulya at oracle.com Tue Jan 28 09:44:15 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Tue, 28 Jan 2020 10:44:15 +0100 Subject: RFR: 8232689: Remove ParCompactionManager::Action enum In-Reply-To: References: <2AE55972-F775-471A-8F1A-86E552A8788D@oracle.com> Message-ID: <85DBDB9C-5F6F-4DF9-955D-DA476B33521A@oracle.com> Thanks Thomas //Ivan > On 28 Jan 2020, at 10:22, Thomas Schatzl wrote: > > Hi, > > On 28.01.20 09:05, Ivan Walulya wrote: >> Hi all, >> Please review the removal of unused Enum from the parallel GC. 
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8232689 >> Webrev: http://cr.openjdk.java.net/~lkorinth/ivan/8232689/ >> Testing: Tier 1 - Tier 3 >> //Ivan > > a bit late, but looks good :) > > Thomas From thomas.schatzl at oracle.com Tue Jan 28 14:23:52 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 28 Jan 2020 15:23:52 +0100 Subject: RFR (XS): 8234608: [TESTBUG] Memory leak in gc/g1/unloading/libdefine.cpp In-Reply-To: References: Message-ID: Hi Ian/Man, On 28.01.20 04:13, Man Cao wrote: > Hi all, > > Could anyone review this small patch contributed by my colleague Ian Rogers > (irogers at google.com)? > Bug: https://bugs.openjdk.java.net/browse/JDK-8234608 > Webrev: https://cr.openjdk.java.net/~manc/8234608/webrev.00/ > > -Man > the change looks good - but the test testing this is broken. In fact, the tests doing the redefinition via JVMTI do not even run at all. This may be why the error that has been "fixed" in this change about passing the byte stream to the RedefineClass method has never been noticed before. I did some hacking to enable redefinition in the tests, and then immediately had to fix the JNI method name, which once more indicates that the appropriate tests were never run... :( My changes are available at http://cr.openjdk.java.net/~tschatzl/8234608/webrev/ ; however the test(s) fail at the RedefineClasses call (with and without your patch) with [...]vmTestbase/gc/g1/unloading/libdefine.cpp: Failed to call RedefineClasses(): the function returned error 60 For more info about this error see the JVMTI spec which means JVMTI_ERROR_INVALID_CLASS_FORMAT (60) A new class file is malformed (the VM would return a ClassFormatError). I have no further clue about what's wrong here. Maybe you are interested/have time to investigate more but I need to give up for today at least. Otherwise it's probably best to just add some links to the CR for somebody else to continue. 
Thanks, Thomas From zgu at redhat.com Tue Jan 28 15:34:49 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 28 Jan 2020 10:34:49 -0500 Subject: [15] RFR 8237963: Shenandoah: Heap iteration should use single-threaded string dedup oops_do_slow() Message-ID: <9cef2092-fc85-07c5-3764-a1103c11ce43@redhat.com> Please review this patch that uses single-threaded string dedup's oops_do() implementation for heap iteration. The bug was reported by SAP on Windows, but it is not Windows specific. The bug is that heap iteration uses the parallel version of string dedup's oops_do() implementation, which can interfere with the concurrent string dedup cleaning task. Bug: https://bugs.openjdk.java.net/browse/JDK-8237963 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237963/webrev.00/index.html Test: gc/shenandoah/jvmti/TestHeapDump.java test with -XX:+UseStringDeduplication (fastdebug and release) on x86_64 Linux. Thanks, -Zhengyu From shade at redhat.com Tue Jan 28 17:54:25 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 28 Jan 2020 18:54:25 +0100 Subject: [15] RFR 8237963: Shenandoah: Heap iteration should use single-threaded string dedup oops_do_slow() In-Reply-To: <9cef2092-fc85-07c5-3764-a1103c11ce43@redhat.com> References: <9cef2092-fc85-07c5-3764-a1103c11ce43@redhat.com> Message-ID: <889edf0c-db71-24e1-a3dc-7bc5ea433541@redhat.com> On 1/28/20 4:34 PM, Zhengyu Gu wrote: > Bug: https://bugs.openjdk.java.net/browse/JDK-8237963 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237963/webrev.00/index.html Looks fine! 
-- Thanks, -Aleksey From zgu at redhat.com Tue Jan 28 19:03:34 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 28 Jan 2020 14:03:34 -0500 Subject: [15] RFR 8237963: Shenandoah: Heap iteration should use single-threaded string dedup oops_do_slow() In-Reply-To: <9cef2092-fc85-07c5-3764-a1103c11ce43@redhat.com> References: <9cef2092-fc85-07c5-3764-a1103c11ce43@redhat.com> Message-ID: <2de8acb4-bb48-ae91-47c5-b5a3adfa301c@redhat.com> Sorry, the earlier fix is incorrect. The concurrent string dedup cleaning task may change string dedup table/queue structures, which makes it unsafe for heap iteration to walk them concurrently. Instead, heap iteration should use the concurrent version, so the two block each other out. Updated webrev: http://cr.openjdk.java.net/~zgu/JDK-8237963/webrev.00/index.html Also changed bug synopsis to: Shenandoah: Heap iteration should use concurrent version of string dedup roots Test: gc/shenandoah/jvmti/TestHeapDump.java test with -XX:+UseStringDeduplication (fastdebug and release) on x86_64 Linux, in a loop with 10 iterations. Thanks, -Zhengyu On 1/28/20 10:34 AM, Zhengyu Gu wrote: > Please review this patch that uses single-threaded string dedup's > oops_do() implementation for heap iteration. > > The bug was reported by SAP on Windows, but it is not Windows specific. > The bug is due to heap iteration uses parallel version of string dedup's > oops_do() implementation, which can interfere concurrent string dedup > cleaning task. > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8237963 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237963/webrev.00/index.html > > Test: > gc/shenandoah/jvmti/TestHeapDump.java test with > -XX:+UseStringDeduplication (fastdebug and release) on x86_64 Linux. 
> > Thanks, > > -Zhengyu From zgu at redhat.com Tue Jan 28 19:12:04 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 28 Jan 2020 14:12:04 -0500 Subject: [15] RFR 8237963: Shenandoah: Heap iteration should use single-threaded string dedup oops_do_slow() In-Reply-To: <2de8acb4-bb48-ae91-47c5-b5a3adfa301c@redhat.com> References: <9cef2092-fc85-07c5-3764-a1103c11ce43@redhat.com> <2de8acb4-bb48-ae91-47c5-b5a3adfa301c@redhat.com> Message-ID: <32f8b945-658d-f7d4-eff1-834fb2e9f6e9@redhat.com> Correction: Updated webrev: http://cr.openjdk.java.net/~zgu/JDK-8237963/webrev.01/index.html Thanks, -Zhengyu On 1/28/20 2:03 PM, Zhengyu Gu wrote: > Sorry, the early fix is incorrect. > > Concurrent string dedup cleaning task may change string dedup > table/queue structures, that makes it unsafe for heap iteration to walk > them concurently. > > Instead, heap iteration should use concurrent version, to block out each > other. > > Updated webrev: > http://cr.openjdk.java.net/~zgu/JDK-8237963/webrev.00/index.html > > Also changed bug synopsis to: > > Shenandoah: Heap iteration should use concurrent version of string dedup > roots > > Test: > ??? gc/shenandoah/jvmti/TestHeapDump.java test with > ?? -XX:+UseStringDeduplication (fastdebug and release) on x86_64 Linux. > ?? in loop with 10 iterations. > > Thanks, > > -Zhengyu > > On 1/28/20 10:34 AM, Zhengyu Gu wrote: >> Please review this patch that uses single-threaded string dedup's >> oops_do() implementation for heap iteration. >> >> The bug was reported by SAP on Windows, but it is not Windows >> specific. The bug is due to heap iteration uses parallel version of >> string dedup's oops_do() implementation, which can interfere >> concurrent string dedup cleaning task. >> >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8237963 >> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237963/webrev.00/index.html >> >> Test: >> ??? 
gc/shenandoah/jvmti/TestHeapDump.java test with >> -XX:+UseStringDeduplication (fastdebug and release) on x86_64 Linux. >> >> Thanks, >> >> -Zhengyu From shade at redhat.com Tue Jan 28 19:15:29 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 28 Jan 2020 20:15:29 +0100 Subject: [15] RFR 8237963: Shenandoah: Heap iteration should use single-threaded string dedup oops_do_slow() In-Reply-To: <32f8b945-658d-f7d4-eff1-834fb2e9f6e9@redhat.com> References: <9cef2092-fc85-07c5-3764-a1103c11ce43@redhat.com> <2de8acb4-bb48-ae91-47c5-b5a3adfa301c@redhat.com> <32f8b945-658d-f7d4-eff1-834fb2e9f6e9@redhat.com> Message-ID: <37e1ebb9-456e-357e-dca5-44d4f7875710@redhat.com> On 1/28/20 8:12 PM, Zhengyu Gu wrote: > http://cr.openjdk.java.net/~zgu/JDK-8237963/webrev.01/index.html Looks good. This still passes tests, right? -- Thanks, -Aleksey From zgu at redhat.com Tue Jan 28 19:16:47 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 28 Jan 2020 14:16:47 -0500 Subject: [15] RFR 8237963: Shenandoah: Heap iteration should use single-threaded string dedup oops_do_slow() In-Reply-To: <37e1ebb9-456e-357e-dca5-44d4f7875710@redhat.com> References: <9cef2092-fc85-07c5-3764-a1103c11ce43@redhat.com> <2de8acb4-bb48-ae91-47c5-b5a3adfa301c@redhat.com> <32f8b945-658d-f7d4-eff1-834fb2e9f6e9@redhat.com> <37e1ebb9-456e-357e-dca5-44d4f7875710@redhat.com> Message-ID: <18717403-3080-5a4b-9a4a-6d3b3ec686f8@redhat.com> Thanks, Aleksey. On 1/28/20 2:15 PM, Aleksey Shipilev wrote: > On 1/28/20 8:12 PM, Zhengyu Gu wrote: >> http://cr.openjdk.java.net/~zgu/JDK-8237963/webrev.01/index.html > Looks good. > > This still passes tests, right? Of course. -Zhengyu > From manc at google.com Tue Jan 28 19:18:36 2020 From: manc at google.com (Man Cao) Date: Tue, 28 Jan 2020 11:18:36 -0800 Subject: RFR (XS): 8234608: [TESTBUG] Memory leak in gc/g1/unloading/libdefine.cpp In-Reply-To: References: Message-ID: Thanks, Thomas. I'll take a look briefly. 
Agreed that the test should be able to run first, before pushing this change. -Man From sangheon.kim at oracle.com Wed Jan 29 00:40:32 2020 From: sangheon.kim at oracle.com (sangheon.kim at oracle.com) Date: Tue, 28 Jan 2020 16:40:32 -0800 Subject: RFR: 8233822: VM_G1CollectForAllocation should always check for upgrade to full In-Reply-To: <5389B188-BA91-412F-A12E-0DB5A96FF0A9@oracle.com> References: <5389B188-BA91-412F-A12E-0DB5A96FF0A9@oracle.com> Message-ID: <25625fd5-7ced-ed64-aea8-78f3367d1f02@oracle.com> Hi Kim, On 1/21/20 12:31 AM, Kim Barrett wrote: > Please review this G1 change to always check whether a full collection > should be performed after a non-full collection pause, e.g. the > collection needs to be "upgraded" to a full collection. There are > various conditions which can lead to needing to do that, and as the CR > suggests, we need to be consistent about checking for and performing > such an upgrade. > > This is accomplished by moving most of do_collection_pause_at_safepoint > into a helper function and changing that existing function to call the > helper, then check for and, if needed, perform a needed upgrade to a > full collection. Callers of that function are updated to remove > explict conditional upgrading, where present. This also addresses the > surprisingly placed call in a G1-specific block of code in gc/shared > (see also JDK-8237567). > > CR: > https://bugs.openjdk.java.net/browse/JDK-8233822 > > Webrev: > https://cr.openjdk.java.net/~kbarrett/8233822/open.00/ Looks good. Not related to your patch (so can ignore) but as you are changing VM_G1CollectForAllocation class, comments for relation at g1VMOperations.hpp seems out-dated. :) Thanks, Sangheon > > Testing: > mach5 tier1-5 > Locally (linux-x64) ran modified InfiniteList.java test (allocate > small rather than arrays) and verified some upgrades occurred as > expected. 
> > From kim.barrett at oracle.com Wed Jan 29 01:07:26 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 28 Jan 2020 20:07:26 -0500 Subject: RFR: 8233822: VM_G1CollectForAllocation should always check for upgrade to full In-Reply-To: <25625fd5-7ced-ed64-aea8-78f3367d1f02@oracle.com> References: <5389B188-BA91-412F-A12E-0DB5A96FF0A9@oracle.com> <25625fd5-7ced-ed64-aea8-78f3367d1f02@oracle.com> Message-ID: <966AA66F-3726-4088-8EE2-AD758B5EBCAA@oracle.com> > On Jan 28, 2020, at 7:40 PM, sangheon.kim at oracle.com wrote: > On 1/21/20 12:31 AM, Kim Barrett wrote: >> [?] >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8233822 >> >> Webrev: >> https://cr.openjdk.java.net/~kbarrett/8233822/open.00/ > Looks good. Thanks. > Not related to your patch (so can ignore) but as you are changing VM_G1CollectForAllocation class, > comments for relation at g1VMOperations.hpp seems out-dated. :) I assume you are referring to the comment near the top of the file summarizing some parts of the class hierarchy? There's more wrong than right in that comment! It wrongly suggests VM_G1Concurrent and VM_G1CollectForAllocation both (directly) derive from VM_GC_Operation, and it's missing VM_G1TryInitiateConcMark. I started to correct the comment, but decided that beyond the first line summary it doesn't provide any information that isn't obvious from the code, and is also messy, so plan to just delete all but that first line. From thomas.schatzl at oracle.com Wed Jan 29 13:04:00 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 29 Jan 2020 14:04:00 +0100 Subject: RFR (M): 8215297: Remove ParallelTaskTerminator Message-ID: Hi all, can I have reviews for this change that removes the ParallelTaskTerminator code? In JDK12 we introduced another implementation, the OWSTTaskTerminator, that seems to work just fine as default for more than a year now, so I think it is time to remove the old implementation. 
@Shenandoah-Team: I left the ShenandoahTaskTerminator wrapper, but am removing the TaskTerminator wrapper for the other GCs. I can remove that too, but did not know if you wanted to keep the name. In a follow-up change I would like to change the name of the OWSTTaskTerminator to just TaskTerminator (also renaming the files, but keeping it separate from taskqueue.?pp). Tell me if there are any concerns. CR: https://bugs.openjdk.java.net/browse/JDK-8215297 Webrev: http://cr.openjdk.java.net/~tschatzl/8215297/webrev/ Testing: hs-tier1-5, builds with Shenandoah Thanks, Thomas From zgu at redhat.com Wed Jan 29 13:45:36 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 29 Jan 2020 08:45:36 -0500 Subject: RFR (M): 8215297: Remove ParallelTaskTerminator In-Reply-To: References: Message-ID: <46581590-d830-3096-d0ac-0dab873fb862@redhat.com> Hi Thomas, Shared changes look good to me. I filed follow-up CR(JDK-8238162) to remove ShenandoahTaskTerminator wrapper. Thanks, -Zhengyu On 1/29/20 8:04 AM, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this change that removes the > ParallelTaskTerminator code? In JDK12 we introduced another > implementation, the OWSTTaskTerminator, that seems to work just fine as > default for more than a year now, so I think it is time to remove the > old implementation. > > @Shenandoah-Team: I left the ShenandoahTaskTerminator wrapper, but > am removing the TaskTerminator wrapper for the other GCs. I can remove that > too, but did not know if you wanted to keep the name. > > In a follow-up change I would like to change the name of the > OWSTTaskTerminator to just TaskTerminator (also renaming the files, but > keeping it separate from taskqueue.?pp). Tell me if there are any concerns. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8215297 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8215297/webrev/ > Testing: > hs-tier1-5, builds with Shenandoah > > Thanks, > 
Thomas > From thomas.schatzl at oracle.com Thu Jan 30 10:45:04 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 30 Jan 2020 11:45:04 +0100 Subject: RFR (M): 8215297: Remove ParallelTaskTerminator In-Reply-To: <46581590-d830-3096-d0ac-0dab873fb862@redhat.com> References: <46581590-d830-3096-d0ac-0dab873fb862@redhat.com> Message-ID: <1a5fa317-cd95-46f3-e82b-e7b8d24c9b0a@oracle.com> Hi Zhengyu, On 29.01.20 14:45, Zhengyu Gu wrote: > Hi Thomas, > > Shared changes look good to me. thanks for your review; I touched up the files a little in http://cr.openjdk.java.net/~tschatzl/8215297/webrev.0_to_1 (diff) http://cr.openjdk.java.net/~tschatzl/8215297/webrev.1 (full) slightly after some feedback and finding some minor (pre-existing) issues. > > I filed follow-up CR(JDK-8238162) to remove ShenandoahTaskTerminator > wrapper. Okay. Thanks, Thomas From thomas.schatzl at oracle.com Thu Jan 30 11:08:35 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 30 Jan 2020 12:08:35 +0100 Subject: RFR (S): 8238160: Uniformize Parallel GC task queue variable names Message-ID: <8d350538-9a82-b420-e7de-319edaf8605c@oracle.com> Hi all, can I have reviews for this small change that moves some global typedefs used only by Parallel GC from taskqueue.hpp to parallel gc files, and further makes naming of instances of these more uniform? CR: https://bugs.openjdk.java.net/browse/JDK-8238160 Webrev: http://cr.openjdk.java.net/~tschatzl/8238160/webrev/ Testing: local compilation Thanks, Thomas From thomas.schatzl at oracle.com Thu Jan 30 11:34:34 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 30 Jan 2020 12:34:34 +0100 Subject: RFR (S): 8238220: Rename OWSTTaskTerminator to TaskTerminator Message-ID: <5f99b054-e286-2a8c-5a37-d641eb4932f1@oracle.com> Hi all, can I have reviews for this renaming change of OWSTTaskTerminator to TaskTerminator now that there is only one task termination protocol implementation? 
I believe that the OWST prefix only makes the code harder to read without conveying interesting information at the uses. Based on JDK-8215297. CR: https://bugs.openjdk.java.net/browse/JDK-8238220 Webrev: http://cr.openjdk.java.net/~tschatzl/8238220/webrev/ Testing: local compilation Thanks, Thomas From thomas.schatzl at oracle.com Thu Jan 30 11:56:53 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 30 Jan 2020 12:56:53 +0100 Subject: RFR (XS): 8238229: Remove TRACESPINNING debug code Message-ID: <77430bd4-19d8-0c6e-edc8-750dae163d96@oracle.com> Hi all, can I have reviews for this removal of some debug code in the TaskTerminator class? The code counts the total number of yields/peeks/spins during task termination. Since it is guarded by a define, the code is not included in any regular build, so there is potential for bit-rotting. Since the code is not very complicated (and I believe it is too simple for real measurements), and needs a rebuild anyway to be used, I propose to just remove it instead of trying to improve it for unknown requirements. Based on JDK-8238220. CR: https://bugs.openjdk.java.net/browse/JDK-8238229 Webrev: http://cr.openjdk.java.net/~tschatzl/8238229/webrev/ Testing: local compilation Thanks, Thomas From zgu at redhat.com Thu Jan 30 13:23:28 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 30 Jan 2020 08:23:28 -0500 Subject: RFR (M): 8215297: Remove ParallelTaskTerminator In-Reply-To: <1a5fa317-cd95-46f3-e82b-e7b8d24c9b0a@oracle.com> References: <46581590-d830-3096-d0ac-0dab873fb862@redhat.com> <1a5fa317-cd95-46f3-e82b-e7b8d24c9b0a@oracle.com> Message-ID: <93ce4809-34d9-a2f7-b38d-565fea5b6d81@redhat.com> Still good. Thanks, -Zhengyu On 1/30/20 5:45 AM, Thomas Schatzl wrote: > Hi Zhengyu, > > On 29.01.20 14:45, Zhengyu Gu wrote: >> Hi Thomas, >> >> Shared changes look good to me. > > 
thanks for your review; I touched up the files a little in > > http://cr.openjdk.java.net/~tschatzl/8215297/webrev.0_to_1 (diff) > http://cr.openjdk.java.net/~tschatzl/8215297/webrev.1 (full) > > slightly after some feedback and finding some minor (pre-existing) issues. > >> >> I filed follow-up CR(JDK-8238162) to remove ShenandoahTaskTerminator >> wrapper. > > Okay. > > Thanks, > ? Thomas > From stefan.johansson at oracle.com Thu Jan 30 15:24:09 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Thu, 30 Jan 2020 16:24:09 +0100 Subject: RFR (XS): 8238229: Remove TRACESPINNING debug code In-Reply-To: <77430bd4-19d8-0c6e-edc8-750dae163d96@oracle.com> References: <77430bd4-19d8-0c6e-edc8-750dae163d96@oracle.com> Message-ID: <48885c09-77c2-8924-d9ec-2a825fd60f29@oracle.com> Looks good, StefanJ On 2020-01-30 12:56, Thomas Schatzl wrote: > Hi all, > > ? can I have reviews for this removal of some debug code in the > TaskTerminator class? > > The code counts the total number of yields/peeks/spins during task > termination. Since it is guarded by a define, the code is not included > in any regular build, so there is potential for bit-rotting. > > Since the code is not very complicated (and I believe it is too simple > for real measurements), and needs rebuild anyway for use I propose to > just remove it instead of trying to improve it for unknown requirements. > > Based on JDK-8238220. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8238229 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8238229/webrev/ I agree that this can be removed, and there is even more code that should go. 
The call from each collected heap: src/hotspot/share/gc/shared/genCollectedHeap.cpp:680:#ifdef TRACESPINNING src/hotspot/share/gc/parallel/psParallelCompact.cpp:1972:#ifdef TRACESPINNING src/hotspot/share/gc/parallel/psScavenge.cpp:734:#ifdef TRACESPINNING src/hotspot/share/gc/g1/g1CollectedHeap.cpp:1135:#ifdef TRACESPINNING src/hotspot/share/gc/g1/g1CollectedHeap.cpp:3143:#ifdef TRACESPINNING Otherwise, looks good, StefanJ > Testing: > local compilation > > Thanks, > ? Thomas From stefan.johansson at oracle.com Thu Jan 30 15:30:25 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Thu, 30 Jan 2020 16:30:25 +0100 Subject: RFR (S): 8238220: Rename OWSTTaskTerminator to TaskTerminator In-Reply-To: <5f99b054-e286-2a8c-5a37-d641eb4932f1@oracle.com> References: <5f99b054-e286-2a8c-5a37-d641eb4932f1@oracle.com> Message-ID: Hi Thomas, On 2020-01-30 12:34, Thomas Schatzl wrote: > Hi all, > > ? can I have reviews for this renaming change of OWSTTaskTerminator to > TaskTerminator now that there is only one task termination protocol > implementation? > > I believe that the OWST prefix only makes the code harder to read > without conveying interesting information at the uses. > > Based on JDK-8215297. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8238220 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8238220/webrev/ Looks good, StefanJ > Testing: > local compilation > > Thanks, > ? 
Thomas From stefan.johansson at oracle.com Thu Jan 30 15:47:36 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Thu, 30 Jan 2020 16:47:36 +0100 Subject: RFR (M): 8215297: Remove ParallelTaskTerminator In-Reply-To: <1a5fa317-cd95-46f3-e82b-e7b8d24c9b0a@oracle.com> References: <46581590-d830-3096-d0ac-0dab873fb862@redhat.com> <1a5fa317-cd95-46f3-e82b-e7b8d24c9b0a@oracle.com> Message-ID: <5f9aed62-f413-2808-c410-e5ef634aba88@oracle.com> Hi Thomas, On 2020-01-30 11:45, Thomas Schatzl wrote: > Hi Zhengyu, > > On 29.01.20 14:45, Zhengyu Gu wrote: >> Hi Thomas, >> >> Shared changes look good to me. > > thanks for your review; I touched up the files a little in > > http://cr.openjdk.java.net/~tschatzl/8215297/webrev.0_to_1 (diff) > http://cr.openjdk.java.net/~tschatzl/8215297/webrev.1 (full) Seems to be some webrev-hiccup with webrev.1, but I guess nothing changed in g1ConcurrentMark.cpp since the first webrev, so I looked there instead. Looks good, especially after the follow-up patches. Thanks, StefanJ > > slightly after some feedback and finding some minor (pre-existing) issues. > >> >> I filed follow-up CR(JDK-8238162) to remove ShenandoahTaskTerminator >> wrapper. > > Okay. > > Thanks, > Thomas From thomas.schatzl at oracle.com Thu Jan 30 16:40:01 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 30 Jan 2020 17:40:01 +0100 Subject: RFR (M): 8215297: Remove ParallelTaskTerminator In-Reply-To: <5f9aed62-f413-2808-c410-e5ef634aba88@oracle.com> References: <46581590-d830-3096-d0ac-0dab873fb862@redhat.com> <1a5fa317-cd95-46f3-e82b-e7b8d24c9b0a@oracle.com> <5f9aed62-f413-2808-c410-e5ef634aba88@oracle.com> Message-ID: <16476d4a-26f7-bb6c-4ffd-68d071ca38ac@oracle.com> Hi, On 30.01.20 16:47, Stefan Johansson wrote: > Hi Thomas, > > On 2020-01-30 11:45, Thomas Schatzl wrote: >> Hi Zhengyu, >> >> On 29.01.20 14:45, Zhengyu Gu wrote: >>> Hi Thomas, >>> >>> Shared changes look good to me. >> >> 
thanks for your review; I touched up the files a little in >> >> http://cr.openjdk.java.net/~tschatzl/8215297/webrev.0_to_1 (diff) >> http://cr.openjdk.java.net/~tschatzl/8215297/webrev.1 (full) > Seems to be some webrev-hickup with webrev.1, but I guess nothing > changed in g1ConcurrentMark.cpp since first webrev so looked there instead. > > Looks good, especially after the follow-up patches. Yes, I'm having issues with webrev lately. I managed to fix the webrev. Thanks for your review, Thomas From thomas.schatzl at oracle.com Thu Jan 30 16:43:16 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 30 Jan 2020 17:43:16 +0100 Subject: RFR (XS): 8238229: Remove TRACESPINNING debug code In-Reply-To: <48885c09-77c2-8924-d9ec-2a825fd60f29@oracle.com> References: <77430bd4-19d8-0c6e-edc8-750dae163d96@oracle.com> <48885c09-77c2-8924-d9ec-2a825fd60f29@oracle.com> Message-ID: <00eec1c7-d524-44c1-a331-95088bb74f3c@oracle.com> Hi, On 30.01.20 16:24, Stefan Johansson wrote: > Looks good, > StefanJ all fixed. Idk why these were missing in that webrev, I regenerated it. Thanks, Thomas > > On 2020-01-30 12:56, Thomas Schatzl wrote: >> Hi all, >> >> ?? can I have reviews for this removal of some debug code in the >> TaskTerminator class? >> >> The code counts the total number of yields/peeks/spins during task >> termination. Since it is guarded by a define, the code is not included >> in any regular build, so there is potential for bit-rotting. >> >> Since the code is not very complicated (and I believe it is too simple >> for real measurements), and needs rebuild anyway for use I propose to >> just remove it instead of trying to improve it for unknown requirements. >> >> Based on JDK-8238220. >> >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8238229 >> Webrev: >> http://cr.openjdk.java.net/~tschatzl/8238229/webrev/ > > I agree that this can be removed, and there is even more code that > should go. 
The call from each collected heap: > src/hotspot/share/gc/shared/genCollectedHeap.cpp:680:#ifdef TRACESPINNING > src/hotspot/share/gc/parallel/psParallelCompact.cpp:1972:#ifdef > TRACESPINNING > src/hotspot/share/gc/parallel/psScavenge.cpp:734:#ifdef TRACESPINNING > src/hotspot/share/gc/g1/g1CollectedHeap.cpp:1135:#ifdef TRACESPINNING > src/hotspot/share/gc/g1/g1CollectedHeap.cpp:3143:#ifdef TRACESPINNING > > Otherwise, looks good, > StefanJ > >> Testing: >> local compilation >> >> Thanks, >> ?? Thomas From sangheon.kim at oracle.com Thu Jan 30 18:08:48 2020 From: sangheon.kim at oracle.com (sangheon.kim at oracle.com) Date: Thu, 30 Jan 2020 10:08:48 -0800 Subject: RFR (S): 8238220: Rename OWSTTaskTerminator to TaskTerminator In-Reply-To: <5f99b054-e286-2a8c-5a37-d641eb4932f1@oracle.com> References: <5f99b054-e286-2a8c-5a37-d641eb4932f1@oracle.com> Message-ID: Hi Thomas, On 1/30/20 3:34 AM, Thomas Schatzl wrote: > Hi all, > > ? can I have reviews for this renaming change of OWSTTaskTerminator to > TaskTerminator now that there is only one task termination protocol > implementation? > > I believe that the OWST prefix only makes the code harder to read > without conveying interesting information at the uses. > > Based on JDK-8215297. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8238220 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8238220/webrev/ Looks good as is. One thing to note is the order of renamed header file. It looks like you are treating uppercase first? :) e.g. at g1CollectedHeap.cpp +#include "gc/shared/taskTerminator.hpp" #include "gc/shared/taskqueue.inline.hpp" I expect alphabet order first and then upper-lowercase. :) Thanks, Sangheon > Testing: > local compilation > > Thanks, > ? 
Thomas From kim.barrett at oracle.com Thu Jan 30 23:14:23 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 30 Jan 2020 18:14:23 -0500 Subject: RFR (XS): 8238229: Remove TRACESPINNING debug code In-Reply-To: <00eec1c7-d524-44c1-a331-95088bb74f3c@oracle.com> References: <77430bd4-19d8-0c6e-edc8-750dae163d96@oracle.com> <48885c09-77c2-8924-d9ec-2a825fd60f29@oracle.com> <00eec1c7-d524-44c1-a331-95088bb74f3c@oracle.com> Message-ID: > On Jan 30, 2020, at 11:43 AM, Thomas Schatzl wrote: > > Hi, > > On 30.01.20 16:24, Stefan Johansson wrote: >> Looks good, >> StefanJ > > all fixed. Idk why these were missing in that webrev, I regenerated it. > > Thanks, > Thomas > >> On 2020-01-30 12:56, Thomas Schatzl wrote: >>> Hi all, >>> >>> can I have reviews for this removal of some debug code in the TaskTerminator class? >>> >>> The code counts the total number of yields/peeks/spins during task termination. Since it is guarded by a define, the code is not included in any regular build, so there is potential for bit-rotting. >>> >>> Since the code is not very complicated (and I believe it is too simple for real measurements), and needs rebuild anyway for use I propose to just remove it instead of trying to improve it for unknown requirements. >>> >>> Based on JDK-8238220. >>> >>> CR: >>> https://bugs.openjdk.java.net/browse/JDK-8238229 >>> Webrev: >>> http://cr.openjdk.java.net/~tschatzl/8238229/webrev/ >> I agree that this can be removed, and there is even more code that should go. The call from each collected heap: Looks good. From manc at google.com Fri Jan 31 03:27:55 2020 From: manc at google.com (Man Cao) Date: Thu, 30 Jan 2020 19:27:55 -0800 Subject: RFR (XS): 8234608: [TESTBUG] Memory leak in gc/g1/unloading/libdefine.cpp In-Reply-To: References: Message-ID: Hi, I have incorporated Thomas's changes, and fixed the tests and updated the CR. 
New webrev: https://cr.openjdk.java.net/~manc/8234608/webrev.01/

The issue is that the signature of makeRedefinition0() in libdefine.cpp was wrong. It missed the "jclass clazz" parameter.

I have tested using 'make test TEST="test/hotspot/jtreg/vmTestbase/gc/g1/unloading/tests/unloading_redefinition_*" ', for both fastdebug and product builds. I suppose Submit repo would not run these tests, because it only runs tier1. Am I correct?

-Man

On Tue, Jan 28, 2020 at 11:18 AM Man Cao wrote:

> Thanks, Thomas. I'll take a look briefly.
> Agreed that the test should be able to run first, before pushing this
> change.
>
> -Man
>

From rwestrel at redhat.com Fri Jan 31 08:47:12 2020
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 31 Jan 2020 09:47:12 +0100
Subject: RFR(S): 8237776: Shenandoah: Wrong result with Lucene test
Message-ID: <87wo98t3lb.fsf@redhat.com>

http://cr.openjdk.java.net/~roland/8237776/webrev.00/

xmm0 (an argument to a call) gets corrupted in the c2i adapter (when going from c1 code to the interpreter) at the ShenandoahRuntime::write_ref_field_pre_entry() runtime call. That call is in the c2i because of c2i_entry_barrier() and resolve_weak_handle(). The proposed fix saves all floating point argument registers.

Roland.

From rkennke at redhat.com Fri Jan 31 09:51:26 2020
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 31 Jan 2020 10:51:26 +0100
Subject: RFR(S): 8237776: Shenandoah: Wrong result with Lucene test
In-Reply-To: <87wo98t3lb.fsf@redhat.com>
References: <87wo98t3lb.fsf@redhat.com>
Message-ID:

Hey Roland,

the patch looks good, but it lacks the x86_32 counterpart. Or would you rather handle that separately?

Thanks,
Roman

Roland Westrelin schrieb am Fr., 31. Jan. 2020, 09:47:

>
> http://cr.openjdk.java.net/~roland/8237776/webrev.00/
>
> xmm0 (an argument to a call) gets corrupted in the c2i adapter (when
> going from c1 code to the interpreter) at the
> ShenandoahRuntime::write_ref_field_pre_entry() runtime call. That call
> is in the c2i because of c2i_entry_barrier() and
> resolve_weak_handle(). The proposed fix saves all floating point
> argument registers.
>
> Roland.
>
>

From rwestrel at redhat.com Fri Jan 31 09:55:13 2020
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 31 Jan 2020 10:55:13 +0100
Subject: RFR(S): 8237776: Shenandoah: Wrong result with Lucene test
In-Reply-To:
References: <87wo98t3lb.fsf@redhat.com>
Message-ID: <87tv4ct0fy.fsf@redhat.com>

> the patch looks good, but it lacks the x86_32 counterpart. Or would you
> rather handle that separately?

Actually, AFAIU, the 64 bits fix covers 32 bits too. 32bits needs xmm0 and xmm1 saved. So we're saving too many registers on 32 bits but that seems pretty armless.

Roland.

From rkennke at redhat.com Fri Jan 31 09:58:39 2020
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 31 Jan 2020 10:58:39 +0100
Subject: RFR(S): 8237776: Shenandoah: Wrong result with Lucene test
In-Reply-To: <87tv4ct0fy.fsf@redhat.com>
References: <87wo98t3lb.fsf@redhat.com>
 <87tv4ct0fy.fsf@redhat.com>
Message-ID:

Roland Westrelin schrieb am Fr., 31. Jan. 2020, 10:55:

>
> > the patch looks good, but it lacks the x86_32 counterpart. Or would you
> > rather handle that separately?
>
> Actually, AFAIU, the 64 bits fix covers 32 bits too. 32bits needs xmm0
> and xmm1 saved. So we're saving too many registers on 32 bits but that
> seems pretty armless.
>
> Roland.
>

Ah OK. Good then!

(armless... nice typo! :-D )

Thanks,
Roman

>

From thomas.schatzl at oracle.com Fri Jan 31 10:41:13 2020
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Fri, 31 Jan 2020 11:41:13 +0100
Subject: RFR (S): 8238220: Rename OWSTTaskTerminator to TaskTerminator
In-Reply-To:
References: <5f99b054-e286-2a8c-5a37-d641eb4932f1@oracle.com>
Message-ID: <10c01fdb-d6e3-01a3-6cee-a8f467fac372@oracle.com>

Hi Sangheon,

On 30.01.20 19:08, sangheon.kim at oracle.com wrote:
> Hi Thomas,
>
> On 1/30/20 3:34 AM, Thomas Schatzl wrote:
>> Hi all,
>>
>> can I have reviews for this renaming change of OWSTTaskTerminator to
>> TaskTerminator now that there is only one task termination protocol
>> implementation?
>>
>> I believe that the OWST prefix only makes the code harder to read
>> without conveying interesting information at the uses.
>>
>> Based on JDK-8215297.
>>
>> CR:
>> https://bugs.openjdk.java.net/browse/JDK-8238220
>> Webrev:
>> http://cr.openjdk.java.net/~tschatzl/8238220/webrev/
> Looks good as is.
>
> One thing to note is the order of renamed header file.
> It looks like you are treating uppercase first? :)
>
> e.g. at g1CollectedHeap.cpp
>
> +#include "gc/shared/taskTerminator.hpp"
> #include "gc/shared/taskqueue.inline.hpp"
>
>
> I expect alphabet order first and then upper-lowercase. :)
>

by default, upper case sorts before lower case in many if not all
situations on computers since typically all upper case letters are
"before" lower case letters in character sets.

I would like to keep it as is unless you or somebody else really
objects - there does not seem to be a precedence in hotspot files.

Thanks,
Thomas

From ivan.walulya at oracle.com Fri Jan 31 12:22:35 2020
From: ivan.walulya at oracle.com (Ivan Walulya)
Date: Fri, 31 Jan 2020 13:22:35 +0100
Subject: JDK-8233220: Remove Space::_par_seq_tasks member as it was only used by CMS
Message-ID: <32A6D3AA-F9CD-42C4-B810-2CA05AEFBFC0@oracle.com>

Hi all,

Please review a minor enhancement to remove Space::_par_seq_tasks member which was only used by CMS.
Bug: https://bugs.openjdk.java.net/browse/JDK-8233220
Webrev: http://cr.openjdk.java.net/~lkorinth/ivan/8233220/

Testing: Tier 1 - Tier 3

//Ivan

From per.liden at oracle.com Fri Jan 31 12:32:11 2020
From: per.liden at oracle.com (Per Liden)
Date: Fri, 31 Jan 2020 13:32:11 +0100
Subject: JDK-8233220: Remove Space::_par_seq_tasks member as it was only used by CMS
In-Reply-To: <32A6D3AA-F9CD-42C4-B810-2CA05AEFBFC0@oracle.com>
References: <32A6D3AA-F9CD-42C4-B810-2CA05AEFBFC0@oracle.com>
Message-ID: <91bccd09-a5c7-faea-204e-cf7c9a0dc561@oracle.com>

Looks good.

/Per

On 1/31/20 1:22 PM, Ivan Walulya wrote:
> Hi all,
>
> Please review a minor enhancement to remove Space::_par_seq_tasks member which was only used by CMS.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8233220
> Webrev: http://cr.openjdk.java.net/~lkorinth/ivan/8233220/
>
>
> Testing: Tier 1 - Tier 3

looks good.

From thomas.schatzl at oracle.com Fri Jan 31 12:34:04 2020
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Fri, 31 Jan 2020 13:34:04 +0100
Subject: JDK-8233220: Remove Space::_par_seq_tasks member as it was only used by CMS
In-Reply-To: <32A6D3AA-F9CD-42C4-B810-2CA05AEFBFC0@oracle.com>
References: <32A6D3AA-F9CD-42C4-B810-2CA05AEFBFC0@oracle.com>
Message-ID:

Hi,

On 31.01.20 13:22, Ivan Walulya wrote:
> Hi all,
>
> Please review a minor enhancement to remove Space::_par_seq_tasks member which was only used by CMS.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8233220
> Webrev: http://cr.openjdk.java.net/~lkorinth/ivan/8233220/
>
>
> Testing: Tier 1 - Tier 3

looks good.

Thomas

From ivan.walulya at oracle.com Fri Jan 31 12:35:57 2020
From: ivan.walulya at oracle.com (Ivan Walulya)
Date: Fri, 31 Jan 2020 13:35:57 +0100
Subject: JDK-8233220: Remove Space::_par_seq_tasks member as it was only used by CMS
In-Reply-To: <91bccd09-a5c7-faea-204e-cf7c9a0dc561@oracle.com>
References: <32A6D3AA-F9CD-42C4-B810-2CA05AEFBFC0@oracle.com>
 <91bccd09-a5c7-faea-204e-cf7c9a0dc561@oracle.com>
Message-ID:

Thanks Per!

//Ivan

> On 31 Jan 2020, at 13:32, Per Liden wrote:
>
> Looks good.
>
> /Per
>
> On 1/31/20 1:22 PM, Ivan Walulya wrote:
>> Hi all,
>> Please review a minor enhancement to remove Space::_par_seq_tasks member which was only used by CMS.
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8233220
>> Webrev: http://cr.openjdk.java.net/~lkorinth/ivan/8233220/
>> Testing: Tier 1 - Tier 3
>> //Ivan

From ivan.walulya at oracle.com Fri Jan 31 12:36:12 2020
From: ivan.walulya at oracle.com (Ivan Walulya)
Date: Fri, 31 Jan 2020 13:36:12 +0100
Subject: JDK-8233220: Remove Space::_par_seq_tasks member as it was only used by CMS
In-Reply-To:
References: <32A6D3AA-F9CD-42C4-B810-2CA05AEFBFC0@oracle.com>
Message-ID: <58A3B3BE-BD5C-433D-81C9-6859FB973E25@oracle.com>

Thanks Thomas!

//Ivan

> On 31 Jan 2020, at 13:34, Thomas Schatzl wrote:
>
> Hi,
>
> On 31.01.20 13:22, Ivan Walulya wrote:
>> Hi all,
>> Please review a minor enhancement to remove Space::_par_seq_tasks member which was only used by CMS.
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8233220
>> Webrev: http://cr.openjdk.java.net/~lkorinth/ivan/8233220/
>> Testing: Tier 1 - Tier 3
>
> looks good.
>
> Thomas

From leo.korinth at oracle.com Fri Jan 31 12:41:51 2020
From: leo.korinth at oracle.com (Leo Korinth)
Date: Fri, 31 Jan 2020 13:41:51 +0100
Subject: JDK-8233220: Remove Space::_par_seq_tasks member as it was only used by CMS
In-Reply-To: <32A6D3AA-F9CD-42C4-B810-2CA05AEFBFC0@oracle.com>
References: <32A6D3AA-F9CD-42C4-B810-2CA05AEFBFC0@oracle.com>
Message-ID: <42a03b0e-e48b-28c5-b16b-dd5f04d10f3e@oracle.com>

On 31/01/2020 13:22, Ivan Walulya wrote:
> Hi all,
>
> Please review a minor enhancement to remove Space::_par_seq_tasks member which was only used by CMS.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8233220
> Webrev: http://cr.openjdk.java.net/~lkorinth/ivan/8233220/

Looks good, I will sponsor it for you.

Thanks,
Leo

>
>
> Testing: Tier 1 - Tier 3
>
> //Ivan
>

From rkennke at redhat.com Fri Jan 31 12:49:48 2020
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 31 Jan 2020 13:49:48 +0100
Subject: RFR(S): 8237776: Shenandoah: Wrong result with Lucene test
In-Reply-To:
References: <87wo98t3lb.fsf@redhat.com>
 <87tv4ct0fy.fsf@redhat.com>
Message-ID:

Thinking more about it: we should probably change the synopsis for the bug and RFR:

- Fix is not in Shenandoah code, but shared. Even though only Shenandoah currently seems to fall over it. Also, should probably draw attention of non-Shenandoah reviewers...
- Maybe reflect what the problem is and/or what the fix is?

Thanks,
Roman

Roman Kennke schrieb am Fr., 31. Jan. 2020, 10:58:

>
>
> Roland Westrelin schrieb am Fr., 31. Jan. 2020,
> 10:55:
>
>>
>> > the patch looks good, but it lacks the x86_32 counterpart. Or would you
>> > rather handle that separately?
>>
>> Actually, AFAIU, the 64 bits fix covers 32 bits too. 32bits needs xmm0
>> and xmm1 saved. So we're saving too many registers on 32 bits but that
>> seems pretty armless.
>>
>> Roland.
>>
>
> Ah OK. Good then!
>
> (armless... nice typo! :-D )
>
> Thanks,
> Roman
>
>
>>

From rwestrel at redhat.com Fri Jan 31 12:54:21 2020
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 31 Jan 2020 13:54:21 +0100
Subject: RFR(S): 8237776: Shenandoah: Wrong result with Lucene test
In-Reply-To:
References: <87wo98t3lb.fsf@redhat.com>
 <87tv4ct0fy.fsf@redhat.com>
Message-ID: <87lfpnu6pu.fsf@redhat.com>

> - Fix is not in Shenandoah code, but shared. Even though only Shenandoah
> currently seems to fall over it. Also, should probably draw attention of
> non-Shenandoah reviewers...

Fix IS in shenandoah code.

Roland.

From rkennke at redhat.com Fri Jan 31 12:58:48 2020
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 31 Jan 2020 13:58:48 +0100
Subject: RFR(S): 8237776: Shenandoah: Wrong result with Lucene test
In-Reply-To: <87lfpnu6pu.fsf@redhat.com>
References: <87wo98t3lb.fsf@redhat.com>
 <87tv4ct0fy.fsf@redhat.com>
 <87lfpnu6pu.fsf@redhat.com>
Message-ID:

Roland Westrelin schrieb am Fr., 31. Jan. 2020, 13:54:

>
> > - Fix is not in Shenandoah code, but shared. Even though only Shenandoah
> > currently seems to fall over it. Also, should probably draw attention of
> > non-Shenandoah reviewers...
>
> Fix IS in shenandoah code.
>

/me looks again. Duh. I should not review stuff on the phone, in the train. Nevermind then..

Thanks,
Roman

>
>
> Roland.
>
>

From ivan.walulya at oracle.com Fri Jan 31 13:32:17 2020
From: ivan.walulya at oracle.com (Ivan Walulya)
Date: Fri, 31 Jan 2020 14:32:17 +0100
Subject: JDK-8233220: Remove Space::_par_seq_tasks member as it was only used by CMS
In-Reply-To: <42a03b0e-e48b-28c5-b16b-dd5f04d10f3e@oracle.com>
References: <32A6D3AA-F9CD-42C4-B810-2CA05AEFBFC0@oracle.com>
 <42a03b0e-e48b-28c5-b16b-dd5f04d10f3e@oracle.com>
Message-ID:

Thanks Leo!

//Ivan

> On 31 Jan 2020, at 13:41, Leo Korinth wrote:
>
> On 31/01/2020 13:22, Ivan Walulya wrote:
>> Hi all,
>> Please review a minor enhancement to remove Space::_par_seq_tasks member which was only used by CMS.
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8233220
>> Webrev: http://cr.openjdk.java.net/~lkorinth/ivan/8233220/
>
> Looks good, I will sponsor it for you.
>
> Thanks,
> Leo
>
>> Testing: Tier 1 - Tier 3
>> //Ivan

From zgu at redhat.com Fri Jan 31 13:55:52 2020
From: zgu at redhat.com (Zhengyu Gu)
Date: Fri, 31 Jan 2020 08:55:52 -0500
Subject: RFR(S): 8237776: Shenandoah: Wrong result with Lucene test
In-Reply-To: <87wo98t3lb.fsf@redhat.com>
References: <87wo98t3lb.fsf@redhat.com>
Message-ID: <316762d7-b1b0-97c4-6cdb-ac31bd2d7e6b@redhat.com>

Looks good to me.

-Zhengyu

On 1/31/20 3:47 AM, Roland Westrelin wrote:
>
> http://cr.openjdk.java.net/~roland/8237776/webrev.00/
>
> xmm0 (an argument to a call) gets corrupted in the c2i adapter (when
> going from c1 code to the interpreter) at the
> ShenandoahRuntime::write_ref_field_pre_entry() runtime call. That call
> is in the c2i because of c2i_entry_barrier() and
> resolve_weak_handle(). The proposed fix saves all floating point
> argument registers.
>
> Roland.
>

From sangheon.kim at oracle.com Fri Jan 31 17:54:02 2020
From: sangheon.kim at oracle.com (sangheon.kim at oracle.com)
Date: Fri, 31 Jan 2020 09:54:02 -0800
Subject: RFR (S): 8238220: Rename OWSTTaskTerminator to TaskTerminator
In-Reply-To: <10c01fdb-d6e3-01a3-6cee-a8f467fac372@oracle.com>
References: <5f99b054-e286-2a8c-5a37-d641eb4932f1@oracle.com>
 <10c01fdb-d6e3-01a3-6cee-a8f467fac372@oracle.com>
Message-ID: <65ce518b-56da-92a8-010a-e58c5c015a7e@oracle.com>

Hi Thomas,

On 1/31/20 2:41 AM, Thomas Schatzl wrote:
> Hi Sangheon,
>
> On 30.01.20 19:08, sangheon.kim at oracle.com wrote:
>> Hi Thomas,
>>
>> On 1/30/20 3:34 AM, Thomas Schatzl wrote:
>>> Hi all,
>>>
>>> can I have reviews for this renaming change of OWSTTaskTerminator
>>> to TaskTerminator now that there is only one task termination
>>> protocol implementation?
>>>
>>> I believe that the OWST prefix only makes the code harder to read
>>> without conveying interesting information at the uses.
>>>
>>> Based on JDK-8215297.
>>>
>>> CR:
>>> https://bugs.openjdk.java.net/browse/JDK-8238220
>>> Webrev:
>>> http://cr.openjdk.java.net/~tschatzl/8238220/webrev/
>> Looks good as is.
>>
>> One thing to note is the order of renamed header file.
>> It looks like you are treating uppercase first? :)
>>
>> e.g. at g1CollectedHeap.cpp
>>
>> +#include "gc/shared/taskTerminator.hpp"
>> #include "gc/shared/taskqueue.inline.hpp"
>>
>>
>> I expect alphabet order first and then upper-lowercase. :)
>>
>
> by default, upper case sorts before lower case in many if not all
> situations on computers since typically all upper case letters are
> "before" lower case letters in character sets.
>
> I would like to keep it as is unless you or somebody else really
> objects - there does not seem to be a precedence in hotspot files.

I'm fine with current order.
As you said personally, hotspot style just says "Keep the include lines sorted".
https://wiki.openjdk.java.net/display/HotSpot/StyleGuide

Thanks,
Sangheon

>
> Thanks,
> Thomas

From kim.barrett at oracle.com Fri Jan 31 22:25:34 2020
From: kim.barrett at oracle.com (Kim Barrett)
Date: Fri, 31 Jan 2020 17:25:34 -0500
Subject: RFR: 8237143: Eliminate DirtyCardQ_cbl_mon
In-Reply-To: <86BABDA8-E402-49F3-B478-ED0E70490015@oracle.com>
References: <745E91C1-AE1A-4DA2-80EE-59B70897F4BF@oracle.com>
 <86BABDA8-E402-49F3-B478-ED0E70490015@oracle.com>
Message-ID: <40479EE1-74EF-4C5F-A04B-8877F0ED9ACB@oracle.com>

> On Jan 23, 2020, at 3:10 PM, Kim Barrett wrote:
>
>> On Jan 22, 2020, at 11:12 AM, Thomas Schatzl wrote:
>> On 16.01.20 09:51, Kim Barrett wrote:
>>> Please review this change to eliminate the DirtyCardQ_cbl_mon. This
>>> is one of the two remaining super-special "access" ranked mutexes.
>>> (The other is the Shared_DirtyCardQ_lock, whose elimination is covered
>>> by JDK-8221360.)
>>> There are three main parts to this change.
>>> (1) Replace the under-a-lock FIFO queue in G1DirtyCardQueueSet with a
>>> lock-free FIFO queue.
>>> (2) Replace the use of a HotSpot monitor for signaling activation of
>>> concurrent refinement threads with a semaphore-based solution.
>>> (3) Handle pausing of buffer refinement in the middle of a buffer in
>>> order to handle a pending safepoint request. This can no longer just
>>> push the partially processed buffer back onto the queue, due to ABA
>>> problems now that the buffer is lock-free.
>>> CR:
>>> https://bugs.openjdk.java.net/browse/JDK-8237143
>>> Webrev:
>>> https://cr.openjdk.java.net/~kbarrett/8237143/open.00/
>>> Testing:
>>> mach5 tier1-5
>>> Normal performance testing showed no significant change.
>>> specjbb2015 on a very big machine showed a 3.5% average critical-jOPS
>>> improvement, though not statistically significant; removing contention
>>> for that lock by many hardware threads may be a little bit noticeable.
>>
>> initial comments only, and so far only about comments :( The code itself looks good to me, but I want to look over it again.
>
> After some offline discussion with Thomas, I'm doing some restructuring that
> makes it probably not very efficient for anyone else to do a careful review of
> the open.00 version.

Here's a new webrev:
https://cr.openjdk.java.net/~kbarrett/8237143/open.02/

Testing:
mach5 tier1-5
Performance testing showed no significant change.

I didn't bother providing an incremental webrev, because the changes to g1DirtyCardQueue.[ch]pp are pretty substantial. Those are the only files changed, except for the suggested move of the comment for G1ConcurrentRefineThread::maybe_deactivate and some related comment improvements nearby.

Most of this round of changes are refactoring within G1DirtyCardQueueSet, mainly adding internal helper classes for the FIFO queue and for the paused buffers, each with their own (commented) APIs. I think that has addressed a lot of Thomas's comments about the comments, and I hope has made the code easier to understand.
I've also improved the mechanism for handling "paused" buffers, simplifying it by making better use of some invariants. > On Jan 22, 2020, at 11:12 AM, Thomas Schatzl wrote: > // The key idea to make this work is that pop (get_completed_buffer) > // never returns an element of the queue if it is the only accessible > // element, > If I understand this correctly, maybe "if there is only one buffer in the FIFO" is easier to understand than "only accessible element". (or define "accessible element?). I specifically don't want to say it that way because we could have a situation like (1) Start with a queue having exactly one element. (2) Thread1 starts a push by updating tail, but has not yet linked the old tail to the new. (3) Thread2 performs a push. The buffer pushed by Thread2 is "in the queue" by some reasonable definition, so the queue contains two buffers. But that buffer is not yet accessible, because Thread1 hasn't completed its push. The alternative is to (in the description) somehow divorce a completed push from the notion of the number of buffers in the queue, which seems worse to me. I expanded the discussion a bit though, including what is meant by "accessible". > The code seems to unnecessarily use the NULL_buffer constant. Maybe use it here too. Overall I am not sure about the usefulness of using NULL_buffer in the code. The NULL value in Hotspot code is generally accepted as a special value, and the name "NULL_buffer" does not seem to add any information. The point of NULL_buffer was to avoid casts of NULL in Atomic operations, and I then used it consistently. But I've changed to using such casts, since it turned out there weren't that many and we can get rid of those uniformly here and elsewhere when we have C++11 nullptr and nullptr_t.