From rwestrel at redhat.com Mon Feb 3 08:00:44 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 03 Feb 2020 09:00:44 +0100 Subject: RFR(S): 8237776: Shenandoah: Wrong result with Lucene test In-Reply-To: <87wo98t3lb.fsf@redhat.com> References: <87wo98t3lb.fsf@redhat.com> Message-ID: <87ftfst80j.fsf@redhat.com> After some offline discussion with Aleksey, here is an updated webrev: http://cr.openjdk.java.net/~roland/8237776/webrev.01/ Only difference is an assert that checks the number of fp arguments. Roland. From shade at redhat.com Mon Feb 3 08:19:59 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 3 Feb 2020 09:19:59 +0100 Subject: RFR(S): 8237776: Shenandoah: Wrong result with Lucene test In-Reply-To: <87ftfst80j.fsf@redhat.com> References: <87wo98t3lb.fsf@redhat.com> <87ftfst80j.fsf@redhat.com> Message-ID: <93eed5d6-bae4-a919-a8a6-aa31294b0825@redhat.com> On 2/3/20 9:00 AM, Roland Westrelin wrote: > > After some offline discussion with Aleksey, here is an updated webrev: > > http://cr.openjdk.java.net/~roland/8237776/webrev.01/ Looks good! -- Thanks, -Aleksey From thomas.schatzl at oracle.com Mon Feb 3 09:19:21 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 3 Feb 2020 10:19:21 +0100 Subject: RFR (XS): 8234608: [TESTBUG] Memory leak in gc/g1/unloading/libdefine.cpp In-Reply-To: References: Message-ID: <5fb5f27c-7b4e-f72e-a01f-aebb619c9558@oracle.com> Hi, On 31.01.20 04:27, Man Cao wrote: > Hi, > > I have incorporated Thomas's changes, and fixed the tests and updated the > CR. > New webrev: https://cr.openjdk.java.net/~manc/8234608/webrev.01/ > > The issue is that the signature of makeRedefinition0() in libdefine.cpp was > wrong. > It missed the "jclass clazz" parameter. > > I have tested using 'make test > TEST="test/hotspot/jtreg/vmTestbase/gc/g1/unloading/tests/unloading_redefinition_*" > ', for both fastdebug and product builds. > > I suppose Submit repo would not run these tests, because it only runs > tier1. 
Am I correct? hs-tier1-5 passed. Looks good. Thomas From thomas.schatzl at oracle.com Mon Feb 3 09:53:41 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 3 Feb 2020 10:53:41 +0100 Subject: RFR (S): 8238220: Rename OWSTTaskTerminator to TaskTerminator In-Reply-To: <65ce518b-56da-92a8-010a-e58c5c015a7e@oracle.com> References: <5f99b054-e286-2a8c-5a37-d641eb4932f1@oracle.com> <10c01fdb-d6e3-01a3-6cee-a8f467fac372@oracle.com> <65ce518b-56da-92a8-010a-e58c5c015a7e@oracle.com> Message-ID: <69f9f6fe-8ec3-cf5f-2c0a-97bddee31624@oracle.com> Hi Sangheon, Stefan, On 31.01.20 18:54, sangheon.kim at oracle.com wrote: > Hi Thomas, > > On 1/31/20 2:41 AM, Thomas Schatzl wrote: >> Hi Sangheon, >> >> On 30.01.20 19:08, sangheon.kim at oracle.com wrote: >>> Hi Thomas, >>> >>> On 1/30/20 3:34 AM, Thomas Schatzl wrote: >>>> Hi all, >>>> >>>> ? can I have reviews for this renaming change of OWSTTaskTerminator >>>> to TaskTerminator now that there is only one task termination >>>> protocol implementation? >>>> >>>> I believe that the OWST prefix only makes the code harder to read >>>> without conveying interesting information at the uses. >>>> >>>> Based on JDK-8215297. >>>> >>>> CR: >>>> https://bugs.openjdk.java.net/browse/JDK-8238220 >>>> Webrev: >>>> http://cr.openjdk.java.net/~tschatzl/8238220/webrev/ >>> Looks good as is. >>> >>> One thing to note is the order of renamed header file. >>> It looks like you are treating uppercase first? :) >>> >>> e.g. at g1CollectedHeap.cpp >>> >>> +#include "gc/shared/taskTerminator.hpp" >>> ? #include "gc/shared/taskqueue.inline.hpp" >>> >>> >>> I expect alphabet order first and then upper-lowercase. :) >>> >> >> ? by default, upper case sorts before lower case in many if not all >> situations on computers since typically all upper case letters are >> "before" lower case letters in character sets. 
>> >> I would like to keep it as is unless you or somebody else really >> objects - there does not seem to be a precedence in hotspot files. > I'm fine with current order. > As you said personally, hotspot style just says "Keep the include lines > sorted". > > https://wiki.openjdk.java.net/display/HotSpot/StyleGuide > > Thanks, > Sangheon > thanks for your reviews. Thomas From thomas.schatzl at oracle.com Mon Feb 3 09:55:36 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 3 Feb 2020 10:55:36 +0100 Subject: RFR (XS): 8238229: Remove TRACESPINNING debug code In-Reply-To: References: <77430bd4-19d8-0c6e-edc8-750dae163d96@oracle.com> <48885c09-77c2-8924-d9ec-2a825fd60f29@oracle.com> <00eec1c7-d524-44c1-a331-95088bb74f3c@oracle.com> Message-ID: Hi Kim, Stefan, On 31.01.20 00:14, Kim Barrett wrote: >> On Jan 30, 2020, at 11:43 AM, Thomas Schatzl wrote: >> >> Hi, >> >> On 30.01.20 16:24, Stefan Johansson wrote: >>> Looks good, >>> StefanJ >> >> all fixed. Idk why these were missing in that webrev, I regenerated it. >> >> Thanks, >> Thomas >> >>> On 2020-01-30 12:56, Thomas Schatzl wrote: >>>> Hi all, >>>> >>>> can I have reviews for this removal of some debug code in the TaskTerminator class? [...] >>>> CR: >>>> https://bugs.openjdk.java.net/browse/JDK-8238229 >>>> Webrev: >>>> http://cr.openjdk.java.net/~tschatzl/8238229/webrev/ >>> I agree that this can be removed, and there is even more code that should go. The call from each collected heap: > > Looks good. > > thanks for your reviews. Thomas From zgu at redhat.com Mon Feb 3 20:59:28 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 3 Feb 2020 15:59:28 -0500 Subject: [14] 8237632: Shenandoah fails some vmTestbase_nsk_jvmti tests with "Forwardee must point to a heap address" Message-ID: <24a45316-25f2-8be5-004e-47ca59cd1f13@redhat.com> Shenandoah uses oop mark word's "marked" pattern to indicate forwarding. 
Unfortunately, JVMTI heap walk (VM_HeapWalkOperation) also uses this pattern to indicate visited. The conflicts present serious problems during Shenandoah's concurrent evacuation and concurrent reference update phases, as it blindly treats "marked" pattern as "forwarding". There are invariants we can use to distinguish "forwarding" and "visited" pattern. 1. Marked pattern in collection set indicates forwarding 2. Marked pattern off collection set indicates visited by ObjectMarker (because oops seen by ObjectMarker were LRB'd) 3. No off collection set marked pattern at any shenandoah safepoint. In fact, no off collection set marked pattern at any safepoints except VM_HeapWalkOperation safepoints. This is an important invariant, since traversal degenerated GC drops collection set before entering degenerated GC cycle. Note: We only downgrade some debug assertions, but preserve full capacities of verifier, because verifier always runs at safepoints. Bug: https://bugs.openjdk.java.net/browse/JDK-8237632 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237632/webrev.00/ Test: hotspot_gc_shenandoah vmTestbase_nsk_jvmti vmTestbase_nsk_jdi Thanks, -Zhengyu From m.sundar85 at gmail.com Tue Feb 4 03:38:46 2020 From: m.sundar85 at gmail.com (Sundara Mohan M) Date: Mon, 3 Feb 2020 22:38:46 -0500 Subject: Parallel GC Thread crash Message-ID: Hi, I am seeing following crashes frequently on our servers # # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x00007fca3281d311, pid=103575, tid=108299 # # JRE version: OpenJDK Runtime Environment (13.0.1+9) (build 13.0.1+9) # Java VM: OpenJDK 64-Bit Server VM (13.0.1+9, mixed mode, tiered, parallel gc, linux-amd64) # Problematic frame: # V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 # # No core dump will be written. Core dumps have been disabled. 
To enable core dumping, try "ulimit -c unlimited" before starting Java again # # If you would like to submit a bug report, please visit: # https://github.com/AdoptOpenJDK/openjdk-build/issues # --------------- T H R E A D --------------- Current thread (0x00007fca2c051000): GCTaskThread "ParGC Thread#8" [stack: 0x00007fca30277000,0x00007fca30377000] [id=108299] Stack: [0x00007fca30277000,0x00007fca30377000], sp=0x00007fca30374890, free space=1014k Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 V [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, RegisterMap const*, OopClosure*)+0x2eb V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, CodeBlobClosure*, RegisterMap*, bool)+0x99 V [libjvm.so+0xf68b17] JavaThread::oops_do(OopClosure*, CodeBlobClosure*)+0x187 V [libjvm.so+0xcce2f0] ThreadRootsMarkingTask::do_it(GCTaskManager*, unsigned int)+0xb0 V [libjvm.so+0x7f422b] GCTaskThread::run()+0x1eb V [libjvm.so+0xf707fd] Thread::call_run()+0x10d V [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7 JavaThread 0x00007fb85c004800 (nid = 111387) was being processed Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) v ~RuntimeStub::_new_array_Java J 225122 c2 ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V (207 bytes) @ 0x00007fca21f1a5d8 [0x00007fca21f17f20+0x00000000000026b8] J 62342 c2 webservice.exception.ExceptionLoggingWrapper.execute()V (1004 bytes) @ 0x00007fca20f0aec8 [0x00007fca20f07f40+0x0000000000002f88] J 225129 c2 webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lbeans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; (105 bytes) @ 0x00007fca1da512ac [0x00007fca1da51100+0x00000000000001ac] J 131643 c2 webservice.exception.mapper.RequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; (9 bytes) @ 0x00007fca20ce6190 
[0x00007fca20ce60c0+0x00000000000000d0] J 55114 c2 webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; (332 bytes) @ 0x00007fca2051fe64 [0x00007fca2051f820+0x0000000000000644] J 57859 c2 webservice.filters.ResponseSerializationWorker.execute()Z (272 bytes) @ 0x00007fca1ef2ed18 [0x00007fca1ef2e140+0x0000000000000bd8] J 16114% c2 com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; (486 bytes) @ 0x00007fca1ced465c [0x00007fca1ced4200+0x000000000000045c] j com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1 J 11639 c2 java.util.concurrent.FutureTask.run()V java.base at 13.0.1 (123 bytes) @ 0x00007fca1cd00858 [0x00007fca1cd007c0+0x0000000000000098] J 7560 c1 java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V java.base at 13.0.1 (187 bytes) @ 0x00007fca15b23f54 [0x00007fca15b23160+0x0000000000000df4] J 5143 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V java.base at 13.0.1 (9 bytes) @ 0x00007fca15b39abc [0x00007fca15b39a40+0x000000000000007c] J 4488 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 bytes) @ 0x00007fca159fc174 [0x00007fca159fc040+0x0000000000000134] v ~StubRoutines::call_stub siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000 Register to memory mapping: ... Can someone shed more info on when this can happen? I am seeing this on multiple servers with Java 13.0.1+9 on RHEL6 servers. There was another thread in hotspot runtime where David Holmes pointed this > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000 > This seems it may be related to: > https://bugs.openjdk.java.net/browse/JDK-8004124 Just wondering if this is same or something to do with GC specific. 
TIA Sundar From stefan.karlsson at oracle.com Tue Feb 4 10:47:32 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Tue, 4 Feb 2020 11:47:32 +0100 Subject: Parallel GC Thread crash In-Reply-To: References: Message-ID: Hi Sundar, The GC crashes when it encounters something bad on the stack: > V [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, RegisterMap > const*, OopClosure*)+0x2eb > V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, This is probably not a GC bug. It's more likely that this is caused by the JIT compiler. I see in your hotspot-runtime-dev thread, that you also get crashes in other compiler related areas. If you want to rule out the GC, you can run with -XX:+VerifyBeforeGC and -XX:+VerifyAfterGC, and see if this asserts before the GC has started running. StefanK On 2020-02-04 04:38, Sundara Mohan M wrote: > Hi, > I am seeing following crashes frequently on our servers > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x00007fca3281d311, pid=103575, tid=108299 > # > # JRE version: OpenJDK Runtime Environment (13.0.1+9) (build 13.0.1+9) > # Java VM: OpenJDK 64-Bit Server VM (13.0.1+9, mixed mode, tiered, parallel > gc, linux-amd64) > # Problematic frame: > # V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > # > # No core dump will be written. Core dumps have been disabled. 
To enable > core dumping, try "ulimit -c unlimited" before starting Java again > # > # If you would like to submit a bug report, please visit: > # https://github.com/AdoptOpenJDK/openjdk-build/issues > # > > > --------------- T H R E A D --------------- > > Current thread (0x00007fca2c051000): GCTaskThread "ParGC Thread#8" [stack: > 0x00007fca30277000,0x00007fca30377000] [id=108299] > > Stack: [0x00007fca30277000,0x00007fca30377000], sp=0x00007fca30374890, > free space=1014k > Native frames: (J=compiled Java code, A=aot compiled Java code, > j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > V [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, RegisterMap > const*, OopClosure*)+0x2eb > V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > CodeBlobClosure*, RegisterMap*, bool)+0x99 > V [libjvm.so+0xf68b17] JavaThread::oops_do(OopClosure*, > CodeBlobClosure*)+0x187 > V [libjvm.so+0xcce2f0] ThreadRootsMarkingTask::do_it(GCTaskManager*, > unsigned int)+0xb0 > V [libjvm.so+0x7f422b] GCTaskThread::run()+0x1eb > V [libjvm.so+0xf707fd] Thread::call_run()+0x10d > V [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7 > > JavaThread 0x00007fb85c004800 (nid = 111387) was being processed > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > v ~RuntimeStub::_new_array_Java > J 225122 c2 > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > (207 bytes) @ 0x00007fca21f1a5d8 [0x00007fca21f17f20+0x00000000000026b8] > J 62342 c2 webservice.exception.ExceptionLoggingWrapper.execute()V (1004 > bytes) @ 0x00007fca20f0aec8 [0x00007fca20f07f40+0x0000000000002f88] > J 225129 c2 > webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lbeans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > (105 bytes) @ 0x00007fca1da512ac [0x00007fca1da51100+0x00000000000001ac] > J 131643 c2 > 
webservice.exception.mapper.RequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > (9 bytes) @ 0x00007fca20ce6190 [0x00007fca20ce60c0+0x00000000000000d0] > J 55114 c2 > webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; > (332 bytes) @ 0x00007fca2051fe64 [0x00007fca2051f820+0x0000000000000644] > J 57859 c2 webservice.filters.ResponseSerializationWorker.execute()Z (272 > bytes) @ 0x00007fca1ef2ed18 [0x00007fca1ef2e140+0x0000000000000bd8] > J 16114% c2 > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; > (486 bytes) @ 0x00007fca1ced465c [0x00007fca1ced4200+0x000000000000045c] > j > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1 > J 11639 c2 java.util.concurrent.FutureTask.run()V java.base at 13.0.1 (123 > bytes) @ 0x00007fca1cd00858 [0x00007fca1cd007c0+0x0000000000000098] > J 7560 c1 > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V > java.base at 13.0.1 (187 bytes) @ 0x00007fca15b23f54 > [0x00007fca15b23160+0x0000000000000df4] > J 5143 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V > java.base at 13.0.1 (9 bytes) @ 0x00007fca15b39abc > [0x00007fca15b39a40+0x000000000000007c] > J 4488 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 bytes) @ > 0x00007fca159fc174 [0x00007fca159fc040+0x0000000000000134] > v ~StubRoutines::call_stub > > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: > 0x0000000000000000 > > Register to memory mapping: > ... > > Can someone shed more info on when this can happen? I am seeing this on > multiple servers with Java 13.0.1+9 on RHEL6 servers. 
> > There was another thread in hotspot runtime where David Holmes pointed this >> siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: > 0x0000000000000000 > >> This seems it may be related to: >> https://bugs.openjdk.java.net/browse/JDK-8004124 > > Just wondering if this is same or something to do with GC specific. > > > > TIA > Sundar > From zgu at redhat.com Tue Feb 4 13:35:43 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 4 Feb 2020 08:35:43 -0500 Subject: [15] RFR 8238162: Shenandoah: Remove ShenandoahTaskTerminator wrapper Message-ID: I can not recall why we still have terminator wrapper, probably a leftover after we upstreamed OWST terminator. Let's remove it. Bug: https://bugs.openjdk.java.net/browse/JDK-8238162 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8238162/webrev.00/index.html Test: hotspot_gc_shenandoah Thanks, -Zhengyu From thomas.schatzl at oracle.com Tue Feb 4 14:39:41 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 4 Feb 2020 15:39:41 +0100 Subject: RFR: 8237143: Eliminate DirtyCardQ_cbl_mon In-Reply-To: <40479EE1-74EF-4C5F-A04B-8877F0ED9ACB@oracle.com> References: <745E91C1-AE1A-4DA2-80EE-59B70897F4BF@oracle.com> <86BABDA8-E402-49F3-B478-ED0E70490015@oracle.com> <40479EE1-74EF-4C5F-A04B-8877F0ED9ACB@oracle.com> Message-ID: Hi Kim, On 31.01.20 23:25, Kim Barrett wrote: >> On Jan 23, 2020, at 3:10 PM, Kim Barrett wrote: >> >>> On Jan 22, 2020, at 11:12 AM, Thomas Schatzl wrote: >>> On 16.01.20 09:51, Kim Barrett wrote: >>>> Please review this change to eliminate the DirtyCardQ_cbl_mon. This >>>> is one of the two remaining super-special "access" ranked mutexes. >>>> (The other is the Shared_DirtyCardQ_lock, whose elimination is covered >>>> by JDK-8221360.) >>>> There are three main parts to this change. >>>> (1) Replace the under-a-lock FIFO queue in G1DirtyCardQueueSet with a >>>> lock-free FIFO queue. 
>>>> (2) Replace the use of a HotSpot monitor for signaling activation of >>>> concurrent refinement threads with a semaphore-based solution. >>>> (3) Handle pausing of buffer refinement in the middle of a buffer in >>>> order to handle a pending safepoint request. This can no longer just >>>> push the partially processed buffer back onto the queue, due to ABA >>>> problems now that the buffer is lock-free. >>>> CR: >>>> https://bugs.openjdk.java.net/browse/JDK-8237143 >>>> Webrev: >>>> https://cr.openjdk.java.net/~kbarrett/8237143/open.00/ >>>> Testing: >>>> mach5 tier1-5 >>>> Normal performance testing showed no significant change. >>>> specjbb2015 on a very big machine showed a 3.5% average critical-jOPS >>>> improvement, though not statistically significant; removing contention >>>> for that lock by many hardware threads may be a little bit noticeable. >>> >>> initial comments only, and so far only about comments :( The code itself looks good to me, but I want to look over it again. >> >> After some offline discussion with Thomas, I?m doing some restructuring that >> makes it probably not very efficient for anyone else to do a careful review of >> the open.00 version. > > Here's a new webrev: > > https://cr.openjdk.java.net/~kbarrett/8237143/open.02/ > I think this is good. Thanks for your extensive changes. Two minor issues. Do not need re-review: * s/unsufficient/insufficient in g1DirtyCardQueue.cpp * simple predicates returning bool tend to have an "is_" or "has_" prepended to it, i.e. s/PausedBuffers::empty()/...::is_empty()/ Thanks, Thomas From shade at redhat.com Tue Feb 4 19:15:16 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 4 Feb 2020 20:15:16 +0100 Subject: [15] RFR 8238162: Shenandoah: Remove ShenandoahTaskTerminator wrapper In-Reply-To: References: Message-ID: On 2/4/20 2:35 PM, Zhengyu Gu wrote: > I can not recall why we still have terminator wrapper, probably a > leftover after we upstreamed OWST terminator. Let's remove it. 
I think we have upstreamed our version here? If so, please link it to 8238162: https://bugs.openjdk.java.net/browse/JDK-8204947 > Bug: https://bugs.openjdk.java.net/browse/JDK-8238162 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8238162/webrev.00/index.html Looks good. -- Thanks, -Aleksey From shade at redhat.com Tue Feb 4 19:23:05 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 4 Feb 2020 20:23:05 +0100 Subject: [14] 8237632: Shenandoah fails some vmTestbase_nsk_jvmti tests with "Forwardee must point to a heap address" In-Reply-To: <24a45316-25f2-8be5-004e-47ca59cd1f13@redhat.com> References: <24a45316-25f2-8be5-004e-47ca59cd1f13@redhat.com> Message-ID: On 2/3/20 9:59 PM, Zhengyu Gu wrote: > Bug: https://bugs.openjdk.java.net/browse/JDK-8237632 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237632/webrev.00/ Uh. It seems to me the cure is worse than the disease:
1) It rewires sensitive parts of the barrier paths, root handling, etc., which requires more thorough testing, and we are too deep in RDP2 for this;
2) It effectively disables asserts for anything not in the collection set, which means it disables most of the asserts. The fact that the Verifier still works is a small consolation.
I propose to accept this failure in 14, and rework the JVMTI heap walk to stop messing around with mark words in 15. Since this relates to concurrent root handling, 11-shenandoah is already safe.
-- Thanks, -Aleksey From zgu at redhat.com Tue Feb 4 19:29:52 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 4 Feb 2020 14:29:52 -0500 Subject: [14] 8237632: Shenandoah fails some vmTestbase_nsk_jvmti tests with "Forwardee must point to a heap address" In-Reply-To: References: <24a45316-25f2-8be5-004e-47ca59cd1f13@redhat.com> Message-ID: <2121a97b-47fe-0205-51ad-a927576fbb93@redhat.com> On 2/4/20 2:23 PM, Aleksey Shipilev wrote: > On 2/3/20 9:59 PM, Zhengyu Gu wrote: >> Bug: https://bugs.openjdk.java.net/browse/JDK-8237632 >> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237632/webrev.00/ > > Uh. It seems to me the cure is worse than the disease: > 1) It rewires sensitive parts of barrier paths, root handling, etc, which requires more thorough > testing, and we are too deep in RDP2 for this; > 2) It effectively disables asserts for anything not in collection set. Which means it disables > most of asserts. The fact that Verifier still works is a small consolation. > > I propose to accept this failure in 14, and rework the JVMTI heap walk to stop messing around with > mark words in 15. Since this relates to concurrent root handling, 11-shenandoah is already safe. I have yet to test 11-shenandoah. But performing JVMTI heap walk during evacuation phase, still sounds the alarm to me. 
-Zhengyu > From shade at redhat.com Tue Feb 4 19:33:28 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 4 Feb 2020 20:33:28 +0100 Subject: [14] 8237632: Shenandoah fails some vmTestbase_nsk_jvmti tests with "Forwardee must point to a heap address" In-Reply-To: <2121a97b-47fe-0205-51ad-a927576fbb93@redhat.com> References: <24a45316-25f2-8be5-004e-47ca59cd1f13@redhat.com> <2121a97b-47fe-0205-51ad-a927576fbb93@redhat.com> Message-ID: On 2/4/20 8:29 PM, Zhengyu Gu wrote: > On 2/4/20 2:23 PM, Aleksey Shipilev wrote: >> On 2/3/20 9:59 PM, Zhengyu Gu wrote: >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8237632 >>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237632/webrev.00/ >> >> Uh. It seems to me the cure is worse than the disease: >> 1) It rewires sensitive parts of barrier paths, root handling, etc, which requires more thorough >> testing, and we are too deep in RDP2 for this; >> 2) It effectively disables asserts for anything not in collection set. Which means it disables >> most of asserts. The fact that Verifier still works is a small consolation. >> >> I propose to accept this failure in 14, and rework the JVMTI heap walk to stop messing around with >> mark words in 15. Since this relates to concurrent root handling, 11-shenandoah is already safe. > > I have yet to test 11-shenandoah. But performing JVMTI heap walk during > evacuation phase, still sounds the alarm to me. Right. There is still plenty of time to fix 11. Let's not rush it in 14. -- Thanks, -Aleksey From m.sundar85 at gmail.com Tue Feb 4 20:21:26 2020 From: m.sundar85 at gmail.com (Sundara Mohan M) Date: Tue, 4 Feb 2020 15:21:26 -0500 Subject: Parallel GC Thread crash In-Reply-To: References: Message-ID: Thanks for the tip! 
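For the record, here is roughly the re-run I will try with the suggested flags. The jar name below is a placeholder, not our actual command line; since VerifyBeforeGC/VerifyAfterGC are diagnostic flags, a product build also needs -XX:+UnlockDiagnosticVMOptions:

```shell
# Sketch of the suggested verification run; "webservice.jar" is a placeholder.
# VerifyBeforeGC/VerifyAfterGC should trip an assert before/after each GC if
# the heap is already corrupted, which rules the collector in or out.
VERIFY_OPTS="-XX:+UnlockDiagnosticVMOptions -XX:+VerifyBeforeGC -XX:+VerifyAfterGC"
# Echo the full command first to double-check it before launching:
echo "java ${VERIFY_OPTS} -jar webservice.jar"
```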
On Tue, Feb 4, 2020 at 5:47 AM Stefan Karlsson wrote: > Hi Sundar, > > The GC crashes when it encounters something bad on the stack: > > V [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, RegisterMap > > const*, OopClosure*)+0x2eb > > V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > > This is probably not a GC bug. It's more likely that this is caused by > the JIT compiler. I see in your hotspot-runtime-dev thread, that you > also get crashes in other compiler related areas. > > If you want to rule out the GC, you can run with -XX:+VerifyBeforeGC and > -XX:+VerifyAfterGC, and see if this asserts before the GC has started > running. > > StefanK > > On 2020-02-04 04:38, Sundara Mohan M wrote: > > Hi, > > I am seeing following crashes frequently on our servers > > # > > # A fatal error has been detected by the Java Runtime Environment: > > # > > # SIGSEGV (0xb) at pc=0x00007fca3281d311, pid=103575, tid=108299 > > # > > # JRE version: OpenJDK Runtime Environment (13.0.1+9) (build 13.0.1+9) > > # Java VM: OpenJDK 64-Bit Server VM (13.0.1+9, mixed mode, tiered, > parallel > > gc, linux-amd64) > > # Problematic frame: > > # V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > > # > > # No core dump will be written. Core dumps have been disabled. 
To enable > > core dumping, try "ulimit -c unlimited" before starting Java again > > # > > # If you would like to submit a bug report, please visit: > > # https://github.com/AdoptOpenJDK/openjdk-build/issues > > # > > > > > > --------------- T H R E A D --------------- > > > > Current thread (0x00007fca2c051000): GCTaskThread "ParGC Thread#8" > [stack: > > 0x00007fca30277000,0x00007fca30377000] [id=108299] > > > > Stack: [0x00007fca30277000,0x00007fca30377000], sp=0x00007fca30374890, > > free space=1014k > > Native frames: (J=compiled Java code, A=aot compiled Java code, > > j=interpreted, Vv=VM code, C=native code) > > V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > > V [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, RegisterMap > > const*, OopClosure*)+0x2eb > > V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > > CodeBlobClosure*, RegisterMap*, bool)+0x99 > > V [libjvm.so+0xf68b17] JavaThread::oops_do(OopClosure*, > > CodeBlobClosure*)+0x187 > > V [libjvm.so+0xcce2f0] ThreadRootsMarkingTask::do_it(GCTaskManager*, > > unsigned int)+0xb0 > > V [libjvm.so+0x7f422b] GCTaskThread::run()+0x1eb > > V [libjvm.so+0xf707fd] Thread::call_run()+0x10d > > V [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7 > > > > JavaThread 0x00007fb85c004800 (nid = 111387) was being processed > > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > > v ~RuntimeStub::_new_array_Java > > J 225122 c2 > > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > > (207 bytes) @ 0x00007fca21f1a5d8 [0x00007fca21f17f20+0x00000000000026b8] > > J 62342 c2 webservice.exception.ExceptionLoggingWrapper.execute()V (1004 > > bytes) @ 0x00007fca20f0aec8 [0x00007fca20f07f40+0x0000000000002f88] > > J 225129 c2 > > > webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lbeans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > (105 bytes) @ 0x00007fca1da512ac [0x00007fca1da51100+0x00000000000001ac] > 
> J 131643 c2 > > > webservice.exception.mapper.RequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > (9 bytes) @ 0x00007fca20ce6190 [0x00007fca20ce60c0+0x00000000000000d0] > > J 55114 c2 > > > webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; > > (332 bytes) @ 0x00007fca2051fe64 [0x00007fca2051f820+0x0000000000000644] > > J 57859 c2 webservice.filters.ResponseSerializationWorker.execute()Z (272 > > bytes) @ 0x00007fca1ef2ed18 [0x00007fca1ef2e140+0x0000000000000bd8] > > J 16114% c2 > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; > > (486 bytes) @ 0x00007fca1ced465c [0x00007fca1ced4200+0x000000000000045c] > > j > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1 > > J 11639 c2 java.util.concurrent.FutureTask.run()V java.base at 13.0.1 (123 > > bytes) @ 0x00007fca1cd00858 [0x00007fca1cd007c0+0x0000000000000098] > > J 7560 c1 > > > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V > > java.base at 13.0.1 (187 bytes) @ 0x00007fca15b23f54 > > [0x00007fca15b23160+0x0000000000000df4] > > J 5143 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V > > java.base at 13.0.1 (9 bytes) @ 0x00007fca15b39abc > > [0x00007fca15b39a40+0x000000000000007c] > > J 4488 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 bytes) @ > > 0x00007fca159fc174 [0x00007fca159fc040+0x0000000000000134] > > v ~StubRoutines::call_stub > > > > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: > > 0x0000000000000000 > > > > Register to memory mapping: > > ... > > > > Can someone shed more info on when this can happen? I am seeing this on > > multiple servers with Java 13.0.1+9 on RHEL6 servers. 
> > > > There was another thread in hotspot runtime where David Holmes pointed > this > >> siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: > > 0x0000000000000000 > > > >> This seems it may be related to: > >> https://bugs.openjdk.java.net/browse/JDK-8004124 > > > > Just wondering if this is same or something to do with GC specific. > > > > > > > > TIA > > Sundar > > > From thomas.schatzl at oracle.com Wed Feb 5 08:13:57 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 5 Feb 2020 09:13:57 +0100 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: <90aa2259-afce-44af-abb2-31700caea4a0.maoliang.ml@alibaba-inc.com> References: <90aa2259-afce-44af-abb2-31700caea4a0.maoliang.ml@alibaba-inc.com> Message-ID: <7085d9f4-d579-2fb1-c3ba-938a01ab7576@oracle.com> Hi Liang, apologies for the late reply - I did look at the patch immediately after you posted it, but initial tests showed that it does not work as (I) expected. More about that below. So I went ahead and hacked up something that comes closer to what I had in mind. Unfortunately other more urgent issues came up, which caused the delay on this work. Sorry. (And sorry for the long post.) Not having any kind of workload to work with for testing the change, I used a configuration of specjbb2015 with fixed ir [0] (taken from a colleague's unrelated recent internal test), simulating a constant load whose heap usage the user wants to control. In this situation I want to apologize for using specjbb2015 in this public reply, because it is not openly available; I only noticed when writing up this email. Finding a substitute and redoing measurements would probably take more time. I will start looking into this issue. Anyway, in my test scenario, after warmup, the user tries to first limit the heap to 2GB, and after a while to 3GB, and then back to 8GB.
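The runtime adjustments were done with jcmd, roughly along these lines. This is a sketch: <pid> is a placeholder for the target JVM's process id, and it assumes SoftMaxHeapSize is a manageable flag in the build under test (which is the point of this patch); VM.set_flag takes a plain numeric value in bytes:

```shell
# Compute the byte values for the soft max heap sizes used in the test,
# since jcmd VM.set_flag takes a plain number rather than "2g"/"3g".
SOFT_MAX_2G=$((2 * 1024 * 1024 * 1024))
SOFT_MAX_3G=$((3 * 1024 * 1024 * 1024))
# Echo the commands; <pid> is a placeholder for the JVM under test.
echo "jcmd <pid> VM.set_flag SoftMaxHeapSize ${SOFT_MAX_2G}"
echo "jcmd <pid> VM.set_flag SoftMaxHeapSize ${SOFT_MAX_3G}"
```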
The resulting graph [1] shows heap metrics over time: blue ("soft") is the current SoftMaxHeapSize, pink ("committed") represents committed memory, yellow ("goal") shows G1's current heap size goal, turquoise ("free") the amount of free heap and purple ("used") the amount of used memory. Ignoring the drop from ~second 30-100 where I finally managed to set Min/MaxHeapFreeRatio ;) you can see that G1 kind of stabilizes at around 3.8GB heap; at ~second 410 SoftMaxHeapSize ("soft") is set to 2GB. As you can see, G1 ignores the request. This corresponds to the code where apparently the heap is only reduced to SoftMaxHeapSize if there is enough free space to reduce to that value (I think). At ~second 620 I set SoftMaxHeapSize to 3GB, which gives the expected drop in memory usage. However, since the change does not modify G1 goals it ultimately just ignores the SoftMaxHeapSize goal. It would probably have worked if there were no further application activity. I created a webrev of an alternative attempt that modifies G1's goal/target heap size in the adaptive IHOP mechanism so that G1 automatically starts marking so that a space reclamation phase starts before reaching SoftMaxHeapSize. It basically changes the predictor's reserve according to current committed heap size, not only based on G1ReservePercent but also on the specified SoftMaxHeapSize. One complication in a generational setting is to adapt young gen (particularly survivor size) to that goal too, but I think the change does okay with that. However it is not finished yet; there is debugging code in it and one FIXME that is about shuffling around code properly. In the graph at [3] you can see the results, with the same metrics shown. In this case G1 follows the soft goal fairly well. For the 2g SoftMaxHeapSize goal it works perfectly in the example (*1); in the 3g SoftMaxHeapSize change we get some initial short overshoot in committed memory.
(*2/*3) There are however some problems/differences to your solution here which need to be discussed a bit more to see if it fits you and ultimately make it perform better: *0 this change uses existing sizing to uncommit memory, i.e. memory is not uncommitted immediately but as part of regular operation. This means that the garbage collection cycle needs to advance. In case of specjbb with fixed IR this is no issue, but completely quiescent applications need other mechanisms like the "Promptly Return Unused Committed Memory" (JEP 346) feature enabled. Some tuning is needed in that mechanism for almost-idle applications. *1 the problem with only setting SoftMaxHeapSize and relying on the regular uncommit mechanism is that due to other reasons, e.g. GCTimeRatio, G1 won't achieve this kind of compact heap. This is the reason why my setup includes GCTimeRatio=4 on the command line - otherwise in neither case would G1 achieve the 2g goal (it would settle around 3g with my changes, didn't test the original changes; max heap usage would be ~5.8GB without SoftMaxHeapSize fyi), and you can't modify it during runtime (i.e. when you want to select a different throughput/latency tradeoff to achieve lower heap usage). *2 looking at the results more closely, regarding the (first) overshoot in the 3g soft max heap size goal, I think this is a remaining issue in the heap sizing policy in conjunction with soft max heap size, i.e. temporarily the target gctimeratio is set to 10% for various reasons (in G1HeapSizingPolicy::expansion_amount()). In the log I have, the problem seems to be that we are re-setting the softmaxheapsize within the space reclamation phase (i.e. mixed gc) and G1 sizing policies got confused, i.e. it partially keeps on using the 2g goal for young gen sizing until the *2 problem expands it. That's a bug and needs to be fixed.
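The reserve adjustment described above (basing the adaptive-IHOP reserve on SoftMaxHeapSize as well as G1ReservePercent) might be sketched roughly as follows. All names here are illustrative assumptions, not the actual G1 code from the webrev:

```cpp
#include <algorithm>
#include <cstddef>

// Illustrative sketch only: derive the adaptive-IHOP marking threshold
// from the smaller of the committed heap and the SoftMaxHeapSize goal,
// so a space reclamation phase starts early enough to get below the
// soft limit.
static size_t conc_mark_start_threshold(size_t committed_bytes,
                                        size_t soft_max_bytes,
                                        size_t reserve_percent,
                                        size_t expected_promotion_bytes) {
  size_t target = std::min(committed_bytes, soft_max_bytes);
  size_t reserve = target * reserve_percent / 100;  // G1ReservePercent-style
  if (target < reserve + expected_promotion_bytes) {
    return 0;  // already over budget: start marking immediately
  }
  // Start marking while the promotion expected during the marking cycle
  // still fits below the target minus the reserve.
  return target - reserve - expected_promotion_bytes;
}
```

With an 8 GB committed heap but a 2 GB soft goal, the threshold is computed against the 2 GB target rather than the full committed size, which is why marking (and hence mixed GCs and uncommit) starts much earlier.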
So far the previous text only looked at the best case where everything fits together; there are some other issues which will prevent you from achieving a tight heap in some cases that I noticed during my testing. Something to think about. *4 GCTimeRatio/heap expansion during young gc has different goals than the (un-)commit at the end of full gc. In some cases, with SoftMaxHeapSize (but also without), the latter will undo the expansion at young gc, which will immediately start to expand again. *5 GCTimeRatio can't be adjusted during runtime, which means that you won't achieve as tight a heap as in this example. GCTimeRatio is also a bit unwieldy to use, i.e. since it is the denominator in the (default; nobody sets GCPauseIntervalMillis) time calculation, you get "good" granularity at low values, but pretty bad granularity at high values. *6 Min/MaxHeapFreeRatio default values are probably too high - with adaptive IHOP, G1 can typically meet its current goal very well; any excess is often just wasted committed memory. A similar issue to that is, don't set Min/MaxHeapFreeRatio to something below G1ReservePercent, i.e. the default reserve for the IHOP. In this case there will be significant memory commit/uncommit pauses. Here is my question to you (and any readers): are you using Min/MaxHeapFreeRatio? Using SoftMaxHeapSize to set a target heap size seems to be much more direct and better than Min/MaxHeapFreeRatio. Given the above (and assuming that there are no reasons to keep it), it may be useful to start the deprecation process (at least for the use in G1) when SoftMaxHeapSize is in. There are some more issues with heap sizing not really relevant to this discussion; I need to think about them a bit more and file appropriately worded CRs. Either way, what do you think about my suggested change? Can you try it on your workloads to see if it could do the job? Any other comments?
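The granularity issue in *5 is easy to see numerically: with the default pause interval, the target fraction of total time spent in GC works out to 1/(1+GCTimeRatio), so steps between low values are large while high values are nearly indistinguishable (illustrative calculation only, not the actual G1 code):

```cpp
// Target fraction of wall time spent in GC for a given GCTimeRatio:
// 1 / (1 + GCTimeRatio). A ratio of 1 means 50% GC time, 4 means 20%,
// 99 means 1% -- coarse steps at the low end, tiny ones at the top.
static double gc_time_fraction(int gc_time_ratio) {
  return 1.0 / (1 + gc_time_ratio);
}
```

So going from GCTimeRatio=1 to 4 moves the target from 50% down to 20%, while going from 99 to 100 changes it by less than a hundredth of a percent.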
More work is needed on this patch I think; also we might need to think about how the user can detect this change of the target better in the logs for troubleshooting. The original patch (webrev.2) also contained some minor unrelated cleanups (one constification of a method, one rename of the heap resizing phase) that might be easier to address separately more quickly ;) Thanks, Thomas [0] specjbb2015 settings: -Dspecjbb.comm.connect.type=HTTP_Jetty -Dspecjbb.controller.type=PRESET -Dspecjbb.controller.preset.ir=5000 -Dspecjbb.controller.preset.duration=10800000 VM settings: -Xms2g -Xmx8g -XX:GCTimeRatio=4 -XX:+UseStringDeduplication This gives ~1.5GB live set size, on my machine around 10-40ms pause time, so rather light load at least without setting any heap size goal; in my runs, G1 settles to around 3.8GB of committed heap. (with Min/MaxHeapFreeRatio=10 set after startup, but you can just put it into the VM startup options too) [1] http://cr.openjdk.java.net/~tschatzl/8236073/softmaxheapsize-alibaba.png [2] http://cr.openjdk.java.net/~tschatzl/8236073/webrev/ [3] http://cr.openjdk.java.net/~tschatzl/8236073/softmaxheapsize.png From sangheon.kim at oracle.com Wed Feb 5 22:52:49 2020 From: sangheon.kim at oracle.com (sangheon.kim at oracle.com) Date: Wed, 5 Feb 2020 14:52:49 -0800 Subject: RFR: 8237143: Eliminate DirtyCardQ_cbl_mon In-Reply-To: <40479EE1-74EF-4C5F-A04B-8877F0ED9ACB@oracle.com> References: <745E91C1-AE1A-4DA2-80EE-59B70897F4BF@oracle.com> <86BABDA8-E402-49F3-B478-ED0E70490015@oracle.com> <40479EE1-74EF-4C5F-A04B-8877F0ED9ACB@oracle.com> Message-ID: Hi Kim, On 1/31/20 2:25 PM, Kim Barrett wrote: >> On Jan 23, 2020, at 3:10 PM, Kim Barrett wrote: >> >>> On Jan 22, 2020, at 11:12 AM, Thomas Schatzl wrote: >>> On 16.01.20 09:51, Kim Barrett wrote: >>>> Please review this change to eliminate the DirtyCardQ_cbl_mon. This >>>> is one of the two remaining super-special "access" ranked mutexes.
>>>> (The other is the Shared_DirtyCardQ_lock, whose elimination is covered >>>> by JDK-8221360.) >>>> There are three main parts to this change. >>>> (1) Replace the under-a-lock FIFO queue in G1DirtyCardQueueSet with a >>>> lock-free FIFO queue. >>>> (2) Replace the use of a HotSpot monitor for signaling activation of >>>> concurrent refinement threads with a semaphore-based solution. >>>> (3) Handle pausing of buffer refinement in the middle of a buffer in >>>> order to handle a pending safepoint request. This can no longer just >>>> push the partially processed buffer back onto the queue, due to ABA >>>> problems now that the buffer is lock-free. >>>> CR: >>>> https://bugs.openjdk.java.net/browse/JDK-8237143 >>>> Webrev: >>>> https://cr.openjdk.java.net/~kbarrett/8237143/open.00/ >>>> Testing: >>>> mach5 tier1-5 >>>> Normal performance testing showed no significant change. >>>> specjbb2015 on a very big machine showed a 3.5% average critical-jOPS >>>> improvement, though not statistically significant; removing contention >>>> for that lock by many hardware threads may be a little bit noticeable. >>> initial comments only, and so far only about comments :( The code itself looks good to me, but I want to look over it again. >> After some offline discussion with Thomas, I'm doing some restructuring that >> makes it probably not very efficient for anyone else to do a careful review of >> the open.00 version. > Here's a new webrev: > > https://cr.openjdk.java.net/~kbarrett/8237143/open.02/ Webrev.02 looks really good. Thanks, Sangheon > > Testing: > mach5 tier1-5 > Performance testing showed no significant change. > > I didn't bother providing an incremental webrev, because the changes > to g1DirtyCardQueue.[ch]pp are pretty substantial. Those are the only > files changed, except for the suggested move of the comment for > G1ConcurrentRefineThread::maybe_deactivate and some related comment > improvements nearby.
> > Most of this round of changes are refactoring within G1DirtyCardQueueSet, > mainly adding internal helper classes for the FIFO queue and for the paused > buffers, each with their own (commented) APIs. I think that has addressed a > lot of Thomas's comments about the comments, and I hope has made the code > easier to understand. > > I've also improved the mechanism for handling "paused" buffers, simplifying > it by making better use of some invariants. > >> On Jan 22, 2020, at 11:12 AM, Thomas Schatzl wrote: >> // The key idea to make this work is that pop (get_completed_buffer) >> // never returns an element of the queue if it is the only accessible >> // element, >> If I understand this correctly, maybe "if there is only one buffer in the FIFO" is easier to understand than "only accessible element". (or define "accessible element?). > I specifically don't want to say it that way because we could have a > situation like > > (1) Start with a queue having exactly one element. > > (2) Thread1 starts a push by updating tail, but has not yet linked the old > tail to the new. > > (3) Thread2 performs a push. > > The buffer pushed by Thread2 is "in the queue" by some reasonable > definition, so the queue contains two buffers. But that buffer is not yet > accessible, because Thread1 hasn't completed its push. The alternative is > to (in the description) somehow divorce a completed push from the notion of > the number of buffers in the queue, which seems worse to me. I expanded the > discussion a bit though, including what is meant by "accessible". > >> The code seems to unnecessarily use the NULL_buffer constant. Maybe use it here too. Overall I am not sure about the usefulness of using NULL_buffer in the code. The NULL value in Hotspot code is generally accepted as a special value, and the name "NULL_buffer" does not seem to add any information. > The point of NULL_buffer was to avoid casts of NULL in Atomic operations, > and I then used it consistently. 
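The two-step push race described in (1)-(3) above can be sketched as follows. This is a minimal illustration, not the actual G1DirtyCardQueueSet code: the tail is swung first, and only afterwards is the old tail linked to the new node, so between the two steps the new node is "in the queue" but not yet reachable from its predecessor.

```cpp
#include <atomic>

// Minimal sketch of a two-step lock-free FIFO push (illustrative, not
// the HotSpot implementation). Step 1 publishes the new tail; step 2
// links the old tail to the new node. A pop running between the two
// steps would see a NULL next pointer on the last accessible node,
// which is why pop must never hand out the only accessible element.
struct Node {
  std::atomic<Node*> next{nullptr};
};

struct LockFreeFifo {
  std::atomic<Node*> _tail;

  explicit LockFreeFifo(Node* dummy) : _tail(dummy) {}

  void push(Node* n) {
    n->next.store(nullptr, std::memory_order_relaxed);
    // Step 1: swing the tail. A concurrent push now appends behind n...
    Node* old_tail = _tail.exchange(n, std::memory_order_acq_rel);
    // ...but n only becomes reachable from its predecessor at step 2.
    old_tail->next.store(n, std::memory_order_release);
  }
};
```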
But I've changed to using such casts, > since it turned out there weren't that many and we can get rid of those > uniformly here and elsewhere when we have C++11 nullptr and nullptr_t. > > > From maoliang.ml at alibaba-inc.com Thu Feb 6 12:27:09 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Thu, 06 Feb 2020 20:27:09 +0800 Subject: Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: <7085d9f4-d579-2fb1-c3ba-938a01ab7576@oracle.com> References: <90aa2259-afce-44af-abb2-31700caea4a0.maoliang.ml@alibaba-inc.com>, <7085d9f4-d579-2fb1-c3ba-938a01ab7576@oracle.com> Message-ID: <6a4dfc59-217c-446d-94ec-f4796d44617c.maoliang.ml@alibaba-inc.com> Hi Thomas, Thanks for the testing and evaluating! I tried your test with specjbb2015 and got a somewhat different result, maybe because of machine capability. The config I used is as below: -Xmx8g -Xms2g -Xlog:gc* -XX:GCTimeRatio=4 -XX:+UseStringDeduplication -Dspecjbb.comm.connect.type=HTTP_Jetty -Dspecjbb.controller.type=PRESET -Dspecjbb.controller.preset.ir=5000 -Dspecjbb.controller.preset.duration=10800000 The heap was around 6GB after running for a while (300s). And I was able to use SoftMaxHeapSize to let it shrink to 5GB. It should be like your scenario of shrinking the heap to 3GB. The behavior is as I expected, but I thought you might expect a more aggressive result. In my mind, for a constant load, the JVM might not need to shrink the heap at all, since the JVM is supposed to expand the heap to the right capacity. The soft limit I imagine is to bring the heap size down after a load spike. In Alibaba's workload, the heap shrink is controlled by the cluster's unified control center, which has the prediction data, and the soft limit works more like a *hard* limit in our 8u implementation. So I think it is acceptable that the heap could not be shrunk to 2GB in your test case.
You can see that G1HeapSizingPolicy::can_shrink_heap_size_to is a bit conservative and we may be able to make it more aggressive. For an almost idle application which doesn't have a GC for a rather long time, the shrink cannot happen. In our previous 8u patch, we had a timer to trigger GC, and the softmx is changed by a jcmd which will also trigger a GC (there was no SoftMaxHeapSize option in 8u yet). Shall we introduce a timer GC as well? Honestly, I don't think Min/MaxHeapFreeRatio is a good way to determine heap expansion/shrinkage in G1, and in our 8u practical experience we never have full GC, so Min/MaxHeapFreeRatio is useless. When I reproduced your test, the only exception was that the heap expanded back to 6GB after shrinking to SoftMaxHeapSize=5g, because we resize the heap at remark. BTW, I don't think remark is a good point to resize the heap, since in the remark phase regions full of garbage haven't been reclaimed yet. IMHO we don't even need to resize at remark but just resize after mixed GC according to GCTimeRatio. Your change to make SoftMaxHeapSize sensible in adaptive IHOP controlling seems to be a similar approach to ZGC's. ZGC is a single generation GC whose scenario is much simpler. Maybe we don't need SoftMaxHeapSize to guide GC decisions in G1. Since we already have a policy to determine the shrink of the heap by SoftMaxHeapSize, I'm not sure if we need to make adaptive IHOP according to SoftMaxHeapSize... We may encounter the situation that we cannot shrink the heap size to SoftMaxHeapSize but concurrent mark becomes frequent after affecting the IHOP policy. > In the log I have, the problem seems to be that we are re-setting the > softmaxheapsize within the space reclamation phase (i.e. mixed gc) and > G1 sizing policies got confused, i.e. it partially keeps on using the 2g > goal for young gen sizing until the *2 problem expands it. That's a bug > and needs to be fixed.
I don't think it's a problem that after mixed GC resize_heap_after_young_collection will evaluate if the heap can be shrunk to the new value of SoftMaxHeapSize. Thanks, Liang ------------------------------------------------------------------ From:Thomas Schatzl Send Time:2020 Feb. 5 (Wed.) 16:14 To:"MAO, Liang" ; hotspot-gc-dev Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Hi Liang, apologies for the late reply - I did look at the patch immediately after you posted it, but initial tests showed that it does not work as (I) expected. More about that below. So I went ahead and hacked up something that comes closer to what I had in mind. Unfortunately other more urgent issues came up, which caused the delay on this work. Sorry. (And sorry for the long post). Not having any kind of workload to work with for testing the change, I used some configuration of specjbb2015 with fixed ir [0] (taken from a colleague's unrelated recent internal test), simulating a constant load the user wants to control the heap usage of. At this point I want to apologize for using specjbb2015 in this public reply, because it is not openly available; I only noticed when writing up this email. Finding a substitute and redoing measurements would probably take more time. I will start looking into this issue. Anyway, in my test scenario, after warmup, the user tries to first limit the heap to 2GB, and after a while to 3GB, and then back to 8GB. The resulting graph [1] shows heap metrics over time: blue ("soft") is the current SoftMaxHeapSize, pink ("committed") represents committed memory, yellow ("goal") shows G1's current heap size goal, turquoise ("free") the amount of free heap and purple ("used") the amount of used memory. Ignoring the drop from ~second 30-100 where I finally managed to set Min/MaxHeapFreeRatio ;) you can see that G1 kind of stabilizes at around 3.8GB heap; at ~second 410 SoftMaxHeapSize ("soft") is set to 2GB. As you can see, G1 ignores the request.
This corresponds to the code where apparently the heap is only reduced to SoftMaxHeapSize if there is enough free space to reduce to that value (I think). At ~second 620 I set SoftMaxHeapSize to 3GB which gives the expected drop in memory usage. However, since the change does not modify G1 goals it ultimately just ignores the SoftMaxHeapSize goal. It probably worked if there were no further application activity. I created a webrev of an alternative attempt that modifies G1's goal/target heap size in the adaptive IHOP mechanism so that G1 automatically starts marking so that a space reclamation phase starts before reaching softmaxheapsize. It basically changes the predictor's reserve according to current committed heap size not only based on G1ReservePercent, but also on the specified SoftMaxHeapSize. One complication in a generational setting is to adapt young gen (particularly survivor size) to that goal too, but I think the change does okay with that. However it is not finished yet, there is debugging code in it and one FIXME that is about shuffling around code properly. In the graph at [3] you can see the results, with same metrics shown. In this case G1 fairly well follows the soft goal. For the 2g softmaxheapsize goal it works perfectly in the example (*1), in the 3g softmaxheapsize change we get some initial short overshoot in committed memory. (*2/*3) There are however some problems/differences to your solution here which need to be discussed a bit more to see if it fits you and ultimately make it perform better: *0 this change uses existing sizing to uncommit memory, i.e. memory is not uncommitted immediately but part of regular operation. This means that the garbage collection cycle needs to advance. In case of specjbb with fixed IR this is no issue, but completely quiescent applications need other mechanisms like the "Promptly Return Unused Committed Memory (JEP 346) feature enabled. Some tuning is needed in that mechanism for almost-idle applications. 
*1 the problem with only setting SoftMaxHeapSize and relying on the regular uncommit mechanism is that due to other reasons, e.g. GCTimeRatio, G1 won't achieve this kind of compact heap. This is the reason why my setup includes the GCTimeRatio=4 on the command line - otherwise in neither case G1 would achieve the 2g goal (it would settle around 3g with my changes, didn't test the original changes; max heap usage would be ~5.8GB without SoftMaxHeapSize fyi), and you can't modify it during runtime (i.e. when you want to select a different throughput/latency tradeoff to achieve lower heap usage). *2 looking at the results more closely the (first) overshoot in the 3g soft max heap size goal, I think this is a remaining issue in the heap sizing policy in conjunction with soft max heap size, i.e. temporarily the target gctimeratio is set to 10% for various reasons. (in G1HeapSizingPolicy::expansion_amount()). In the log I have, the problem seems to be that we are re-setting the softmaxheapsize within the space reclamation phase (i.e. mixed gc) and G1 sizing policies got confused, i.e. it partially keeps on using the 2g goal for young gen sizing until the *2 problem expands it. That's a bug and needs to be fixed. So far previous text only looked at the best case where everything fits together; there are some other issues which will prevent you from achieving a tight heap in some cases that I noticed during my testing. Something to think about. *4 GCTimeRatio/heap expansion during young gc has different goals than the (un-)commit at the end of full gc. In some cases, with SoftMaxHeapSize (but also without), the later will undo the expansion at young gc, which will immediately start to expand again. *5 GCTimeRatio can't be adjusted during runtime, which means that you won't achieve that tight of a heap as in this example. 
GCTimeRatio is also a bit unwieldy to use, i.e since it is the denominator in the (default; nobody sets GCPauseIntervalMillis) time calculation, you get "good" granularity of low values, but pretty bad granularity of high values. *6 Min/MaxHeapFreeRatio default values are probably too high - with adaptive IHOP, G1 can typically meet its current goal very well, any excess is often just wasted committed memory. A similar issue to that is, don't set Min/MaxHeapFreeRatio to something below G1ReservePercent, i.e. the default reserve for the IHOP. In this case there will be significant memory commit/uncommit pauses. Here is my question to you (and any readers), are you using Min/MaxHeapFreeRatio? Using SoftMaxHeapSize to set a target heap size seems to be much more direct and better than Min/MaxHeapFreeRatio. Given above (and assuming that there are no reasons to keep it), it may be useful to start deprecation process (at least for the use in G1) when SoftMaxHeapSize is in. There are some more issues with heap sizing not really relevant to this discussion, I need to think about them a bit more and file appropriately worded CRs. Either way, what do you think about my suggested change? Can you try it on your workloads to see if it could do the job? Any other comments? More work is needed on this patch I think; also we might need to think about how the user can detect this change of the target better in the logs for troubleshooting. 
The original patch (webrev.2) also contained some minor unrelated cleanups (one constification of a method, one rename of the heap resizing phase) that might be easier to address separately more quickly ;) Thanks, Thomas [0] specjbb2015 settings: -Dspecjbb.comm.connect.type=HTTP_Jetty -Dspecjbb.controller.type=PRESET -Dspecjbb.controller.presett.ir=5000 -Dspecjbb.controller.preset.duration=10800000" VM settings: -Xms2g -Xmx8g -XX:GCTimeRatio=4 -XX:+UseStringDeduplication This gives ~1.5GB live set size, on my machine around 10-40ms pause time, so rather light load at least without setting any heap size goal; in my runs, G1 settles to around 3.8GB of committed heap. (with Min/MaxHeapFreeRatio=10 set after startup, but you can just put it into the VM startup options too) [1] http://cr.openjdk.java.net/~tschatzl/8236073/softmaxheapsize-alibaba.png [2] http://cr.openjdk.java.net/~tschatzl/8236073/webrev/ [3] http://cr.openjdk.java.net/~tschatzl/8236073/softmaxheapsize.png From zgu at redhat.com Thu Feb 6 17:34:10 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 6 Feb 2020 12:34:10 -0500 Subject: [15] RFR 8238574: Shenandoah: Assertion failure due to missing null check Message-ID: Please review this small change that adds a null check before calling keep alive barrier to avoid assertion failure. Native barrier may return null for a not null oop, if it is dead. Bug: https://bugs.openjdk.java.net/browse/JDK-8238574 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8238574/webrev.00/ Test: hotspot_gc_shenandoah, vmTestbase_nsk_jdi where the problem was observed. 
Thanks, -Zhengyu From shade at redhat.com Thu Feb 6 17:42:44 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 6 Feb 2020 18:42:44 +0100 Subject: [15] RFR 8238574: Shenandoah: Assertion failure due to missing null check In-Reply-To: References: Message-ID: <43248314-c8b1-d78a-5bff-415aaaa957cd@redhat.com> On 2/6/20 6:34 PM, Zhengyu Gu wrote: > Please review this small change that adds a null check before calling > keep alive barrier to avoid assertion failure. > > Native barrier may return null for a not null oop, if it is dead. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8238574 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8238574/webrev.00/ The patch looks good. But I have a broader question: are all other paths that use the returned value from LRB-native safe? E.g. calling from assembler/C1/C2? Thanks, -Aleksey From zgu at redhat.com Thu Feb 6 17:56:11 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 6 Feb 2020 12:56:11 -0500 Subject: [15] RFR 8238574: Shenandoah: Assertion failure due to missing null check In-Reply-To: <43248314-c8b1-d78a-5bff-415aaaa957cd@redhat.com> References: <43248314-c8b1-d78a-5bff-415aaaa957cd@redhat.com> Message-ID: On 2/6/20 12:42 PM, Aleksey Shipilev wrote: > On 2/6/20 6:34 PM, Zhengyu Gu wrote: >> Please review this small change that adds a null check before calling >> keep alive barrier to avoid assertion failure. >> >> Native barrier may return null for a not null oop, if it is dead. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8238574 >> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8238574/webrev.00/ > The patch looks good. > > But I have a broader question: are all other paths that use the returned value from LRB-native safe? > E.g. calling from assembler/C1/C2? In C1/C2, we just make runtime call to this implementation. 
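For reference, the guard being added amounts to the following pattern, shown here as a mock with hypothetical names (the real change is in the webrev): since the native load-reference barrier can return NULL for a non-null oop whose referent is dead, the keep-alive barrier must only run on a non-NULL result.

```cpp
#include <cassert>
#include <cstddef>

// Mock sketch, not actual Shenandoah code: all names are hypothetical.
typedef void* oop;

static int keep_alive_calls = 0;

// Stand-in for the native LRB: a dead referent resolves to NULL even
// though the input oop itself was non-null.
static oop load_reference_barrier_native_mock(oop obj, bool referent_dead) {
  return referent_dead ? NULL : obj;
}

// Stand-in for the keep-alive (SATB pre-value) barrier, which asserts
// on a non-NULL argument -- the assertion that fired in the bug report.
static void keep_alive_barrier_mock(oop obj) {
  assert(obj != NULL);
  ++keep_alive_calls;
}

static void resolve_and_keep_alive(oop obj, bool referent_dead) {
  oop fwd = load_reference_barrier_native_mock(obj, referent_dead);
  if (fwd != NULL) {  // the added null check
    keep_alive_barrier_mock(fwd);
  }
}
```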
-Zhengyu > > Thanks, > -Aleksey > From zgu at redhat.com Thu Feb 6 18:32:46 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 6 Feb 2020 13:32:46 -0500 Subject: [15] RFR 8238574: Shenandoah: Assertion failure due to missing null check In-Reply-To: References: <43248314-c8b1-d78a-5bff-415aaaa957cd@redhat.com> Message-ID: <5085eff3-c6b4-c4d7-e3d6-7928ff77561a@redhat.com> On 2/6/20 12:56 PM, Zhengyu Gu wrote: > > > On 2/6/20 12:42 PM, Aleksey Shipilev wrote: >> On 2/6/20 6:34 PM, Zhengyu Gu wrote: >>> Please review this small change that adds a null check before calling >>> keep alive barrier to avoid assertion failure. >>> >>> Native barrier may return null for a not null oop, if it is dead. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8238574 >>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8238574/webrev.00/ >> The patch looks good. >> >> But I have a broader question: are all other paths that use the >> returned value from LRB-native safe? >> E.g. calling from assembler/C1/C2? > > In C1/C2, we just make runtime call to this implementation. Sorry, jumped the gun too fast. I don't think I answered your question :-( C1/C2's pre-val barriers seem to have a null check. Roman, could you confirm? Thanks, -Zhengyu > > -Zhengyu > >> >> Thanks, >> -Aleksey >> From rkennke at redhat.com Thu Feb 6 19:39:08 2020 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 6 Feb 2020 20:39:08 +0100 Subject: [15] RFR 8238574: Shenandoah: Assertion failure due to missing null check In-Reply-To: <5085eff3-c6b4-c4d7-e3d6-7928ff77561a@redhat.com> References: <43248314-c8b1-d78a-5bff-415aaaa957cd@redhat.com> <5085eff3-c6b4-c4d7-e3d6-7928ff77561a@redhat.com> Message-ID: <028fc92b-b660-354f-1c4b-4a78bae8319a@redhat.com> Hi folks, >>>> Please review this small change that adds a null check before calling >>>> keep alive barrier to avoid assertion failure. >>>> >>>> Native barrier may return null for a not null oop, if it is dead.
>>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8238574 >>>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8238574/webrev.00/ >>> The patch looks good. >>> >>> But I have a broader question: are all other paths that use the >>> returned value from LRB-native safe? >>> E.g. calling from assembler/C1/C2? >> >> In C1/C2, we just make runtime call to this implementation. > > Sorry, jump the gun to fast. I don't think I answered your question :-( > > C1/C2's pre-val barriers seem to have null check. Roman, could you confirm? Yes, I think this is correct. Thanks, Roman From kim.barrett at oracle.com Fri Feb 7 00:00:22 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 6 Feb 2020 19:00:22 -0500 Subject: RFR: 8237143: Eliminate DirtyCardQ_cbl_mon In-Reply-To: References: <745E91C1-AE1A-4DA2-80EE-59B70897F4BF@oracle.com> <86BABDA8-E402-49F3-B478-ED0E70490015@oracle.com> <40479EE1-74EF-4C5F-A04B-8877F0ED9ACB@oracle.com> Message-ID: > On Feb 4, 2020, at 9:39 AM, Thomas Schatzl wrote: > > Hi Kim, > > On 31.01.20 23:25, Kim Barrett wrote: >>> On Jan 23, 2020, at 3:10 PM, Kim Barrett wrote: >>> >>>> On Jan 22, 2020, at 11:12 AM, Thomas Schatzl wrote: >>>> On 16.01.20 09:51, Kim Barrett wrote: >>>>> Please review this change to eliminate the DirtyCardQ_cbl_mon. This >>>>> is one of the two remaining super-special "access" ranked mutexes. >>>>> (The other is the Shared_DirtyCardQ_lock, whose elimination is covered >>>>> by JDK-8221360.) >>>>> There are three main parts to this change. >>>>> (1) Replace the under-a-lock FIFO queue in G1DirtyCardQueueSet with a >>>>> lock-free FIFO queue. >>>>> (2) Replace the use of a HotSpot monitor for signaling activation of >>>>> concurrent refinement threads with a semaphore-based solution. >>>>> (3) Handle pausing of buffer refinement in the middle of a buffer in >>>>> order to handle a pending safepoint request. 
This can no longer just >>>>> push the partially processed buffer back onto the queue, due to ABA >>>>> problems now that the buffer is lock-free. >>>>> CR: >>>>> https://bugs.openjdk.java.net/browse/JDK-8237143 >>>>> Webrev: >>>>> https://cr.openjdk.java.net/~kbarrett/8237143/open.00/ >>>>> Testing: >>>>> mach5 tier1-5 >>>>> Normal performance testing showed no significant change. >>>>> specjbb2015 on a very big machine showed a 3.5% average critical-jOPS >>>>> improvement, though not statistically significant; removing contention >>>>> for that lock by many hardware threads may be a little bit noticeable. >>>> >>>> initial comments only, and so far only about comments :( The code itself looks good to me, but I want to look over it again. >>> >>> After some offline discussion with Thomas, I?m doing some restructuring that >>> makes it probably not very efficient for anyone else to do a careful review of >>> the open.00 version. >> Here's a new webrev: >> https://cr.openjdk.java.net/~kbarrett/8237143/open.02/ > > I think this is good. Thanks for your extensive changes. Thanks. > Two minor issues. Do not need re-review: > > * s/unsufficient/insufficient in g1DirtyCardQueue.cpp Thanks for spotting that. > * simple predicates returning bool tend to have an "is_" or "has_" prepended to it, i.e. s/PausedBuffers::empty()/...::is_empty()/ Agreed; will change to is_empty. Old habits seem to die hard; I think someday we might want to be more consistent with the Standard Library, but not today. 
> > Thanks, > Thomas From kim.barrett at oracle.com Fri Feb 7 00:00:44 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 6 Feb 2020 19:00:44 -0500 Subject: RFR: 8237143: Eliminate DirtyCardQ_cbl_mon In-Reply-To: References: <745E91C1-AE1A-4DA2-80EE-59B70897F4BF@oracle.com> <86BABDA8-E402-49F3-B478-ED0E70490015@oracle.com> <40479EE1-74EF-4C5F-A04B-8877F0ED9ACB@oracle.com> Message-ID: <0B25580C-189C-4B09-88A9-6D1FBCD97C08@oracle.com> > On Feb 5, 2020, at 5:52 PM, sangheon.kim at oracle.com wrote: > > Hi Kim, > > On 1/31/20 2:25 PM, Kim Barrett wrote: >>> On Jan 23, 2020, at 3:10 PM, Kim Barrett wrote: >>> >>>> On Jan 22, 2020, at 11:12 AM, Thomas Schatzl wrote: >>>> On 16.01.20 09:51, Kim Barrett wrote: >>>>> Please review this change to eliminate the DirtyCardQ_cbl_mon. This >>>>> is one of the two remaining super-special "access" ranked mutexes. >>>>> (The other is the Shared_DirtyCardQ_lock, whose elimination is covered >>>>> by JDK-8221360.) >>>>> There are three main parts to this change. >>>>> (1) Replace the under-a-lock FIFO queue in G1DirtyCardQueueSet with a >>>>> lock-free FIFO queue. >>>>> (2) Replace the use of a HotSpot monitor for signaling activation of >>>>> concurrent refinement threads with a semaphore-based solution. >>>>> (3) Handle pausing of buffer refinement in the middle of a buffer in >>>>> order to handle a pending safepoint request. This can no longer just >>>>> push the partially processed buffer back onto the queue, due to ABA >>>>> problems now that the buffer is lock-free. >>>>> CR: >>>>> https://bugs.openjdk.java.net/browse/JDK-8237143 >>>>> Webrev: >>>>> https://cr.openjdk.java.net/~kbarrett/8237143/open.00/ >>>>> Testing: >>>>> mach5 tier1-5 >>>>> Normal performance testing showed no significant change. 
>>>>> specjbb2015 on a very big machine showed a 3.5% average critical-jOPS >>>>> improvement, though not statistically significant; removing contention >>>>> for that lock by many hardware threads may be a little bit noticeable. >>>> initial comments only, and so far only about comments :( The code itself looks good to me, but I want to look over it again. >>> After some offline discussion with Thomas, I'm doing some restructuring that >>> makes it probably not very efficient for anyone else to do a careful review of >>> the open.00 version. >> Here's a new webrev: >> >> https://cr.openjdk.java.net/~kbarrett/8237143/open.02/ > Webrev.02 looks really good. > > Thanks, > Sangheon Thanks. From maoliang.ml at alibaba-inc.com Fri Feb 7 05:39:28 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Fri, 07 Feb 2020 13:39:28 +0800 Subject: [Rare case] G1 mixed GC didn't reclaim garbages in 8u Message-ID: <2258aa97-c360-47f1-96d9-8a7ca98b2461.maoliang.ml@alibaba-inc.com> Hi All, I saw a rare case where G1 reclaimed almost nothing in a mixed GC but a later full GC reclaimed 70% of the heap. The version is 8u; is there any bug, or is this an extreme case of floating garbage because of SATB?
The GC log is as below: 2020-02-06T20:07:39.785+0800: 6805.381: [GC pause (G1 Evacuation Pause) (young) (initial-mark), 0.0381100 secs] [Parallel Time: 32.5 ms, GC Workers: 8] [GC Worker Start (ms): Min: 6805383.4, Avg: 6805383.5, Max: 6805383.5, Diff: 0.1] [Ext Root Scanning (ms): Min: 7.4, Avg: 12.1, Max: 19.6, Diff: 12.2, Sum: 97.0] [Update RS (ms): Min: 2.8, Avg: 6.3, Max: 11.1, Diff: 8.3, Sum: 50.7] [Processed Buffers: Min: 48, Avg: 116.9, Max: 180, Diff: 132, Sum: 935] [Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 0.5] [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [Object Copy (ms): Min: 9.7, Avg: 13.6, Max: 17.6, Diff: 7.9, Sum: 109.1] [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 8] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] [GC Worker Total (ms): Min: 32.1, Avg: 32.2, Max: 32.2, Diff: 0.1, Sum: 257.5] [GC Worker End (ms): Min: 6805415.6, Avg: 6805415.6, Max: 6805415.6, Diff: 0.0] [Code Root Fixup: 0.0 ms] [Code Root Purge: 0.0 ms] [Clear CT: 0.2 ms] [Other: 5.3 ms] [Choose CSet: 0.0 ms] [Ref Proc: 1.1 ms] [Ref Enq: 0.0 ms] [Redirty Cards: 0.1 ms] [Humongous Register: 0.0 ms] [Humongous Reclaim: 0.0 ms] [Free CSet: 0.0 ms] [Eden: 608.0M(608.0M)->0.0B(608.0M) Survivors: 96.0M->96.0M Heap: 13.8G(14.2G)->13.2G(14.2G)] [Times: user=0.22 sys=0.00, real=0.04 secs] 2020-02-06T20:07:39.825+0800: 6805.421: [GC concurrent-root-region-scan-start] 2020-02-06T20:07:39.826+0800: 6805.422: Total time for which application threads were stopped: 0.0512361 seconds, Stopping threads took: 0.0009971 seconds 2020-02-06T20:07:39.845+0800: 6805.441: [GC concurrent-root-region-scan-end, 0.0199532 secs] 2020-02-06T20:07:39.845+0800: 6805.441: [GC concurrent-mark-start] 2020-02-06T20:07:43.459+0800: 6809.055: [GC concurrent-mark-end, 3.6139728 secs] 2020-02-06T20:07:43.467+0800: 6809.063: [GC remark 
2020-02-06T20:07:43.467+0800: 6809.063: [Finalize Marking, 0.0027913 secs] 2020-02-06T20:07:43.470+0800: 6809.066: [GC ref-proc, 0.0141510 secs] 2020-02-06T20:07:43.484+0800: 6809.080: [Unloading, 0.0562292 secs], 0.0987990 secs] [Times: user=0.60 sys=0.01, real=0.10 secs] 2020-02-06T20:07:43.568+0800: 6809.164: Total time for which application threads were stopped: 0.1087168 seconds, Stopping threads took: 0.0008774 seconds 2020-02-06T20:07:43.576+0800: 6809.172: [GC cleanup 13G->13G(14G), 0.0128258 secs] [Times: user=0.08 sys=0.01, real=0.01 secs] 2020-02-06T20:07:43.590+0800: 6809.186: Total time for which application threads were stopped: 0.0223063 seconds, Stopping threads took: 0.0005304 seconds 2020-02-06T20:07:45.145+0800: 6810.741: [GC pause (G1 Evacuation Pause) (young), 0.0299645 secs] [Parallel Time: 24.8 ms, GC Workers: 8] [GC Worker Start (ms): Min: 6810743.7, Avg: 6810744.1, Max: 6810746.8, Diff: 3.1] [Ext Root Scanning (ms): Min: 5.5, Avg: 9.0, Max: 15.3, Diff: 9.8, Sum: 72.2] [Update RS (ms): Min: 2.8, Avg: 6.8, Max: 9.3, Diff: 6.5, Sum: 54.2] [Processed Buffers: Min: 58, Avg: 120.5, Max: 175, Diff: 117, Sum: 964] [Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 0.5] [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [Object Copy (ms): Min: 6.4, Avg: 8.4, Max: 9.7, Diff: 3.3, Sum: 66.9] [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 8] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.3] [GC Worker Total (ms): Min: 21.5, Avg: 24.3, Max: 24.7, Diff: 3.1, Sum: 194.0] [GC Worker End (ms): Min: 6810768.3, Avg: 6810768.3, Max: 6810768.4, Diff: 0.1] [Code Root Fixup: 0.0 ms] [Code Root Purge: 0.0 ms] [Clear CT: 0.2 ms] [Other: 4.9 ms] [Choose CSet: 0.0 ms] [Ref Proc: 1.1 ms] [Ref Enq: 0.0 ms] [Redirty Cards: 0.1 ms] [Humongous Register: 0.0 ms] [Humongous Reclaim: 0.0 ms] [Free CSet: 0.0 ms] [Eden: 
608.0M(608.0M)->0.0B(608.0M) Survivors: 96.0M->96.0M Heap: 13.8G(14.2G)->13.3G(14.2G)] [Times: user=0.17 sys=0.00, real=0.03 secs] 2020-02-06T20:07:45.178+0800: 6810.774: Total time for which application threads were stopped: 0.0420222 seconds, Stopping threads took: 0.0009371 seconds 2020-02-06T20:07:47.186+0800: 6812.782: Total time for which application threads were stopped: 0.0081580 seconds, Stopping threads took: 0.0009037 seconds 2020-02-06T20:07:51.031+0800: 6816.627: [GC pause (G1 Evacuation Pause) (mixed), 0.0327771 secs] [Parallel Time: 27.2 ms, GC Workers: 8] [GC Worker Start (ms): Min: 6816629.1, Avg: 6816629.2, Max: 6816629.2, Diff: 0.1] [Ext Root Scanning (ms): Min: 4.9, Avg: 7.6, Max: 15.4, Diff: 10.5, Sum: 60.8] [Update RS (ms): Min: 2.8, Avg: 6.3, Max: 9.1, Diff: 6.2, Sum: 50.8] [Processed Buffers: Min: 18, Avg: 124.9, Max: 224, Diff: 206, Sum: 999] [Scan RS (ms): Min: 0.5, Avg: 0.8, Max: 1.2, Diff: 0.6, Sum: 6.0] [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [Object Copy (ms): Min: 6.4, Avg: 12.4, Max: 17.4, Diff: 11.0, Sum: 99.0] [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 8] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] [GC Worker Total (ms): Min: 27.1, Avg: 27.1, Max: 27.2, Diff: 0.1, Sum: 216.9] [GC Worker End (ms): Min: 6816656.3, Avg: 6816656.3, Max: 6816656.3, Diff: 0.0] [Code Root Fixup: 0.1 ms] [Code Root Purge: 0.0 ms] [Clear CT: 0.4 ms] [Other: 5.1 ms] [Choose CSet: 0.0 ms] [Ref Proc: 1.2 ms] [Ref Enq: 0.1 ms] [Redirty Cards: 0.1 ms] [Humongous Register: 0.0 ms] [Humongous Reclaim: 0.0 ms] [Free CSet: 0.1 ms] [Eden: 608.0M(608.0M)->0.0B(608.0M) Survivors: 96.0M->96.0M Heap: 13.8G(14.2G)->13.2G(14.2G)] [Times: user=0.21 sys=0.01, real=0.04 secs] 2020-02-06T20:07:51.066+0800: 6816.662: Total time for which application threads were stopped: 0.0449305 seconds, Stopping threads took: 0.0009496 
seconds 2020-02-06T20:07:58.095+0800: 6823.691: [GC pause (G1 Evacuation Pause) (young), 0.0297937 secs] [Parallel Time: 24.3 ms, GC Workers: 8] [GC Worker Start (ms): Min: 6823693.1, Avg: 6823693.1, Max: 6823693.2, Diff: 0.1] [Ext Root Scanning (ms): Min: 4.9, Avg: 7.3, Max: 15.6, Diff: 10.7, Sum: 58.1] [Update RS (ms): Min: 5.6, Avg: 7.4, Max: 9.1, Diff: 3.5, Sum: 58.9] [Processed Buffers: Min: 30, Avg: 124.6, Max: 163, Diff: 133, Sum: 997] [Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 0.5] [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [Object Copy (ms): Min: 2.4, Avg: 9.5, Max: 13.4, Diff: 11.0, Sum: 76.2] [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 8] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] [GC Worker Total (ms): Min: 24.2, Avg: 24.2, Max: 24.3, Diff: 0.1, Sum: 193.9] [GC Worker End (ms): Min: 6823717.4, Avg: 6823717.4, Max: 6823717.4, Diff: 0.0] [Code Root Fixup: 0.1 ms] [Code Root Purge: 0.0 ms] [Clear CT: 0.3 ms] [Other: 5.2 ms] [Choose CSet: 0.0 ms] [Ref Proc: 1.2 ms] [Ref Enq: 0.1 ms] [Redirty Cards: 0.1 ms] [Humongous Register: 0.0 ms] [Humongous Reclaim: 0.0 ms] [Free CSet: 0.1 ms] [Eden: 608.0M(608.0M)->0.0B(608.0M) Survivors: 96.0M->96.0M Heap: 13.8G(14.2G)->13.3G(14.2G)] 2020-02-06T20:08:18.256+0800: 6843.852: [Full GC (Allocation Failure) 14G->4027M(14G), 7.5914236 secs] [Eden: 0.0B(704.0M)->0.0B(4480.0M) Survivors: 0.0B->0.0B Heap: 14.0G(14.2G)->4027.5M(14.2G)], [Metaspace: 401632K->401608K(1411072K)] [Times: user=11.26 sys=0.18, real=7.59 secs] Thanks, Liang From thomas.schatzl at oracle.com Fri Feb 7 11:09:20 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 7 Feb 2020 12:09:20 +0100 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: <6a4dfc59-217c-446d-94ec-f4796d44617c.maoliang.ml@alibaba-inc.com> References: 
<90aa2259-afce-44af-abb2-31700caea4a0.maoliang.ml@alibaba-inc.com> <7085d9f4-d579-2fb1-c3ba-938a01ab7576@oracle.com> <6a4dfc59-217c-446d-94ec-f4796d44617c.maoliang.ml@alibaba-inc.com> Message-ID: <72f8bfb6-2039-1d6b-c312-2a9dafe0b735@oracle.com> Hi, On 06.02.20 13:27, Liang Mao wrote: > Hi Thomas, > > Thanks for the testing and evaluating! > > I tried your test with specjbb2015 and had some little different > result maybe because of machine capability. The config I used is as below: > -Xmx8g -Xms2g -Xlog:gc* -XX:GCTimeRatio=4 > -XX:+UseStringDeduplication > -Dspecjbb.comm.connect.type=HTTP_Jetty > -Dspecjbb.controller.type=PRESET > -Dspecjbb.controller.preset.ir=5000 > -Dspecjbb.controller.preset.duration=10800000 > > The heap was around 6GB after running for a while (300s). And > I was able to use SoftMaxHeapSize to let it shrink to 5GB. It > should be like your scenario to shrink the heap to 3GB > > The behavior is as I expected. But I thought you might expect > more aggressive result. In my mind, for a constant load, > the jvm might not need to shrink the heap that JVM supposes to expand > the heap to the right capacity. Did you change Min/MaxHeapFreeRatio for your test? It does not look like that, as I get roughly the same results if I don't. Given that we agree that it is wrong to use Min/MaxHeapFreeRatio during Remark, the observation is interesting, but does not seem to help here except reinforcing that Min/MaxHeapFreeRatio are not a good thing to use. Also, I doubt that G1's current heap size selection is optimal. Some reasons off my head: - Min/MaxHeapFreeRatio has been chosen to avoid uncommit/commit ping-pong and frequent (un-)commits (i.e. performance), not heap compactness. - adaptive IHOP (or at least the knowledge about expected amount of memory used during gc operation) has not been available, hence the very conservative values. - the values have been chosen long before the uncommit at remark [2] has been implemented.
As author of that change I can authoritatively say that fixing the policy had been out of scope for that change ;) however it had been needed for JEP 346 Promptly Uncommit unused memory [1] to do *something* without disrupting existing behavior too much to avoid lengthy re-evaluation of sizing policies. The logic went something like: what concurrent mark does roughly equals full gc, so do the same sizing as during full gc. End.
Currently the best idea about what we are going to need in the near future is given by the IHOP goal value imho. So overall, please do not read too much into the existing heap sizing policy :) > The soft limit I imagine is > to bring the heap size down after a load peak. In Alibaba's > workload, the heap shrink is controlled by the cluster's unified > control center which has the prediction data and the soft limit > works more like a *hard* limit in our 8u implementation. > > So I think it is acceptable that the heap size failed to shrink > to 2GB in your test case. You can see that > G1HeapSizingPolicy::can_shrink_heap_size_to is a bit conservative > and we may be able to make it more aggressive. > > > For an almost idle application which doesn't have a GC for a > rather long time, the shrink cannot happen. In our previous 8u > patch, we have a timer to trigger GC and the softmx is changed by > a jcmd which will also trigger a GC (there was no SoftMaxHeapSize option > in 8u yet). Shall we introduce a timer GC as well? > Please give the functionality JEP 346 added a try if you haven't. It should achieve what you suggest except that Min/MaxHeapFreeRatio may prevent G1 from achieving the compact heap you expect (again). Min/MaxHeapFreeRatio were changed to be manageable exactly for this reason, i.e. if you are idle, and your control center knows that the machine is going to be idle, instead of adjusting (in this case) SoftMaxHeapSize it may as well set Min/MaxHeapFreeRatio to low values and JEP 346 would do the rest. Before JEP 346 you needed to send a manual system.gc in addition. So a simpler solution than the one suggested by you would be to just drop usage of Min/MaxHeapFreeRatio and/or incorporate SoftMaxHeapSize in the uncommit at remark in your case and let the JEP 346 functionality do its job. If JEP 346 does not work for your use case, we are eager to hear back from you about your experience.
We do know that it may be a little bit too much focused on what "idle" is, but that can be tweaked. The reason I am suggesting to try JEP 346 is that from my understanding the suggested implementation seems to cover only exactly the same case as JEP 346, but only with side effects e.g. - causing commit/uncommit ping-pong if the application is slightly active at worst, and no effect at best. While concurrent uncommit tries to mitigate this (and it is still very interesting to do), doing less commit/uncommit in the first place seems better. - not covering e.g. the case where an existing Remark finishes after the last GC that decreased the heap to SoftMaxHeapSize even in the idle case (could be fixed as you mentioned above with a timer, but JEP 346 covers this already) - only limited to reducing heap to SoftMaxHeapSize (why? Fixed as you said you were thinking about a more aggressive policy) In a SoftMaxHeapSize solution in the JVM that I envision, the change should cover a wide(r) range of usage scenarios. We need to look a bit further than this single use case (which afaict G1 should already handle). In the case you need a real hard limit I recommend looking at implementing that. There has been a proposal to do so some time ago, but it is inactive at this time [0]. > > Honestly, I don't think Min/MaxHeapFreeRatio is a good way to determine > the heap expand/shrink in G1 and in our 8u practical experience we never > have full GC so Min/MaxHeapFreeRatio is useless. Here when I reproduce > your test, the only exception is the heap will expand to 6GB after > shrinking to SoftMaxHeapSize=5g is because in remark we will resize the > heap. > BTW, I don't think remark is a good point to resize heap since in remark > phase regions full of garbage haven't been reclaimed yet. IMHO we even don't > need to resize in remark but just resize after mixed GC according to > GCTimeRatio.
> > Your change to make SoftMaxHeapSize sensible in adaptive IHOP controlling > seems a similar approach as ZGC. ZGC is a single generation GC whose > scenario > is much simpler. Maybe we don't need SoftMaxHeapSize to guide GC decision > in G1. Since we already have policy to determine the shrink of the heap > by SoftMaxHeapSize, I'm not sure if we need to make adaptive IHOP according > to SoftMaxHeapSize... We may encounter the situation that we cannot > shrink the > heap size to SoftMaxHeapSize but concurrent mark become frequent after > affecting > the IHOP policy. ZGC will be generational at some point. This has been on its roadmap since the beginning. Also, there is not much difference as you can see from the patch. The difference is currently 1 LOC to set young gen sizes in addition to the heap goal. I also thought about the last point, i.e. when the user sets SoftMaxHeapSize too low, then you get continuous marking cycles. My answer to the user would be that, well, feel free to shoot yourselves in the foot, but compared to an OOME with a hard limit, this behavior seems much better (but there are certainly situations where a hard limit is better for someone so both seem useful). Ultimately the only thing I can say is that there is no free lunch in the throughput/latency/memory triangle, but there may be situations where memory is more important than performance too (widening the appeal of SoftMaxHeapSize). In the test I gave, the 2g goal is maybe too low for this case, but the 3g (instead of 3.8g) looks really attractive (and G1 seems to find an "optimal" size of 2.2-2.8g at that point; I think I found the reason for the spikes above 3g and am looking into testing a fix).
The implementation suggested by me does not affect the idle case at all; JEP 346 functionality will clean up and compact the heap nicely (you would still need to fix the shrinking amount in the sizing policy, but we already agreed that it is not good, and that doing the evaluation at remark isn't the best idea either - but both are separate issues). > >> In the log I have, the problem seems to be that we are re-setting the >> softmaxheapsize within the space reclamation phase (i.e. mixed gc) and >> G1 sizing policies got confused, i.e. it partially keeps on using the 2g >> goal for young gen sizing until the *2 problem expands it. That's a bug >> and needs to be fixed. > > I don't think it's a problem that after mixed GC > resize_heap_after_young_collection > will evaluate if the heap can be shrunk to the new value of > SoftMaxHeapSize. Resizing (to SoftMaxHeapSize) after every gc will shrink and expand all the time unnecessarily. I.e. you expand one GC, the next gc it may happen that G1 can shrink to SoftMaxHeapSize again (e.g. because eager reclaim freed a lot), next gc G1 commits again because of failed pause time goal (or just commit during humongous allocation which can be immediately reversed because of eager reclaim). Even with concurrent uncommit, such behavior seems a waste of time. Imho with concurrent (un-)commit unnecessary resizing should be avoided if possible. One option is to base that decision on the value that adaptive IHOP gives you. It seems a very good start but there may be better approaches.
Fixed percentages like Min/MaxFreeRatio are too simple as it seems :) Thanks, Thomas [0] https://bugs.openjdk.java.net/browse/JDK-8204088 [1] https://bugs.openjdk.java.net/browse/JDK-8204089 [2] https://bugs.openjdk.java.net/browse/JDK-6490394 [3] https://bugs.openjdk.java.net/browse/JDK-6490394?focusedCommentId=14283475&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14283475 (only just noticed) From thomas.schatzl at oracle.com Fri Feb 7 11:09:46 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 7 Feb 2020 12:09:46 +0100 Subject: [Rare case] G1 mixed GC didn't reclaim garbages in 8u In-Reply-To: <2258aa97-c360-47f1-96d9-8a7ca98b2461.maoliang.ml@alibaba-inc.com> References: <2258aa97-c360-47f1-96d9-8a7ca98b2461.maoliang.ml@alibaba-inc.com> Message-ID: <3cafc27b-67bd-5f52-aa7d-3638104871c8@oracle.com> Hi, On 07.02.20 06:39, Liang Mao wrote: > Hi All, > > I saw a rare case that G1 almost clear nothing in mixed GC but later full GC > reclaimed 70% of the heap. The version is 8u and is there any bug or is it > an extreme case of floating garbage because of SATB? hard to say. It may just be the application keeping data alive as you indicate. I am not aware of a particular jdk8 bug that keeps objects alive unnecessarily. G1LogLevel=finest would give answer to why the mixed phase stopped early. It would not give insight about what exactly kept the data alive though. 
Thanks, Thomas From maoliang.ml at alibaba-inc.com Mon Feb 10 11:47:06 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Mon, 10 Feb 2020 19:47:06 +0800 Subject: Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: <72f8bfb6-2039-1d6b-c312-2a9dafe0b735@oracle.com> References: <90aa2259-afce-44af-abb2-31700caea4a0.maoliang.ml@alibaba-inc.com> <7085d9f4-d579-2fb1-c3ba-938a01ab7576@oracle.com> <6a4dfc59-217c-446d-94ec-f4796d44617c.maoliang.ml@alibaba-inc.com>, <72f8bfb6-2039-1d6b-c312-2a9dafe0b735@oracle.com> Message-ID: <97f72395-af73-41f7-98f3-8e22cce5b79b.maoliang.ml@alibaba-inc.com> Hi Thomas, In my testing, I didn't change the value of Min/MaxHeapFreeRatio. The heap had already shrunk to 5GB, but at remark it expanded to 6644M. The default value of MinHeapFreeRatio is 40, so the minimal commit size after remark is the used heap size * 1.67 (3979M * 1.67 = 6644M). 1.67 = 100/(100 - 40) [1031.322s][info][gc ] GC(741) Pause Young (Concurrent Start) (G1 Evacuation Pause) 4724M->4506M(5120M) 10.607ms [1031.322s][info][gc,cpu ] GC(741) User=0.42s Sys=0.00s Real=0.01s [1031.322s][info][gc ] GC(742) Concurrent Cycle [1031.322s][info][gc,marking ] GC(742) Concurrent Clear Claimed Marks [1031.322s][info][gc,marking ] GC(742) Concurrent Clear Claimed Marks 0.066ms [1031.322s][info][gc,marking ] GC(742) Concurrent Scan Root Regions [1031.322s][info][gc,stringdedup ] Concurrent String Deduplication (1031.322s) [1031.323s][info][gc,stringdedup ] Concurrent String Deduplication 14224.0B->0.0B(14224.0B) avg 51.1% (1031.322s, 1031.323s) 0.514ms [1031.326s][info][gc,marking ] GC(742) Concurrent Scan Root Regions 3.939ms [1031.326s][info][gc,marking ] GC(742) Concurrent Mark (1031.326s) [1031.326s][info][gc,marking ] GC(742) Concurrent Mark From Roots [1031.326s][info][gc,task ] GC(742) Using 16 workers of 16 for marking [1031.483s][info][gc,marking ] GC(742) Concurrent Mark From Roots 157.144ms
[1031.483s][info][gc,marking ] GC(742) Concurrent Preclean [1031.484s][info][gc,marking ] GC(742) Concurrent Preclean 0.404ms [1031.484s][info][gc,marking ] GC(742) Concurrent Mark (1031.326s, 1031.484s) 157.587ms [1031.485s][info][gc,start ] GC(742) Pause Remark [1031.496s][info][gc ] GC(742) Pause Remark 4625M->3979M(6644M) 10.953ms [1031.496s][info][gc,cpu ] GC(742) User=0.22s Sys=0.04s Real=0.01s In our production environment, we never use JEP 346, mainly because of the JDK version. So I cannot tell whether it would work. I agree the "idle" issue is not our main focus for now. Using SoftMaxHeapSize to guide adaptive IHOP in making decisions about the concurrent mark GC cycle can work well with JEP 346 and the resize logic in remark. I don't insist on shrinking the heap at every GC. The capacity in resize_heap_if_necessary will be Max2(min_desire_capacity_by_MinHeapFreeRatio, Min2(soft_max_capacity(), max_desire_capacity_by_MaxHeapFreeRatio)) But both approaches have the problem that the default MinHeapFreeRatio is too large in remark compared to full gc, as resize_heap_if_necessary will keep a minimum heap size of 1.667x the used heap size. After remark, the used size can be large because it includes not only those old regions with garbage but also the used young regions. ############################# void G1CollectedHeap::resize_heap_if_necessary() { ... const size_t capacity_after_gc = capacity(); const size_t used_after_gc = capacity_after_gc - unused_committed_regions_in_bytes(); ############################# The used_after_gc is reasonable for full gc but it can contain young regions in remark. Do you think it should be changed like this? ############################# const size_t used_after_gc = capacity_after_gc - unused_committed_regions_in_bytes() - young_regions_count() * HeapRegion::GrainWords; // young_regions_count is 0 after full GC #############################
But arbitrarily setting a fixed number seems is not a good way that the small number may not meet pause time goal in later young GC. I tried to use following number in resize_heap_if_necessary: ############################## void G1CollectedHeap::resize_heap_if_necessary() { ... // We can now safely turn them into size_t's. size_t minimum_desired_capacity = (size_t) minimum_desired_capacity_d; size_t maximum_desired_capacity = (size_t) maximum_desired_capacity_d; if (!collector_state()->in_full_gc()) { minimum_desired_capacity = MIN2(minimum_desired_capacity, policy()->minimum_desired_bytes(used_after_gc)); } ....size_t G1Policy::minimum_desired_bytes(size_t used_bytes) const { return _ihop_control->unrestrained_young_size() != 0 ? _ihop_control->unrestrained_young_size() : _young_list_max_length * HeapRegion::GrainBytes + _reserve_regions * HeapRegion::GrainBytes + used_bytes; } ############################# I made the minimum_desired_capacity small enough based on adaptive IHOP's _last_unrestrained_young_size. Even without SoftMaxHeapSize, the test can keep the memory under 3GB. It's a rough example and I didn't predict the promotion bytes of next young gc yet. Do you think a proper value of minimum_desired_capacity in remark resize + G1AdaptiveIHOPControl::actual_target_threshold according to soft_max_capacity is enough? Thanks, Liang ------------------------------------------------------------------ From:Thomas Schatzl Send Time:2020 Feb. 7 (Fri.) 19:09 To:"MAO, Liang" ; hotspot-gc-dev Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Hi, On 06.02.20 13:27, Liang Mao wrote: > Hi Thomas, > > Thanks for the testing and evaluating! > > I tried your test with specjbb2015 and had some little different > result maybe because of machine capability. 
The config I used is as below: > -Xmx8g -Xms2g -Xlog:gc* -XX:GCTimeRatio=4 > -XX:+UseStringDeduplication > -Dspecjbb.comm.connect.type=HTTP_Jetty > -Dspecjbb.controller.type=PRESET > -Dspecjbb.controller.preset.ir=5000 > -Dspecjbb.controller.preset.duration=10800000 > > The heap was around 6GB after running for a while (300s). And > I was able to use SoftMaxHeapSize to let it shrink to 5GB. It > should be like your scenario to shrink the heap to 3GB > > The behavior is as I expected. But I thought you might expect > more aggressive result. In my mind, for a constant load, > the jvm might not need to shrink the heap that JVM supposes to expand > the heap to the right capacity. Did you change Min/MaxHeapFreeRatio for your test? It does not look like that, as I get roughly the same results if I don't. Given that we agree that it is wrong to use Min/MaxHeapFreeRatio during Remark, the observation is interesting, but does not seem to help here except reinforcing that Min/MaxHeapFreeRatio are not a good thing to use. Also, I doubt that G1's current heap size selection is optimal. Some reasons off my head: - Min/MaxHeapFreeRatio has been chosen to avoid uncommit/commit ping-pong and frequent (un-)commits (i.e. performance), not heap compactness. - adaptive IHOP (or at least the knowledge about expected amount of memory used during gc operation) has not been available, hence the very conservative values. - the values have been chosen long before the uncommit at remark [2] has been implemented. As author of that change I can authoratively say that fixing the policy had been out of scope for that change ;) however it had been needed for JEP 346 Promptly Uncommit unused memory [1] to do *something* without disrupting existing behavior too much to avoid lengthy re-evaluation of sizing policies. The logic went something like: what concurrent mark does roughly equals full gc, so do the same sizing as during full gc. End. 
- there is (rough) consensus that Min/MaxHeapFreeRatio is/has been a bad idea, starting from the naming. ZGC and Shenandoah do not use it afaict. - optimal heap size depends on application phase (e.g. startup/operation/idle). Min/MaxHeapFreeRatio default values basically prevent shrinking in many cases. Sometimes they even expand the heap [3]. Given the high default value of MinHeapFreeRatio, G1 will most likely end up using too much memory. I.e. we apply MinHeapFreeRatio at Remark, which means that the heap size will be kept at heap size at Remark + 40%. Given that Remark is where heap usage almost peaked anyway, you get a really large commit size. Unnecessarily large because (beginning with modestly large heaps in few GBs) the actual peak memory usage *at optimal operation* is what adaptive IHOP determined. This is typically a lot less than 40% of existing usage at Remark. So G1 keeps a lot of memory around for no reason. This can be particularly significant in large heaps (say, double digit GB) where those 40% can be a lot in absolute terms while G1 only ever uses single digit additional GB during the cycle. In my tests, e.g. the suggested 10% seem sufficient for that particular case. We also agree that uncommit at end of mixed gc is probably better, but again, how much do you uncommit? To keep as much as you expect to not use would be a good start, maybe a bit more. Not less, because then you are going to do an unnecessary commit during that cycle for sure. Currently the best idea about what we are going to need in the next time is given by the IHOP goal value imho. So overall, please do not read too much into existing heap sizing policy :) > The soft limit I imagine is > to bring the heap size down after a load pike. In Alibaba's > workload, the heap shrink is controlled by cluster's unified > control center which has the predicition data and the soft limit > works more like a *hard* limit in our 8u implementation. 
> > So I think it is acceptable that heap size failed shrinked > to 2GB in your test case. You can see that > G1HeapSizingPolicy::can_shrink_heap_size_to is a bit conservative > and we may be able to make it more aggressive. > > > For almost idle application which doesn't have a GC for a > rather long time, the shrink cannot happen. In our previous 8u > patch, we have a timer to trigger GC and the softmx is changed by > a jcmd which will also trigger a GC(there was no SoftMaxHeapSize option > in 8u yet). Shall we introduce a timer GC as well? > Please give the functionality JEP 346 added a try if you haven't. It should achieve what you suggest except that Min/MaxHeapFreeRatio may prevent G1 to achive the compact heap you expect (again). Min/MaxHeapFreeRatio were changed to be manageable exactly for this reason, i.e. if you are idle, and your control center knows that the machine is going to be idle, instead of adjusting (in this case) SoftMaxHeapSize it may as well set Min/MaxHeapFreeRatio to low values and JEP 346 would do the rest. Before JEP 346 you needed to send a manual system.gc in addition. So a simpler solution than the one suggested by you would be to just drop usage of Min/MaxHeapFreeRatio and/or incorporate SoftMaxHeapSize in the uncommit at remark in your case and let JEP 346 functionality its job. If JEP 346 does not work for your use case, we are eager to hear back from you about your experience. We do know that it may be a little bit too much focused on what "idle" is, but that can be tweaked. The reason I am suggesting to try JEP 346 is that from my understanding the suggested implementation seems to cover only exactly the same case as JEP 346, but only with side effects e.g. - causing commit/uncommit ping-pong if the application is slightly active at worst, and no effect at best. While concurrent uncommit tries to mitigate this (and it is still very interesting to do), doing less commit/uncommit in the first place seems better. - not covering e.g. 
the case where an existing Remark finishes after the last GC that
decreased the heap to SoftMaxHeapSize, even in the idle case (could be
fixed, as you mentioned above, with a timer, but JEP 346 covers this
already)

- only limited to reducing the heap to SoftMaxHeapSize (why? Fixed, as
you said you were thinking about a more aggressive policy)

In the SoftMaxHeapSize solution in the JVM that I envision, the change
should cover a wide(r) range of usage scenarios. We need to look a bit
further than this single use case (which afaict G1 should already
handle). In case you need a real hard limit, I recommend looking at
implementing that. There has been a proposal to do so some time ago,
but it is inactive at this time [0].

> Honestly, I don't think Min/MaxHeapFreeRatio is a good way to
> determine heap expansion/shrinking in G1, and in our 8u practical
> experience we never have full GCs, so Min/MaxHeapFreeRatio is
> useless. Here, when I reproduce your test, the only exception is that
> the heap will expand to 6GB after shrinking to SoftMaxHeapSize=5g,
> because at remark we will resize the heap.
> BTW, I don't think remark is a good point to resize the heap, since
> in the remark phase regions full of garbage haven't been reclaimed
> yet. IMHO we don't even need to resize at remark but should just
> resize after mixed GC according to GCTimeRatio.
>
> Your change to make SoftMaxHeapSize sensible in adaptive IHOP control
> seems a similar approach to ZGC's. ZGC is a single-generation GC
> whose scenario is much simpler. Maybe we don't need SoftMaxHeapSize
> to guide GC decisions in G1. Since we already have a policy to
> determine the shrinking of the heap by SoftMaxHeapSize, I'm not sure
> if we need to make adaptive IHOP depend on SoftMaxHeapSize... We may
> encounter the situation that we cannot shrink the heap size to
> SoftMaxHeapSize, but concurrent marking becomes frequent after
> affecting the IHOP policy.

ZGC will be generational at some point. This has been on its roadmap
since the beginning.
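For reference, the Min/MaxHeapFreeRatio sizing rule discussed in this
thread can be sketched as a small model. This is a deliberately
simplified illustration of the documented flag semantics (keep the
free-to-committed ratio between the two bounds after a collection,
clamped to -Xms/-Xmx), not HotSpot's actual code; the function name and
the GB-sized example values are made up for the sketch.

```python
# Simplified model of post-GC heap resizing driven by MinHeapFreeRatio /
# MaxHeapFreeRatio (given as percentages). All sizes in the same unit
# (e.g. GB). Not HotSpot code, just the documented policy in miniature.
def resize_after_gc(used, committed, min_free_ratio, max_free_ratio,
                    xms, xmx):
    assert 0 <= min_free_ratio <= max_free_ratio < 100
    # Smallest committed size that still leaves min_free_ratio % free.
    min_desired = used / (1.0 - min_free_ratio / 100.0)
    # Largest committed size that leaves at most max_free_ratio % free.
    max_desired = used / (1.0 - max_free_ratio / 100.0)
    if committed < min_desired:        # too little free space -> expand
        committed = min_desired
    elif committed > max_desired:      # too much free space -> shrink
        committed = max_desired
    # Clamp to the configured heap bounds (-Xms / -Xmx).
    return max(xms, min(xmx, committed))
```

With the default-like 40/70 ratios, a 6 GB committed heap holding 2 GB of
live data stays where it is; lowering both ratios (as a control center
could do at runtime, since the flags are manageable) makes the model
shrink toward the live set, which is the "idle" effect described above.
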
Also, there is not much difference, as you can see from the patch. The
difference is currently 1 LOC to set young gen sizes in addition to the
heap goal.

I also thought about the last point, i.e. when the user sets
SoftMaxHeapSize too low, then you get continuous marking cycles. My
answer to the user would be that, well, feel free to shoot yourself in
the foot, but compared to an OOME with a hard limit this behavior seems
much better (though there are certainly situations where a hard limit
is better for someone, so both seem useful). Ultimately the only thing
I can say is that there is no free lunch in the
throughput/latency/memory triangle, but there may be situations where
memory is more important than performance too (widening the appeal of
SoftMaxHeapSize).

In the test I gave, the 2g goal is maybe too low for this case, but the
3g (instead of 3.8g) looks really attractive (and G1 seems to find an
"optimal" size of 2.2-2.8g at that point; I think I found the reason
for the spikes above 3g and am looking into testing a fix).

The implementation suggested by me does not affect the idle case at
all; the JEP 346 functionality will clean up and compact the heap
nicely (you would still need to fix the shrinking amount in the sizing
policy, but we already agreed that it is not good, and that doing the
evaluation at remark isn't the best idea either - but both are separate
issues).

> >> In the log I have, the problem seems to be that we are re-setting
> >> the SoftMaxHeapSize within the space reclamation phase (i.e. mixed
> >> gc) and G1 sizing policies got confused, i.e. it partially keeps
> >> on using the 2g goal for young gen sizing until the *2 problem
> >> expands it. That's a bug and needs to be fixed.
>
> I don't think it's a problem that after mixed GC
> resize_heap_after_young_collection will evaluate whether the heap can
> be shrunk to the new value of SoftMaxHeapSize.

Resizing (to SoftMaxHeapSize) after every GC will shrink and expand all
the time unnecessarily. I.e.
you expand one GC; the next GC it may happen that G1 can shrink to
SoftMaxHeapSize again (e.g. because eager reclaim freed a lot); the
next GC G1 commits again because of a failed pause time goal (or just
commits during a humongous allocation, which can be immediately
reversed because of eager reclaim). Even with concurrent uncommit,
such behavior seems a waste of time. Imho, with concurrent (un-)commit,
unnecessary resizing should be avoided if possible. One option is to
base that decision on the value that adaptive IHOP gives you. It seems
a very good start, but there may be better approaches. Fixed
percentages like Min/MaxFreeRatio are too simple, it seems :)

Thanks,
  Thomas

[0] https://bugs.openjdk.java.net/browse/JDK-8204088
[1] https://bugs.openjdk.java.net/browse/JDK-8204089
[2] https://bugs.openjdk.java.net/browse/JDK-6490394
[3] https://bugs.openjdk.java.net/browse/JDK-6490394?focusedCommentId=14283475&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14283475
(only just noticed)

From m.sundar85 at gmail.com  Mon Feb 10 18:32:32 2020
From: m.sundar85 at gmail.com (Sundara Mohan M)
Date: Mon, 10 Feb 2020 13:32:32 -0500
Subject: Parallel GC Thread crash
In-Reply-To:
References:
Message-ID:

Hi Stefan,
    We started seeing more crashes on JDK 13.0.1+9. Since we are seeing
it on a GC task thread, we assumed it is related to GC.

# Problematic frame:
# V  [libjvm.so+0xd183c0]  PSRootsClosure::do_oop(oopDesc**)+0x30

Command Line: -XX:+AlwaysPreTouch -Xms64000m -Xmx64000m
-XX:NewSize=40000m -XX:+DisableExplicitGC -Xnoclassgc -XX:+UseParallelGC
-XX:ParallelGCThreads=40 -XX:ConcGCThreads=5 ...
Host: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, 48 cores, 125G, Red
Hat Enterprise Linux Server release 6.10 (Santiago)
Time: Fri Feb  7 11:15:04 2020 UTC elapsed time: 286290 seconds (3d 7h 31m 30s)

---------------  T H R E A D  ---------------

Current thread (0x00007fca6c074000):  GCTaskThread "ParGC Thread#28"
[stack: 0x00007fba72ff1000,0x00007fba730f1000] [id=56530]

Stack: [0x00007fba72ff1000,0x00007fba730f1000],  sp=0x00007fba730ee850,  free space=1014k
Native frames: (J=compiled Java code, A=aot compiled Java code,
j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xd183c0]  PSRootsClosure::do_oop(oopDesc**)+0x30
V  [libjvm.so+0xc6bf0b]  OopMapSet::oops_do(frame const*, RegisterMap const*, OopClosure*)+0x2eb
V  [libjvm.so+0x765489]  frame::oops_do_internal(OopClosure*, CodeBlobClosure*, RegisterMap*, bool)+0x99
V  [libjvm.so+0xf68b17]  JavaThread::oops_do(OopClosure*, CodeBlobClosure*)+0x187
V  [libjvm.so+0xd190be]  ThreadRootsTask::do_it(GCTaskManager*, unsigned int)+0x6e
V  [libjvm.so+0x7f422b]  GCTaskThread::run()+0x1eb
V  [libjvm.so+0xf707fd]  Thread::call_run()+0x10d
V  [libjvm.so+0xc875b7]  thread_native_entry(Thread*)+0xe7

JavaThread 0x00007fb8f4036800 (nid = 60927) was being processed
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
v  ~RuntimeStub::_new_array_Java
J 58520 c2 ch.qos.logback.classic.spi.ThrowableProxy.<init>(Ljava/lang/Throwable;)V (207 bytes) @ 0x00007fca5fd23dec [0x00007fca5fd1dbc0+0x000000000000622c]
J 66864 c2 webservice.exception.ExceptionLoggingWrapper.execute()V (1004 bytes) @ 0x00007fca60c02588 [0x00007fca60bffce0+0x00000000000028a8]
J 58224 c2 webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lbeans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; (105 bytes) @ 0x00007fca5f59bad8 [0x00007fca5f59b880+0x0000000000000258]
J 69992 c2 webservice.exception.mapper.JediRequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; (9 bytes) @ 0x00007fca5e1019f4
[0x00007fca5e101940+0x00000000000000b4]
J 55265 c2 webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; (332 bytes) @ 0x00007fca5f6f58e0 [0x00007fca5f6f5700+0x00000000000001e0]
J 483122 c2 webservice.filters.ResponseSerializationWorker.execute()Z (272 bytes) @ 0x00007fca622fc2b4 [0x00007fca622fbc80+0x0000000000000634]
J 15811% c2 com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; (486 bytes) @ 0x00007fca5c108794 [0x00007fca5c1082a0+0x00000000000004f4]
j  com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1
J 4586 c1 java.util.concurrent.FutureTask.run()V java.base@13.0.1 (123 bytes) @ 0x00007fca54d27184 [0x00007fca54d26b00+0x0000000000000684]
J 7550 c1 java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V java.base@13.0.1 (187 bytes) @ 0x00007fca54fbb6d4 [0x00007fca54fba8e0+0x0000000000000df4]
J 7549 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V java.base@13.0.1 (9 bytes) @ 0x00007fca5454b93c [0x00007fca5454b8c0+0x000000000000007c]
J 4585 c1 java.lang.Thread.run()V java.base@13.0.1 (17 bytes) @ 0x00007fca54d250f4 [0x00007fca54d24fc0+0x0000000000000134]
v  ~StubRoutines::call_stub

siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000

Do JDK 11 and 13 have different GC code? Do you think downgrading (to
the stable JDK 11) or upgrading (to JDK 13.0.2) might help here? Any
insight to debug this will be helpful.

TIA
Sundar

On Tue, Feb 4, 2020 at 5:47 AM Stefan Karlsson wrote:

> Hi Sundar,
>
> The GC crashes when it encounters something bad on the stack:
>
> > V  [libjvm.so+0xc6bf0b]  OopMapSet::oops_do(frame const*, RegisterMap
> > const*, OopClosure*)+0x2eb
> > V  [libjvm.so+0x765489]  frame::oops_do_internal(OopClosure*,
>
> This is probably not a GC bug. It's more likely that this is caused by
> the JIT compiler.
> I see in your hotspot-runtime-dev thread that you
> also get crashes in other compiler related areas.
>
> If you want to rule out the GC, you can run with -XX:+VerifyBeforeGC and
> -XX:+VerifyAfterGC, and see if this asserts before the GC has started
> running.
>
> StefanK
>
> On 2020-02-04 04:38, Sundara Mohan M wrote:
> > Hi,
> >     I am seeing following crashes frequently on our servers
> > #
> > # A fatal error has been detected by the Java Runtime Environment:
> > #
> > #  SIGSEGV (0xb) at pc=0x00007fca3281d311, pid=103575, tid=108299
> > #
> > # JRE version: OpenJDK Runtime Environment (13.0.1+9) (build 13.0.1+9)
> > # Java VM: OpenJDK 64-Bit Server VM (13.0.1+9, mixed mode, tiered, parallel gc, linux-amd64)
> > # Problematic frame:
> > # V  [libjvm.so+0xcd3311]  PCMarkAndPushClosure::do_oop(oopDesc**)+0x51
> > #
> > # No core dump will be written. Core dumps have been disabled. To enable
> > # core dumping, try "ulimit -c unlimited" before starting Java again
> > #
> > # If you would like to submit a bug report, please visit:
> > #   https://github.com/AdoptOpenJDK/openjdk-build/issues
> > #
> >
> > ---------------  T H R E A D  ---------------
> >
> > Current thread (0x00007fca2c051000):  GCTaskThread "ParGC Thread#8" [stack: 0x00007fca30277000,0x00007fca30377000] [id=108299]
> >
> > Stack: [0x00007fca30277000,0x00007fca30377000],  sp=0x00007fca30374890,  free space=1014k
> > Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
> > V  [libjvm.so+0xcd3311]  PCMarkAndPushClosure::do_oop(oopDesc**)+0x51
> > V  [libjvm.so+0xc6bf0b]  OopMapSet::oops_do(frame const*, RegisterMap const*, OopClosure*)+0x2eb
> > V  [libjvm.so+0x765489]  frame::oops_do_internal(OopClosure*, CodeBlobClosure*, RegisterMap*, bool)+0x99
> > V  [libjvm.so+0xf68b17]  JavaThread::oops_do(OopClosure*, CodeBlobClosure*)+0x187
> > V  [libjvm.so+0xcce2f0]  ThreadRootsMarkingTask::do_it(GCTaskManager*, unsigned int)+0xb0
> > V
> > [libjvm.so+0x7f422b]  GCTaskThread::run()+0x1eb
> > V  [libjvm.so+0xf707fd]  Thread::call_run()+0x10d
> > V  [libjvm.so+0xc875b7]  thread_native_entry(Thread*)+0xe7
> >
> > JavaThread 0x00007fb85c004800 (nid = 111387) was being processed
> > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
> > v  ~RuntimeStub::_new_array_Java
> > J 225122 c2 ch.qos.logback.classic.spi.ThrowableProxy.<init>(Ljava/lang/Throwable;)V (207 bytes) @ 0x00007fca21f1a5d8 [0x00007fca21f17f20+0x00000000000026b8]
> > J 62342 c2 webservice.exception.ExceptionLoggingWrapper.execute()V (1004 bytes) @ 0x00007fca20f0aec8 [0x00007fca20f07f40+0x0000000000002f88]
> > J 225129 c2 webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lbeans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; (105 bytes) @ 0x00007fca1da512ac [0x00007fca1da51100+0x00000000000001ac]
> > J 131643 c2 webservice.exception.mapper.RequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; (9 bytes) @ 0x00007fca20ce6190 [0x00007fca20ce60c0+0x00000000000000d0]
> > J 55114 c2 webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; (332 bytes) @ 0x00007fca2051fe64 [0x00007fca2051f820+0x0000000000000644]
> > J 57859 c2 webservice.filters.ResponseSerializationWorker.execute()Z (272 bytes) @ 0x00007fca1ef2ed18 [0x00007fca1ef2e140+0x0000000000000bd8]
> > J 16114% c2 com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; (486 bytes) @ 0x00007fca1ced465c [0x00007fca1ced4200+0x000000000000045c]
> > j  com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1
> > J 11639 c2 java.util.concurrent.FutureTask.run()V java.base@13.0.1 (123 bytes) @ 0x00007fca1cd00858 [0x00007fca1cd007c0+0x0000000000000098]
> > J 7560 c1
> > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V java.base@13.0.1 (187 bytes) @ 0x00007fca15b23f54 [0x00007fca15b23160+0x0000000000000df4]
> > J 5143 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V java.base@13.0.1 (9 bytes) @ 0x00007fca15b39abc [0x00007fca15b39a40+0x000000000000007c]
> > J 4488 c1 java.lang.Thread.run()V java.base@13.0.1 (17 bytes) @ 0x00007fca159fc174 [0x00007fca159fc040+0x0000000000000134]
> > v  ~StubRoutines::call_stub
> >
> > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000
> >
> > Register to memory mapping:
> > ...
> >
> > Can someone shed more light on when this can happen? I am seeing
> > this on multiple servers with Java 13.0.1+9 on RHEL6 servers.
> >
> > There was another thread on hotspot-runtime-dev where David Holmes
> > pointed to this:
> >> siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000
> >
> >> This seems it may be related to:
> >> https://bugs.openjdk.java.net/browse/JDK-8004124
> >
> > Just wondering if this is the same issue or something GC specific.
> >
> > TIA
> > Sundar

From sangheon.kim at oracle.com  Mon Feb 10 18:59:24 2020
From: sangheon.kim at oracle.com (sangheon.kim at oracle.com)
Date: Mon, 10 Feb 2020 10:59:24 -0800
Subject: RFR (S): 8238160: Uniformize Parallel GC task queue variable names
In-Reply-To: <8d350538-9a82-b420-e7de-319edaf8605c@oracle.com>
References: <8d350538-9a82-b420-e7de-319edaf8605c@oracle.com>
Message-ID:

Hi Thomas,

On 1/30/20 3:08 AM, Thomas Schatzl wrote:
> Hi all,
>
>   can I have reviews for this small change that moves some global
> typedefs used only by Parallel GC from taskqueue.hpp to parallel gc
> files, and further makes naming of instances of these more uniform?
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8238160
> Webrev:
> http://cr.openjdk.java.net/~tschatzl/8238160/webrev/

Looks good to me.
If you are interested, the copyright year can be updated. I don't need
a new webrev for this.

Thanks,
Sangheon

> Testing:
> local compilation
>
> Thanks,
>   Thomas

From ecki at zusammenkunft.net  Mon Feb 10 19:29:26 2020
From: ecki at zusammenkunft.net (Bernd Eckenfels)
Date: Mon, 10 Feb 2020 19:29:26 +0000
Subject: Parallel GC Thread crash
In-Reply-To:
References:
Message-ID:

Hello,

not an answer, but just a question:

> -XX:+UseParallelGC -XX:ParallelGCThreads=40 -XX:ConcGCThreads=5

what part of ParallelGC is controlled by the concurrent threads setting?

Regards
Bernd
--
http://bernd.eckenfels.net

________________________________
From: hotspot-gc-dev on behalf of Sundara Mohan M
Sent: Monday, February 10, 2020 7:33 PM
To: Stefan Karlsson
Cc: hotspot-gc-dev at openjdk.java.net
Subject: Re: Parallel GC Thread crash
From stefan.karlsson at oracle.com  Mon Feb 10 19:42:49 2020
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Mon, 10 Feb 2020 20:42:49 +0100
Subject: Parallel GC Thread crash
In-Reply-To:
References:
Message-ID: <5454bc87-1452-1402-3496-c3c8f128a499@oracle.com>

Hi Sundar,

On 2020-02-10 19:32, Sundara Mohan M wrote:
> Hi Stefan,
>     We started seeing more crashes on JDK13.0.1+9
>
> Since seeing it on GC Task Thread assumed it is related to GC.

As I said in my previous mail, I don't think this is caused by GC code.
More below.
> Does JDK11 and 13 have different code for GC. Do you think
> downgrading(JDK11 stable)/upgrading(JDK-13.0.2) might help here?
You should at least move to 13.0.2, to get the latest bug
fixes/patches. There have been a lot of changes in all areas of the JVM
between 11 and 13. We don't yet know the root cause of this crash, and
I can't say if this is caused by new changes or not. Have you or anyone
filed a bug report for this?

> Any insight to debug this will be helpful.

Did you try my previous suggestion to run with -XX:+VerifyBeforeGC and
-XX:+VerifyAfterGC? If you can tolerate the longer GC times it
introduces, then you could try to run with
-XX:+UnlockDiagnosticVMOptions -XX:+VerifyBeforeGC -XX:+VerifyAfterGC.

Cheers,
StefanK
> To enable > > core dumping, try "ulimit -c unlimited" before starting Java again > > # > > # If you would like to submit a bug report, please visit: > > # https://github.com/AdoptOpenJDK/openjdk-build/issues > > # > > > > > > ---------------? T H R E A D? --------------- > > > > Current thread (0x00007fca2c051000):? GCTaskThread "ParGC > Thread#8" [stack: > > 0x00007fca30277000,0x00007fca30377000] [id=108299] > > > > Stack: [0x00007fca30277000,0x00007fca30377000], > sp=0x00007fca30374890, > >? ?free space=1014k > > Native frames: (J=compiled Java code, A=aot compiled Java code, > > j=interpreted, Vv=VM code, C=native code) > > V? [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > > V? [libjvm.so+0xc6bf0b]? OopMapSet::oops_do(frame const*, > RegisterMap > > const*, OopClosure*)+0x2eb > > V? [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > > CodeBlobClosure*, RegisterMap*, bool)+0x99 > > V? [libjvm.so+0xf68b17]? JavaThread::oops_do(OopClosure*, > > CodeBlobClosure*)+0x187 > > V? [libjvm.so+0xcce2f0] > ThreadRootsMarkingTask::do_it(GCTaskManager*, > > unsigned int)+0xb0 > > V? [libjvm.so+0x7f422b]? GCTaskThread::run()+0x1eb > > V? [libjvm.so+0xf707fd]? Thread::call_run()+0x10d > > V? [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7 > > > > JavaThread 0x00007fb85c004800 (nid = 111387) was being processed > > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > > v? 
~RuntimeStub::_new_array_Java > > J 225122 c2 > > > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > > (207 bytes) @ 0x00007fca21f1a5d8 > [0x00007fca21f17f20+0x00000000000026b8] > > J 62342 c2 > webservice.exception.ExceptionLoggingWrapper.execute()V (1004 > > bytes) @ 0x00007fca20f0aec8 [0x00007fca20f07f40+0x0000000000002f88] > > J 225129 c2 > > > webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lbeans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > (105 bytes) @ 0x00007fca1da512ac > [0x00007fca1da51100+0x00000000000001ac] > > J 131643 c2 > > > webservice.exception.mapper.RequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > (9 bytes) @ 0x00007fca20ce6190 > [0x00007fca20ce60c0+0x00000000000000d0] > > J 55114 c2 > > > webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; > > (332 bytes) @ 0x00007fca2051fe64 > [0x00007fca2051f820+0x0000000000000644] > > J 57859 c2 > webservice.filters.ResponseSerializationWorker.execute()Z (272 > > bytes) @ 0x00007fca1ef2ed18 [0x00007fca1ef2e140+0x0000000000000bd8] > > J 16114% c2 > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; > > (486 bytes) @ 0x00007fca1ced465c > [0x00007fca1ced4200+0x000000000000045c] > > j > > > ?com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1 > > J 11639 c2 java.util.concurrent.FutureTask.run()V > java.base at 13.0.1 (123 > > bytes) @ 0x00007fca1cd00858 [0x00007fca1cd007c0+0x0000000000000098] > > J 7560 c1 > > > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V > > java.base at 13.0.1 (187 bytes) @ 0x00007fca15b23f54 > > [0x00007fca15b23160+0x0000000000000df4] > > J 5143 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V > > java.base at 13.0.1 (9 bytes) @ 0x00007fca15b39abc > > 
[0x00007fca15b39a40+0x000000000000007c]
> > J 4488 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 bytes) @
> > 0x00007fca159fc174 [0x00007fca159fc040+0x0000000000000134]
> > v  ~StubRoutines::call_stub
> >
> > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr:
> > 0x0000000000000000
> >
> > Register to memory mapping:
> > ...
> >
> > Can someone shed more info on when this can happen? I am seeing this on
> > multiple servers with Java 13.0.1+9 on RHEL6 servers.
> >
> > There was another thread in hotspot runtime where David Holmes pointed
> > this out:
> >> siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr:
> > 0x0000000000000000
> >
> >> This seems it may be related to:
> >> https://bugs.openjdk.java.net/browse/JDK-8004124
> >
> > Just wondering if this is the same issue or something GC-specific.
> >
> >
> >
> > TIA
> > Sundar
> >

From m.sundar85 at gmail.com  Mon Feb 10 19:44:35 2020
From: m.sundar85 at gmail.com (Sundara Mohan M)
Date: Mon, 10 Feb 2020 14:44:35 -0500
Subject: Parallel GC Thread crash
In-Reply-To: 
References: 
Message-ID: 

I believe it is not used in the case of Parallel GC.
We were experimenting with ZGC with these settings, and the flag is still
there.

Thanks
Sundar

On Mon, Feb 10, 2020 at 2:36 PM Bernd Eckenfels 
wrote:

> Hello,
>
> not an answer, but just a question,
>
> > -XX:+UseParallelGC -XX:ParallelGCThreads=40 -XX:ConcGCThreads=5
>
> what part of ParallelGC is controlled by the concurrent threads setting?
>
> Regards
> Bernd
> --
> http://bernd.eckenfels.net
> ________________________________
> From: hotspot-gc-dev  on behalf of
> Sundara Mohan M
> Sent: Monday, February 10, 2020 7:33 PM
> To: Stefan Karlsson
> Cc: hotspot-gc-dev at openjdk.java.net
> Subject: Re: Parallel GC Thread crash
>
> Hi Stefan,
> We started seeing more crashes on JDK13.0.1+9
>
> Since we are seeing it on a GC Task Thread, we assumed it is related to GC. 
> > # Problematic frame: > # V [libjvm.so+0xd183c0] PSRootsClosure::do_oop(oopDesc**)+0x30 > > Command Line: -XX:+AlwaysPreTouch -Xms64000m -Xmx64000m -XX:NewSize=40000m > -XX:+DisableExplicitGC -Xnoclassgc -XX:+UseParallelGC > -XX:ParallelGCThreads=40 -XX:ConcGCTh > reads=5 ... > > Host: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, 48 cores, 125G, Red Hat > Enterprise Linux Server release 6.10 (Santiago) > Time: Fri Feb 7 11:15:04 2020 UTC elapsed time: 286290 seconds (3d 7h 31m > 30s) > > --------------- T H R E A D --------------- > > Current thread (0x00007fca6c074000): GCTaskThread "ParGC Thread#28" > [stack: 0x00007fba72ff1000,0x00007fba730f1000] [id=56530] > > Stack: [0x00007fba72ff1000,0x00007fba730f1000], sp=0x00007fba730ee850, > free space=1014k > Native frames: (J=compiled Java code, A=aot compiled Java code, > j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0xd183c0] PSRootsClosure::do_oop(oopDesc**)+0x30 > V [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, RegisterMap > const*, OopClosure*)+0x2eb > V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > CodeBlobClosure*, RegisterMap*, bool)+0x99 > V [libjvm.so+0xf68b17] JavaThread::oops_do(OopClosure*, > CodeBlobClosure*)+0x187 > V [libjvm.so+0xd190be] ThreadRootsTask::do_it(GCTaskManager*, unsigned > int)+0x6e > V [libjvm.so+0x7f422b] GCTaskThread::run()+0x1eb > V [libjvm.so+0xf707fd] Thread::call_run()+0x10d > V [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7 > > JavaThread 0x00007fb8f4036800 (nid = 60927) was being processed > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > v ~RuntimeStub::_new_array_Java > J 58520 c2 > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > (207 bytes) @ 0x00007fca5fd23dec [0x00007fca5fd1dbc0+0x000000000000622c] > J 66864 c2 webservice.exception.ExceptionLoggingWrapper.execute()V (1004 > bytes) @ 0x00007fca60c02588 [0x00007fca60bffce0+0x00000000000028a8] > J 58224 c2 > > 
webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lbeans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > (105 bytes) @ 0x00007fca5f59bad8 [0x00007fca5f59b880+0x0000000000000258] > J 69992 c2 > > webservice.exception.mapper.JediRequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > (9 bytes) @ 0x00007fca5e1019f4 [0x00007fca5e101940+0x00000000000000b4] > J 55265 c2 > > webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; > (332 bytes) @ 0x00007fca5f6f58e0 [0x00007fca5f6f5700+0x00000000000001e0] > J 483122 c2 webservice.filters.ResponseSerializationWorker.execute()Z (272 > bytes) @ 0x00007fca622fc2b4 [0x00007fca622fbc80+0x0000000000000634] > J 15811% c2 > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; > (486 bytes) @ 0x00007fca5c108794 [0x00007fca5c1082a0+0x00000000000004f4] > j > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1 > J 4586 c1 java.util.concurrent.FutureTask.run()V java.base at 13.0.1 (123 > bytes) @ 0x00007fca54d27184 [0x00007fca54d26b00+0x0000000000000684] > J 7550 c1 > > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V > java.base at 13.0.1 (187 bytes) @ 0x00007fca54fbb6d4 > [0x00007fca54fba8e0+0x0000000000000df4] > J 7549 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V > java.base at 13.0.1 (9 bytes) @ 0x00007fca5454b93c > [0x00007fca5454b8c0+0x000000000000007c] > J 4585 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 bytes) @ > 0x00007fca54d250f4 [0x00007fca54d24fc0+0x0000000000000134] > v ~StubRoutines::call_stub > > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: > 0x0000000000000000 > > Does JDK11 and 13 have different code for GC. Do you think > downgrading(JDK11 stable)/upgrading(JDK-13.0.2) might help here? 
> > Any insight to debug this will be helpful. > > TIA > Sundar > > On Tue, Feb 4, 2020 at 5:47 AM Stefan Karlsson > > wrote: > > > Hi Sundar, > > > > The GC crashes when it encounters something bad on the stack: > > > V [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, RegisterMap > > > const*, OopClosure*)+0x2eb > > > V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > > > > This is probably not a GC bug. It's more likely that this is caused by > > the JIT compiler. I see in your hotspot-runtime-dev thread, that you > > also get crashes in other compiler related areas. > > > > If you want to rule out the GC, you can run with -XX:+VerifyBeforeGC and > > -XX:+VerifyAfterGC, and see if this asserts before the GC has started > > running. > > > > StefanK > > > > On 2020-02-04 04:38, Sundara Mohan M wrote: > > > Hi, > > > I am seeing following crashes frequently on our servers > > > # > > > # A fatal error has been detected by the Java Runtime Environment: > > > # > > > # SIGSEGV (0xb) at pc=0x00007fca3281d311, pid=103575, tid=108299 > > > # > > > # JRE version: OpenJDK Runtime Environment (13.0.1+9) (build 13.0.1+9) > > > # Java VM: OpenJDK 64-Bit Server VM (13.0.1+9, mixed mode, tiered, > > parallel > > > gc, linux-amd64) > > > # Problematic frame: > > > # V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > > > # > > > # No core dump will be written. Core dumps have been disabled. 
To > enable > > > core dumping, try "ulimit -c unlimited" before starting Java again > > > # > > > # If you would like to submit a bug report, please visit: > > > # https://github.com/AdoptOpenJDK/openjdk-build/issues > > > # > > > > > > > > > --------------- T H R E A D --------------- > > > > > > Current thread (0x00007fca2c051000): GCTaskThread "ParGC Thread#8" > > [stack: > > > 0x00007fca30277000,0x00007fca30377000] [id=108299] > > > > > > Stack: [0x00007fca30277000,0x00007fca30377000], sp=0x00007fca30374890, > > > free space=1014k > > > Native frames: (J=compiled Java code, A=aot compiled Java code, > > > j=interpreted, Vv=VM code, C=native code) > > > V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > > > V [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, RegisterMap > > > const*, OopClosure*)+0x2eb > > > V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > > > CodeBlobClosure*, RegisterMap*, bool)+0x99 > > > V [libjvm.so+0xf68b17] JavaThread::oops_do(OopClosure*, > > > CodeBlobClosure*)+0x187 > > > V [libjvm.so+0xcce2f0] ThreadRootsMarkingTask::do_it(GCTaskManager*, > > > unsigned int)+0xb0 > > > V [libjvm.so+0x7f422b] GCTaskThread::run()+0x1eb > > > V [libjvm.so+0xf707fd] Thread::call_run()+0x10d > > > V [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7 > > > > > > JavaThread 0x00007fb85c004800 (nid = 111387) was being processed > > > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > > > v ~RuntimeStub::_new_array_Java > > > J 225122 c2 > > > > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > > > (207 bytes) @ 0x00007fca21f1a5d8 > [0x00007fca21f17f20+0x00000000000026b8] > > > J 62342 c2 webservice.exception.ExceptionLoggingWrapper.execute()V > (1004 > > > bytes) @ 0x00007fca20f0aec8 [0x00007fca20f07f40+0x0000000000002f88] > > > J 225129 c2 > > > > > > 
webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lbeans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > > (105 bytes) @ 0x00007fca1da512ac > [0x00007fca1da51100+0x00000000000001ac] > > > J 131643 c2 > > > > > > webservice.exception.mapper.RequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > > (9 bytes) @ 0x00007fca20ce6190 [0x00007fca20ce60c0+0x00000000000000d0] > > > J 55114 c2 > > > > > > webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; > > > (332 bytes) @ 0x00007fca2051fe64 > [0x00007fca2051f820+0x0000000000000644] > > > J 57859 c2 webservice.filters.ResponseSerializationWorker.execute()Z > (272 > > > bytes) @ 0x00007fca1ef2ed18 [0x00007fca1ef2e140+0x0000000000000bd8] > > > J 16114% c2 > > > > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; > > > (486 bytes) @ 0x00007fca1ced465c > [0x00007fca1ced4200+0x000000000000045c] > > > j > > > > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1 > > > J 11639 c2 java.util.concurrent.FutureTask.run()V java.base at 13.0.1 > (123 > > > bytes) @ 0x00007fca1cd00858 [0x00007fca1cd007c0+0x0000000000000098] > > > J 7560 c1 > > > > > > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V > > > java.base at 13.0.1 (187 bytes) @ 0x00007fca15b23f54 > > > [0x00007fca15b23160+0x0000000000000df4] > > > J 5143 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V > > > java.base at 13.0.1 (9 bytes) @ 0x00007fca15b39abc > > > [0x00007fca15b39a40+0x000000000000007c] > > > J 4488 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 bytes) @ > > > 0x00007fca159fc174 [0x00007fca159fc040+0x0000000000000134] > > > v ~StubRoutines::call_stub > > > > > > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: > > > 
0x0000000000000000
> > >
> > > Register to memory mapping:
> > > ...
> > >
> > > Can someone shed more info on when this can happen? I am seeing this on
> > > multiple servers with Java 13.0.1+9 on RHEL6 servers.
> > >
> > > There was another thread in hotspot runtime where David Holmes pointed
> > this out:
> > >> siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr:
> > > 0x0000000000000000
> > >
> > >> This seems it may be related to:
> > >> https://bugs.openjdk.java.net/browse/JDK-8004124
> > >
> > > Just wondering if this is the same issue or something GC-specific.
> > >
> > >
> > >
> > > TIA
> > > Sundar
> > >
> >

From m.sundar85 at gmail.com  Mon Feb 10 19:53:51 2020
From: m.sundar85 at gmail.com (Sundara Mohan M)
Date: Mon, 10 Feb 2020 14:53:51 -0500
Subject: Parallel GC Thread crash
In-Reply-To: <5454bc87-1452-1402-3496-c3c8f128a499@oracle.com>
References: 
	<5454bc87-1452-1402-3496-c3c8f128a499@oracle.com>
Message-ID: 

Hi Stefan,
    Yes, we are trying to move to 13.0.2. I wanted to verify whether anyone
else has seen this and whether upgrading will really solve the problem.

Can you share how to file a bug report for this? I don't have access to
https://bugs.openjdk.java.net/

I will try to run with -XX:+VerifyBeforeGC and -XX:+VerifyAfterGC to get
more information.

Thanks
Sundar

On Mon, Feb 10, 2020 at 2:42 PM Stefan Karlsson 
wrote:

> Hi Sundar,
>
> On 2020-02-10 19:32, Sundara Mohan M wrote:
> > Hi Stefan,
> > We started seeing more crashes on JDK13.0.1+9
> >
> > Since seeing it on GC Task Thread assumed it is related to GC.
>
> As I said in my previous mail, I don't think this is caused by GC code.
> More below.
>
> >
> > # Problematic frame:
> > # V [libjvm.so+0xd183c0] PSRootsClosure::do_oop(oopDesc**)+0x30
> >
> > Command Line: -XX:+AlwaysPreTouch -Xms64000m -Xmx64000m
> > -XX:NewSize=40000m -XX:+DisableExplicitGC -Xnoclassgc
> > -XX:+UseParallelGC -XX:ParallelGCThreads=40 -XX:ConcGCTh
> > reads=5 ... 
> > > > Host: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, 48 cores, 125G, Red > > Hat Enterprise Linux Server release 6.10 (Santiago) > > Time: Fri Feb 7 11:15:04 2020 UTC elapsed time: 286290 seconds (3d 7h > > 31m 30s) > > > > --------------- T H R E A D --------------- > > > > Current thread (0x00007fca6c074000): GCTaskThread "ParGC Thread#28" > > [stack: 0x00007fba72ff1000,0x00007fba730f1000] [id=56530] > > > > Stack: [0x00007fba72ff1000,0x00007fba730f1000], > > sp=0x00007fba730ee850, free space=1014k > > Native frames: (J=compiled Java code, A=aot compiled Java code, > > j=interpreted, Vv=VM code, C=native code) > > V [libjvm.so+0xd183c0] PSRootsClosure::do_oop(oopDesc**)+0x30 > > V [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, RegisterMap > > const*, OopClosure*)+0x2eb > > V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > > CodeBlobClosure*, RegisterMap*, bool)+0x99 > > V [libjvm.so+0xf68b17] JavaThread::oops_do(OopClosure*, > > CodeBlobClosure*)+0x187 > > V [libjvm.so+0xd190be] ThreadRootsTask::do_it(GCTaskManager*, > > unsigned int)+0x6e > > V [libjvm.so+0x7f422b] GCTaskThread::run()+0x1eb > > V [libjvm.so+0xf707fd] Thread::call_run()+0x10d > > V [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7 > > > > JavaThread 0x00007fb8f4036800 (nid = 60927) was being processed > > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > > v ~RuntimeStub::_new_array_Java > > J 58520 c2 > > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > > (207 bytes) @ 0x00007fca5fd23dec [0x00007fca5fd1dbc0+0x000000000000622c] > > J 66864 c2 webservice.exception.ExceptionLoggingWrapper.execute()V > > (1004 bytes) @ 0x00007fca60c02588 [0x00007fca60bffce0+0x00000000000028a8] > > J 58224 c2 > > > webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lbeans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > > (105 bytes) @ 0x00007fca5f59bad8 [0x00007fca5f59b880+0x0000000000000258] > > J 69992 c2 
> > > webservice.exception.mapper.JediRequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > > (9 bytes) @ 0x00007fca5e1019f4 [0x00007fca5e101940+0x00000000000000b4] > > J 55265 c2 > > > webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; > > > (332 bytes) @ 0x00007fca5f6f58e0 [0x00007fca5f6f5700+0x00000000000001e0] > > J 483122 c2 webservice.filters.ResponseSerializationWorker.execute()Z > > (272 bytes) @ 0x00007fca622fc2b4 [0x00007fca622fbc80+0x0000000000000634] > > J 15811% c2 > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; > > > (486 bytes) @ 0x00007fca5c108794 [0x00007fca5c1082a0+0x00000000000004f4] > > j > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1 > > J 4586 c1 java.util.concurrent.FutureTask.run()V java.base at 13.0.1 (123 > > bytes) @ 0x00007fca54d27184 [0x00007fca54d26b00+0x0000000000000684] > > J 7550 c1 > > > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V > > > java.base at 13.0.1 (187 bytes) @ 0x00007fca54fbb6d4 > > [0x00007fca54fba8e0+0x0000000000000df4] > > J 7549 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V > > java.base at 13.0.1 (9 bytes) @ 0x00007fca5454b93c > > [0x00007fca5454b8c0+0x000000000000007c] > > J 4585 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 bytes) @ > > 0x00007fca54d250f4 [0x00007fca54d24fc0+0x0000000000000134] > > v ~StubRoutines::call_stub > > > > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: > > 0x0000000000000000 > > > > Does JDK11 and 13 have different code for GC. Do you think > > downgrading(JDK11 stable)/upgrading(JDK-13.0.2) might help here? > > You should at least move to 13.0.2, to get the latest bug fixes/patches. > > There has been a lot of changes in all areas of the JVM between 11 and > 13. 
We don't yet know the root cause of this crash, and I can't say if > this is caused by new changes or not. Have you or anyone filed a bug > report for this? > > > Any insight to debug this will be helpful. > > Did you try my previous suggestion to run with -XX:+VerifyBeforeGC and > -XX:+VerifyAfterGC? If you can tolerate the longer GC times it > introduces, then you could try to run with > -XX:+UnlockDiagnosticVMOptions -XX:+VerifyBeforeGC -XX:+VerifyAfterGC . > > Cheers, > StefanK > > > > > TIA > > Sundar > > > > On Tue, Feb 4, 2020 at 5:47 AM Stefan Karlsson > > > wrote: > > > > Hi Sundar, > > > > The GC crashes when it encounters something bad on the stack: > > > V [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, > > RegisterMap > > > const*, OopClosure*)+0x2eb > > > V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > > > > This is probably not a GC bug. It's more likely that this is > > caused by > > the JIT compiler. I see in your hotspot-runtime-dev thread, that you > > also get crashes in other compiler related areas. > > > > If you want to rule out the GC, you can run with > > -XX:+VerifyBeforeGC and > > -XX:+VerifyAfterGC, and see if this asserts before the GC has started > > running. > > > > StefanK > > > > On 2020-02-04 04:38, Sundara Mohan M wrote: > > > Hi, > > > I am seeing following crashes frequently on our servers > > > # > > > # A fatal error has been detected by the Java Runtime Environment: > > > # > > > # SIGSEGV (0xb) at pc=0x00007fca3281d311, pid=103575, tid=108299 > > > # > > > # JRE version: OpenJDK Runtime Environment (13.0.1+9) (build > > 13.0.1+9) > > > # Java VM: OpenJDK 64-Bit Server VM (13.0.1+9, mixed mode, > > tiered, parallel > > > gc, linux-amd64) > > > # Problematic frame: > > > # V [libjvm.so+0xcd3311] > > PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > > > # > > > # No core dump will be written. Core dumps have been disabled. 
> > To enable > > > core dumping, try "ulimit -c unlimited" before starting Java again > > > # > > > # If you would like to submit a bug report, please visit: > > > # https://github.com/AdoptOpenJDK/openjdk-build/issues > > > # > > > > > > > > > --------------- T H R E A D --------------- > > > > > > Current thread (0x00007fca2c051000): GCTaskThread "ParGC > > Thread#8" [stack: > > > 0x00007fca30277000,0x00007fca30377000] [id=108299] > > > > > > Stack: [0x00007fca30277000,0x00007fca30377000], > > sp=0x00007fca30374890, > > > free space=1014k > > > Native frames: (J=compiled Java code, A=aot compiled Java code, > > > j=interpreted, Vv=VM code, C=native code) > > > V [libjvm.so+0xcd3311] > PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > > > V [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, > > RegisterMap > > > const*, OopClosure*)+0x2eb > > > V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > > > CodeBlobClosure*, RegisterMap*, bool)+0x99 > > > V [libjvm.so+0xf68b17] JavaThread::oops_do(OopClosure*, > > > CodeBlobClosure*)+0x187 > > > V [libjvm.so+0xcce2f0] > > ThreadRootsMarkingTask::do_it(GCTaskManager*, > > > unsigned int)+0xb0 > > > V [libjvm.so+0x7f422b] GCTaskThread::run()+0x1eb > > > V [libjvm.so+0xf707fd] Thread::call_run()+0x10d > > > V [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7 > > > > > > JavaThread 0x00007fb85c004800 (nid = 111387) was being processed > > > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > > > v ~RuntimeStub::_new_array_Java > > > J 225122 c2 > > > > > > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > > > (207 bytes) @ 0x00007fca21f1a5d8 > > [0x00007fca21f17f20+0x00000000000026b8] > > > J 62342 c2 > > webservice.exception.ExceptionLoggingWrapper.execute()V (1004 > > > bytes) @ 0x00007fca20f0aec8 [0x00007fca20f07f40+0x0000000000002f88] > > > J 225129 c2 > > > > > > 
webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lbeans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > > (105 bytes) @ 0x00007fca1da512ac > > [0x00007fca1da51100+0x00000000000001ac] > > > J 131643 c2 > > > > > > webservice.exception.mapper.RequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > > (9 bytes) @ 0x00007fca20ce6190 > > [0x00007fca20ce60c0+0x00000000000000d0] > > > J 55114 c2 > > > > > > webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; > > > (332 bytes) @ 0x00007fca2051fe64 > > [0x00007fca2051f820+0x0000000000000644] > > > J 57859 c2 > > webservice.filters.ResponseSerializationWorker.execute()Z (272 > > > bytes) @ 0x00007fca1ef2ed18 [0x00007fca1ef2e140+0x0000000000000bd8] > > > J 16114% c2 > > > > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; > > > (486 bytes) @ 0x00007fca1ced465c > > [0x00007fca1ced4200+0x000000000000045c] > > > j > > > > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1 > > > J 11639 c2 java.util.concurrent.FutureTask.run()V > > java.base at 13.0.1 (123 > > > bytes) @ 0x00007fca1cd00858 [0x00007fca1cd007c0+0x0000000000000098] > > > J 7560 c1 > > > > > > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V > > > java.base at 13.0.1 (187 bytes) @ 0x00007fca15b23f54 > > > [0x00007fca15b23160+0x0000000000000df4] > > > J 5143 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V > > > java.base at 13.0.1 (9 bytes) @ 0x00007fca15b39abc > > > [0x00007fca15b39a40+0x000000000000007c] > > > J 4488 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 bytes) @ > > > 0x00007fca159fc174 [0x00007fca159fc040+0x0000000000000134] > > > v ~StubRoutines::call_stub > > > > > > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: > > > 
0x0000000000000000
> > >
> > > Register to memory mapping:
> > > ...
> > >
> > > Can someone shed more info on when this can happen? I am seeing
> > this on
> > > multiple servers with Java 13.0.1+9 on RHEL6 servers.
> > >
> > > There was another thread in hotspot runtime where David Holmes
> > pointed this out:
> > >> siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr:
> > > 0x0000000000000000
> > >
> > >> This seems it may be related to:
> > >> https://bugs.openjdk.java.net/browse/JDK-8004124
> > >
> > > Just wondering if this is the same issue or something GC-specific.
> > >
> > >
> > >
> > > TIA
> > > Sundar
> > >
> >

From kim.barrett at oracle.com  Mon Feb 10 19:59:56 2020
From: kim.barrett at oracle.com (Kim Barrett)
Date: Mon, 10 Feb 2020 14:59:56 -0500
Subject: RFR (S): 8238160: Uniformize Parallel GC task queue variable names
In-Reply-To: <8d350538-9a82-b420-e7de-319edaf8605c@oracle.com>
References: <8d350538-9a82-b420-e7de-319edaf8605c@oracle.com>
Message-ID: <9881CC0B-D390-43D6-8C60-D6FDBF476DDA@oracle.com>

> On Jan 30, 2020, at 6:08 AM, Thomas Schatzl wrote:
>
> Hi all,
>
> can I have reviews for this small change that moves some global typedefs
> used only by Parallel GC from taskqueue.hpp to parallel gc files, and
> further makes naming of instances of these more uniform?
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8238160
> Webrev:
> http://cr.openjdk.java.net/~tschatzl/8238160/webrev/
> Testing:
> local compilation
>
> Thanks,
> Thomas

The various "guarantee" checks that operator new didn't return NULL are a
waste of time and space; CHeapObj's operator new exits rather than
returning NULL. They are culturally compatible with other nearby code
though; cleanup later?

Looks good as is. 
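[Editor's note] Kim's point about the redundant NULL checks can be sketched with a small standalone example. This is not HotSpot's actual CHeapObj code; the class names below are hypothetical stand-ins. It shows the pattern he describes: when an allocation base class's operator new terminates the process on failure instead of returning NULL, a guarantee that the pointer is non-NULL after `new` can never fire.

```cpp
#include <cstdio>
#include <cstdlib>
#include <new>

// Hypothetical allocation base class (NOT HotSpot's real CHeapObj):
// its operator new terminates the process on allocation failure
// instead of returning NULL to the caller.
class CHeapObjLike {
public:
  static void* operator new(std::size_t size) {
    void* p = std::malloc(size);
    if (p == nullptr) {
      std::fputs("native allocation failed, exiting\n", stderr);
      std::exit(1);  // the 'new' expression never observes NULL
    }
    return p;
  }
  static void operator delete(void* p) { std::free(p); }
};

// Stand-in for a task-queue object allocated on the C heap.
class TaskQueueLike : public CHeapObjLike {
public:
  explicit TaskQueueLike(int id) : _id(id) {}
  int id() const { return _id; }
private:
  int _id;
};
```

With this contract, a post-allocation check such as `guarantee(q != NULL, ...)` after `TaskQueueLike* q = new TaskQueueLike(0);` is dead code: control only reaches the check when the pointer is already valid.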
From stefan.karlsson at oracle.com  Mon Feb 10 20:13:06 2020
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Mon, 10 Feb 2020 21:13:06 +0100
Subject: Parallel GC Thread crash
In-Reply-To: 
References: 
	<5454bc87-1452-1402-3496-c3c8f128a499@oracle.com>
Message-ID: <6afab3a3-92ab-f1bf-2022-9e21034cd28a@oracle.com>

On 2020-02-10 20:53, Sundara Mohan M wrote:
> Hi Stefan,
>     Yes we are trying to move to 13.0.2. Wanted to verify if anyone
> else seen this or upgrading will really solve this problem.
>
> Can you share how to file a bug report for this? I don't have access
> to https://bugs.openjdk.java.net/

There are directions in the hs_err crash file that point you to the web
page to file a bug. You seem to be running AdoptOpenJDK builds, so your
bug reports would end up in their system:

>     > # If you would like to submit a bug report, please visit:
>     > # https://github.com/AdoptOpenJDK/openjdk-build/issues

If you were running with Oracle binaries you would get lines like this:

# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp

>
> I will try to run with -XX:+VerifyBeforeGC and -XX:+VerifyAfterGC to
> get more information.

OK. Hopefully this gives us more information.

StefanK

>
>
> Thanks
> Sundar
>
> On Mon, Feb 10, 2020 at 2:42 PM Stefan Karlsson
> wrote:
>
>     Hi Sundar,
>
>     On 2020-02-10 19:32, Sundara Mohan M wrote:
>     > Hi Stefan,
>     >     We started seeing more crashes on JDK13.0.1+9
>     >
>     > Since seeing it on GC Task Thread assumed it is related to GC.
>
>     As I said in my previous mail, I don't think this is caused by GC
>     code.
>     More below.
>
>     >
>     > # Problematic frame:
>     > # V  [libjvm.so+0xd183c0]  PSRootsClosure::do_oop(oopDesc**)+0x30
>     >
>     > Command Line: -XX:+AlwaysPreTouch -Xms64000m -Xmx64000m
>     > -XX:NewSize=40000m -XX:+DisableExplicitGC -Xnoclassgc
>     > -XX:+UseParallelGC -XX:ParallelGCThreads=40 -XX:ConcGCTh
>     > reads=5 ...
> > > > Host: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, 48 cores, 125G, > Red > > Hat Enterprise Linux Server release 6.10 (Santiago) > > Time: Fri Feb ?7 11:15:04 2020 UTC elapsed time: 286290 seconds > (3d 7h > > 31m 30s) > > > > --------------- ?T H R E A D ?--------------- > > > > Current thread (0x00007fca6c074000): ?GCTaskThread "ParGC > Thread#28" > > [stack: 0x00007fba72ff1000,0x00007fba730f1000] [id=56530] > > > > Stack: [0x00007fba72ff1000,0x00007fba730f1000], > > ?sp=0x00007fba730ee850, ?free space=1014k > > Native frames: (J=compiled Java code, A=aot compiled Java code, > > j=interpreted, Vv=VM code, C=native code) > > V ?[libjvm.so+0xd183c0] > ?PSRootsClosure::do_oop(oopDesc**)+0x30 > > V ?[libjvm.so+0xc6bf0b] ?OopMapSet::oops_do(frame const*, > RegisterMap > > const*, OopClosure*)+0x2eb > > V ?[libjvm.so+0x765489] ?frame::oops_do_internal(OopClosure*, > > CodeBlobClosure*, RegisterMap*, bool)+0x99 > > V ?[libjvm.so+0xf68b17] ?JavaThread::oops_do(OopClosure*, > > CodeBlobClosure*)+0x187 > > V ?[libjvm.so+0xd190be] ?ThreadRootsTask::do_it(GCTaskManager*, > > unsigned int)+0x6e > > V ?[libjvm.so+0x7f422b] ?GCTaskThread::run()+0x1eb > > V ?[libjvm.so+0xf707fd] ?Thread::call_run()+0x10d > > V ?[libjvm.so+0xc875b7] ?thread_native_entry(Thread*)+0xe7 > > > > JavaThread 0x00007fb8f4036800 (nid = 60927) was being processed > > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > > v ?~RuntimeStub::_new_array_Java > > J 58520 c2 > > > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > > > (207 bytes) @ 0x00007fca5fd23dec > [0x00007fca5fd1dbc0+0x000000000000622c] > > J 66864 c2 webservice.exception.ExceptionLoggingWrapper.execute()V > > (1004 bytes) @ 0x00007fca60c02588 > [0x00007fca60bffce0+0x00000000000028a8] > > J 58224 c2 > > > webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lbeans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > > (105 bytes) @ 0x00007fca5f59bad8 > 
[0x00007fca5f59b880+0x0000000000000258] > > J 69992 c2 > > > webservice.exception.mapper.JediRequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > > (9 bytes) @ 0x00007fca5e1019f4 > [0x00007fca5e101940+0x00000000000000b4] > > J 55265 c2 > > > webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; > > > (332 bytes) @ 0x00007fca5f6f58e0 > [0x00007fca5f6f5700+0x00000000000001e0] > > J 483122 c2 > webservice.filters.ResponseSerializationWorker.execute()Z > > (272 bytes) @ 0x00007fca622fc2b4 > [0x00007fca622fbc80+0x0000000000000634] > > J 15811% c2 > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; > > > (486 bytes) @ 0x00007fca5c108794 > [0x00007fca5c1082a0+0x00000000000004f4] > > j > > > ?com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1 > > J 4586 c1 java.util.concurrent.FutureTask.run()V > java.base at 13.0.1 (123 > > bytes) @ 0x00007fca54d27184 [0x00007fca54d26b00+0x0000000000000684] > > J 7550 c1 > > > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V > > > java.base at 13.0.1 (187 bytes) @ 0x00007fca54fbb6d4 > > [0x00007fca54fba8e0+0x0000000000000df4] > > J 7549 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V > > java.base at 13.0.1 (9 bytes) @ 0x00007fca5454b93c > > [0x00007fca5454b8c0+0x000000000000007c] > > J 4585 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 bytes) @ > > 0x00007fca54d250f4 [0x00007fca54d24fc0+0x0000000000000134] > > v ?~StubRoutines::call_stub > > > > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: > > 0x0000000000000000 > > > > Does JDK11 and 13 have different code for GC. Do you think > > downgrading(JDK11 stable)/upgrading(JDK-13.0.2) might help here? > > You should at least move to 13.0.2, to get the latest bug > fixes/patches. 
> > There has been a lot of changes in all areas of the JVM between 11 > and > 13. We don't yet know the root cause of this crash, and I can't > say if > this is caused by new changes or not. Have you or anyone filed a bug > report for this? > > > Any insight to debug this will be helpful. > > Did you try my previous suggestion to run with -XX:+VerifyBeforeGC > and > -XX:+VerifyAfterGC? If you can tolerate the longer GC times it > introduces, then you could try to run with > -XX:+UnlockDiagnosticVMOptions -XX:+VerifyBeforeGC > -XX:+VerifyAfterGC . > > Cheers, > StefanK > > > > > TIA > > Sundar > > > > On Tue, Feb 4, 2020 at 5:47 AM Stefan Karlsson > > > >> wrote: > > > >? ? ?Hi Sundar, > > > >? ? ?The GC crashes when it encounters something bad on the stack: > >? ? ??> V? [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, > >? ? ?RegisterMap > >? ? ??> const*, OopClosure*)+0x2eb > >? ? ??> V? [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > > > >? ? ?This is probably not a GC bug. It's more likely that this is > >? ? ?caused by > >? ? ?the JIT compiler. I see in your hotspot-runtime-dev thread, > that you > >? ? ?also get crashes in other compiler related areas. > > > >? ? ?If you want to rule out the GC, you can run with > >? ? ?-XX:+VerifyBeforeGC and > >? ? ?-XX:+VerifyAfterGC, and see if this asserts before the GC > has started > >? ? ?running. > > > >? ? ?StefanK > > > >? ? ?On 2020-02-04 04:38, Sundara Mohan M wrote: > >? ? ?> Hi, > >? ? ?>? ? ?I am seeing following crashes frequently on our servers > >? ? ?> # > >? ? ?> # A fatal error has been detected by the Java Runtime > Environment: > >? ? ?> # > >? ? ?> #? SIGSEGV (0xb) at pc=0x00007fca3281d311, pid=103575, > tid=108299 > >? ? ?> # > >? ? ?> # JRE version: OpenJDK Runtime Environment (13.0.1+9) (build > >? ? ?13.0.1+9) > >? ? ?> # Java VM: OpenJDK 64-Bit Server VM (13.0.1+9, mixed mode, > >? ? ?tiered, parallel > >? ? ?> gc, linux-amd64) > >? ? ?> # Problematic frame: > >? ? ?> # V? 
[libjvm.so+0xcd3311] > >? ? ?PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > >? ? ?> # > >? ? ?> # No core dump will be written. Core dumps have been disabled. > >? ? ?To enable > >? ? ?> core dumping, try "ulimit -c unlimited" before starting > Java again > >? ? ?> # > >? ? ?> # If you would like to submit a bug report, please visit: > >? ? ?> # https://github.com/AdoptOpenJDK/openjdk-build/issues > >? ? ?> # > >? ? ?> > >? ? ?> > >? ? ?> ---------------? T H R E A D? --------------- > >? ? ?> > >? ? ?> Current thread (0x00007fca2c051000): GCTaskThread "ParGC > >? ? ?Thread#8" [stack: > >? ? ?> 0x00007fca30277000,0x00007fca30377000] [id=108299] > >? ? ?> > >? ? ?> Stack: [0x00007fca30277000,0x00007fca30377000], > >? ? ?sp=0x00007fca30374890, > >? ? ?>? ?free space=1014k > >? ? ?> Native frames: (J=compiled Java code, A=aot compiled Java > code, > >? ? ?> j=interpreted, Vv=VM code, C=native code) > >? ? ?> V? [libjvm.so+0xcd3311] > PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > >? ? ?> V? [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, > >? ? ?RegisterMap > >? ? ?> const*, OopClosure*)+0x2eb > >? ? ?> V? [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > >? ? ?> CodeBlobClosure*, RegisterMap*, bool)+0x99 > >? ? ?> V? [libjvm.so+0xf68b17] JavaThread::oops_do(OopClosure*, > >? ? ?> CodeBlobClosure*)+0x187 > >? ? ?> V? [libjvm.so+0xcce2f0] > >? ? ?ThreadRootsMarkingTask::do_it(GCTaskManager*, > >? ? ?> unsigned int)+0xb0 > >? ? ?> V? [libjvm.so+0x7f422b] GCTaskThread::run()+0x1eb > >? ? ?> V? [libjvm.so+0xf707fd] Thread::call_run()+0x10d > >? ? ?> V? [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7 > >? ? ?> > >? ? ?> JavaThread 0x00007fb85c004800 (nid = 111387) was being > processed > >? ? ?> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > >? ? ?> v? ~RuntimeStub::_new_array_Java > >? ? ?> J 225122 c2 > >? ? ?> > > > ?ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > >? ? ?> (207 bytes) @ 0x00007fca21f1a5d8 > >? 
? ?[0x00007fca21f17f20+0x00000000000026b8] > >? ? ?> J 62342 c2 > > ?webservice.exception.ExceptionLoggingWrapper.execute()V (1004 > >? ? ?> bytes) @ 0x00007fca20f0aec8 > [0x00007fca20f07f40+0x0000000000002f88] > >? ? ?> J 225129 c2 > >? ? ?> > > > ?webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lbeans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > >? ? ?> (105 bytes) @ 0x00007fca1da512ac > >? ? ?[0x00007fca1da51100+0x00000000000001ac] > >? ? ?> J 131643 c2 > >? ? ?> > > > ?webservice.exception.mapper.RequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > >? ? ?> (9 bytes) @ 0x00007fca20ce6190 > >? ? ?[0x00007fca20ce60c0+0x00000000000000d0] > >? ? ?> J 55114 c2 > >? ? ?> > > > ?webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; > >? ? ?> (332 bytes) @ 0x00007fca2051fe64 > >? ? ?[0x00007fca2051f820+0x0000000000000644] > >? ? ?> J 57859 c2 > > ?webservice.filters.ResponseSerializationWorker.execute()Z (272 > >? ? ?> bytes) @ 0x00007fca1ef2ed18 > [0x00007fca1ef2e140+0x0000000000000bd8] > >? ? ?> J 16114% c2 > >? ? ?> > > > ?com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; > >? ? ?> (486 bytes) @ 0x00007fca1ced465c > >? ? ?[0x00007fca1ced4200+0x000000000000045c] > >? ? ?> j > >? ? ?> > > > ??com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1 > >? ? ?> J 11639 c2 java.util.concurrent.FutureTask.run()V > >? ? ?java.base at 13.0.1 (123 > >? ? ?> bytes) @ 0x00007fca1cd00858 > [0x00007fca1cd007c0+0x0000000000000098] > >? ? ?> J 7560 c1 > >? ? ?> > > > ?java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V > >? ? ?> java.base at 13.0.1 (187 bytes) @ 0x00007fca15b23f54 > >? ? ?> [0x00007fca15b23160+0x0000000000000df4] > >? ? 
?> J 5143 c1 > java.util.concurrent.ThreadPoolExecutor$Worker.run()V > >? ? ?> java.base at 13.0.1 (9 bytes) @ 0x00007fca15b39abc > >? ? ?> [0x00007fca15b39a40+0x000000000000007c] > >? ? ?> J 4488 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 > bytes) @ > >? ? ?> 0x00007fca159fc174 [0x00007fca159fc040+0x0000000000000134] > >? ? ?> v? ~StubRoutines::call_stub > >? ? ?> > >? ? ?> siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), > si_addr: > >? ? ?> 0x0000000000000000 > >? ? ?> > >? ? ?> Register to memory mapping: > >? ? ?> ... > >? ? ?> > >? ? ?> Can someone shed more info on when this can happen? I am > seeing > >? ? ?this on > >? ? ?> multiple servers with Java 13.0.1+9 on RHEL6 servers. > >? ? ?> > >? ? ?> There was another thread in hotspot runtime where David Holmes > >? ? ?pointed this > >? ? ?>> siginfo: si_signo: 11 (SIGSEGV), si_code: 128 > (SI_KERNEL), si_addr: > >? ? ?> 0x0000000000000000 > >? ? ?> > >? ? ?>> This seems it may be related to: > >? ? ?>> https://bugs.openjdk.java.net/browse/JDK-8004124 > >? ? ?> > >? ? ?> Just wondering if this is same or something to do with GC > specific. > >? ? ?> > >? ? ?> > >? ? ?> > >? ? ?> TIA > >? ? ?> Sundar > >? ? 
 > > >

From ivan.walulya at oracle.com  Tue Feb 11 07:34:19 2020
From: ivan.walulya at oracle.com (Ivan Walulya)
Date: Tue, 11 Feb 2020 08:34:19 +0100
Subject: RFR: 8232686: Turn parallel gc develop tracing flags into unified logging
Message-ID: <5B542A0A-3477-40F9-9DD8-AC86E3870E60@oracle.com>

Hi all,

Please review a small modification to turn parallel gc develop tracing flags into unified logging

Bug: https://bugs.openjdk.java.net/browse/JDK-8232686
Webrev: http://cr.openjdk.java.net/~sjohanss/iwalulya/8232686/00/

Testing: Tier 1 - Tier 3

//Ivan

From stefan.johansson at oracle.com  Tue Feb 11 10:26:32 2020
From: stefan.johansson at oracle.com (Stefan Johansson)
Date: Tue, 11 Feb 2020 11:26:32 +0100
Subject: RFR: 8232686: Turn parallel gc develop tracing flags into unified logging
In-Reply-To: <5B542A0A-3477-40F9-9DD8-AC86E3870E60@oracle.com>
References: <5B542A0A-3477-40F9-9DD8-AC86E3870E60@oracle.com>
Message-ID: <37724756-9D64-4DDB-9C8D-A5C4A24B23E9@oracle.com>

Hi Ivan,

> 11 feb. 2020 kl. 08:34 skrev Ivan Walulya :
>
> Hi all,
>
> Please review a small modification to turn parallel gc develop tracing flags into unified logging
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8232686
> Webrev: http://cr.openjdk.java.net/~sjohanss/iwalulya/8232686/00/
>
When looking through the webrev again I realized that we can now remove the "#ifdef ASSERT" here:
1616 #ifdef ASSERT
1617     log_develop_debug(gc, marking)(
1618         "add_obj_count=" SIZE_FORMAT " "
1619         "add_obj_bytes=" SIZE_FORMAT,
1620         add_obj_count,
1621         add_obj_size * HeapWordSize);
1622     log_develop_debug(gc, marking)(
1623         "mark_bitmap_count=" SIZE_FORMAT " "
1624         "mark_bitmap_bytes=" SIZE_FORMAT,
1625         mark_bitmap_count,
1626         mark_bitmap_size * HeapWordSize);
1627 #endif // #ifdef ASSERT

Otherwise a very nice cleanup.
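As background, develop-level logging already compiles to nothing in product builds, which is why the extra guard adds nothing there. A minimal sketch of that pattern (these are illustrative macros, not the actual HotSpot logging implementation):

```cpp
#include <cassert>

// Sketch only: a develop-level log macro that disappears in product builds,
// mimicking the behavior (but not the implementation) of log_develop_debug.
// We count calls instead of printing so the effect is observable.
static int g_develop_log_calls = 0;

#ifdef PRODUCT
#define LOG_DEVELOP_SKETCH(msg) ((void)0)
#else
#define LOG_DEVELOP_SKETCH(msg) ((void)++g_develop_log_calls)
#endif

// With such a macro, an additional "#ifdef ASSERT" around the call sites is
// redundant: the calls are already free in product builds.
```

In a non-product build the macro expands to the real work; in a product build the call site vanishes entirely, guard or no guard.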
Thanks,
Stefan

> Testing: Tier 1 - Tier 3
>
> //Ivan

From thomas.schatzl at oracle.com  Tue Feb 11 10:42:34 2020
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Tue, 11 Feb 2020 11:42:34 +0100
Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics
In-Reply-To: <97f72395-af73-41f7-98f3-8e22cce5b79b.maoliang.ml@alibaba-inc.com>
References: <90aa2259-afce-44af-abb2-31700caea4a0.maoliang.ml@alibaba-inc.com>
 <7085d9f4-d579-2fb1-c3ba-938a01ab7576@oracle.com>
 <6a4dfc59-217c-446d-94ec-f4796d44617c.maoliang.ml@alibaba-inc.com>
 <72f8bfb6-2039-1d6b-c312-2a9dafe0b735@oracle.com>
 <97f72395-af73-41f7-98f3-8e22cce5b79b.maoliang.ml@alibaba-inc.com>
Message-ID: 

Hi,

On 10.02.20 12:47, Liang Mao wrote:
> Hi Thomas,
>
> In my testing, I didn't change the value of Min/MaxHeapFreeRatio.
>
> The heap had already shrunk to 5GB but in remark it expanded to 6644M.
> The default value of MinHeapFreeRatio is 40, so the minimal commit size
> after remark is the heap size * 1.67 (3979M * 1.67 = 6644M).
> 1.67 = 100/(100 - 40)
>
>
> [1031.322s][info][gc             ] GC(741) Pause Young (Concurrent Start) (G1 Evacuation Pause) 4724M->4506M(5120M) 10.607ms
> [1031.322s][info][gc,cpu         ] GC(741) User=0.42s Sys=0.00s Real=0.01s
> [1031.322s][info][gc             ] GC(742) Concurrent Cycle
> [1031.322s][info][gc,marking     ] GC(742) Concurrent Clear Claimed Marks
> [1031.322s][info][gc,marking     ] GC(742) Concurrent Clear Claimed Marks 0.066ms
> [1031.322s][info][gc,marking     ] GC(742) Concurrent Scan Root Regions
> [1031.322s][info][gc,stringdedup ] Concurrent String Deduplication (1031.322s)
> [1031.323s][info][gc,stringdedup ] Concurrent String Deduplication 14224.0B->0.0B(14224.0B) avg 51.1% (1031.322s, 1031.323s) 0.514ms
> [1031.326s][info][gc,marking     ] GC(742) Concurrent Scan Root Regions 3.939ms
> [1031.326s][info][gc,marking     ] GC(742) Concurrent Mark (1031.326s)
> [1031.326s][info][gc,marking     ] GC(742) Concurrent Mark From Roots
> [1031.326s][info][gc,task        ] GC(742) Using 16 workers of 16 for marking
> [1031.483s][info][gc,marking     ] GC(742) Concurrent Mark From Roots 157.144ms
> [1031.483s][info][gc,marking     ] GC(742) Concurrent Preclean
> [1031.484s][info][gc,marking     ] GC(742) Concurrent Preclean 0.404ms
> [1031.484s][info][gc,marking     ] GC(742) Concurrent Mark (1031.326s, 1031.484s) 157.587ms
> [1031.485s][info][gc,start       ] GC(742) Pause Remark
> [1031.496s][info][gc             ] GC(742) Pause Remark 4625M->3979M(6644M) 10.953ms
> [1031.496s][info][gc,cpu         ] GC(742) User=0.22s Sys=0.04s Real=0.01s
>
>
> In our production environment, we never use JEP 346, mainly because of
> the JDK version, so I cannot tell whether it would work. I agree the
> "idle" issue is not our main focus for now.
>
> Using SoftMaxHeapSize to guide the adaptive IHOP's decision to start a
> concurrent mark GC cycle can work well with JEP 346 and the resize
> logic in remark. I don't stick to shrinking the heap in every GC.
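The 1.67 factor above follows directly from MinHeapFreeRatio = 40, and the floor it implies can be reproduced with a small sketch (`min_capacity_mb` is a hypothetical helper, not HotSpot code; the real code additionally rounds to region granularity, which presumably accounts for the slightly larger 6644M in the log):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not HotSpot code: the smallest capacity (in MB) that
// keeps at least min_free_ratio percent of the heap free, i.e. the shrink
// floor implied by MinHeapFreeRatio.
size_t min_capacity_mb(size_t used_mb, unsigned min_free_ratio) {
  // used <= capacity * (100 - ratio) / 100
  //   => capacity >= used * 100 / (100 - ratio), rounded up.
  unsigned denom = 100 - min_free_ratio;
  return (used_mb * 100 + denom - 1) / denom;
}
```

With 3979M used after Remark and the default ratio of 40, the floor is about 6632M, i.e. roughly the 1.67x observed above.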
>
> The capacity in resize_heap_if_necessary will be
> MAX2(min_desire_capacity_by_MinHeapFreeRatio, MIN2(soft_max_capacity(),
> max_desire_capacity_by_MaxHeapFreeRatio))
>
> But both approaches have the problem that the default MinHeapFreeRatio
> is too large in remark compared to full gc, as resize_heap_if_necessary
> will keep a minimal heap size of 1.667x the used heap size. After remark,
> the used size could be large, as it not only includes those old regions
> with garbage but also the used young regions.
>
> #############################
> void G1CollectedHeap::resize_heap_if_necessary() {
> ...
>   const size_t capacity_after_gc = capacity();
>   const size_t used_after_gc = capacity_after_gc - unused_committed_regions_in_bytes();
> #############################
>
> The used_after_gc is reasonable for full gc but it can contain young
> regions in remark.
> Do you think it should be changed like this?
> #############################
>   const size_t used_after_gc = capacity_after_gc - unused_committed_regions_in_bytes()
>                                - young_regions_count() * HeapRegion::GrainBytes;
>   // young_regions_count is 0 after full GC
> #############################

Apart from naming ("used_after_gc"), which has been wrong since that
method has been in use for Remark, this seems reasonable. Maybe
"old_used_after_gc"?

I think the comments need changes to reflect that we apply the
Min/MaxHeapFreeRatio on the old gen occupancy now (which is the same as
total occupancy after full gc) because it may be called with young
regions active.

I also think the whole code that calculates the expansion and shrinking
amount should be moved to the policy (with the g1CollectedHeap code just
calling that and then only reacting on the return value), but that can
be done separately.
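The proposed capacity rule quoted at the top of this message is effectively a clamp of soft_max_capacity() between the two free-ratio bounds. An illustrative sketch, with std::min/std::max standing in for HotSpot's MIN2/MAX2 and all parameter names being placeholders rather than G1 identifiers:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Illustrative sketch of the proposed capacity rule: clamp the soft max
// between the MinHeapFreeRatio-derived floor and the MaxHeapFreeRatio-derived
// ceiling. Not actual G1 code.
size_t desired_capacity(size_t min_desired, size_t max_desired, size_t soft_max) {
  return std::max(min_desired, std::min(soft_max, max_desired));
}
```

So the soft max wins whenever it lies between the two bounds, and is otherwise pulled back inside them.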
I tried to use > following number in resize_heap_if_necessary: > > ############################## > void?G1CollectedHeap::resize_heap_if_necessary()?{ > ... > //?We?can?now?safely?turn?them?into?size_t's. > ??size_t?minimum_desired_capacity?=?(size_t)?minimum_desired_capacity_d; > ??size_t?maximum_desired_capacity?=?(size_t)?maximum_desired_capacity_d; > > if?(!collector_state()->in_full_gc())?{ > ????minimum_desired_capacity?=?MIN2(minimum_desired_capacity,?policy()->minimum_desired_bytes(used_after_gc)); > ??} That looks a bit hacky... :) But I do not have a better policy for sizing after full gc either. Did you try always using the minimum_desired_bytes()? > > .... > size_t?G1Policy::minimum_desired_bytes(size_t?used_bytes)?const?{ > ??return?_ihop_control->unrestrained_young_size()?!=?0?? > ???????????_ihop_control->unrestrained_young_size()?: > ???????????_young_list_max_length?*?HeapRegion::GrainBytes > ?????????+?_reserve_regions?*?HeapRegion::GrainBytes?+?used_bytes; > } I think G1IHOPControl::_target_occupancy (add a getter) is what you want to use here (untested). > ############################# > > I made the minimum_desired_capacity small enough based on adaptive IHOP's > _last_unrestrained_young_size. Even without SoftMaxHeapSize, the test can > keep the memory under 3GB. It's a rough example and I didn't predict the > promotion bytes of next young gc yet. Do you think > a proper value of minimum_desired_capacity in remark resize > + > G1AdaptiveIHOPControl::actual_target_threshold according to > soft_max_capacity> is enough? Yes, both fixing the resizing logic and changing the IHOP target (and young gen size) according to SoftMaxHeapSize should be sufficient to let G1 keep that goal without too many commit activity. The resizing logic change could be handled under JDK-8238686, although this change does not modify the use of MaxHeapFreeRatio. 
There is a cleaned up version of my earlier change that implements the
latter at http://cr.openjdk.java.net/~tschatzl/8236073/webrev.1/ .

I will test your suggested changes and see their impact on our perf suite.

Thanks a lot,
  Thomas

P.S: it would be nice to send diffs of suggested changes for easier
application too.

From thomas.schatzl at oracle.com  Tue Feb 11 10:43:57 2020
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Tue, 11 Feb 2020 11:43:57 +0100
Subject: RFR (S): 8238160: Uniformize Parallel GC task queue variable names
In-Reply-To: <9881CC0B-D390-43D6-8C60-D6FDBF476DDA@oracle.com>
References: <8d350538-9a82-b420-e7de-319edaf8605c@oracle.com>
 <9881CC0B-D390-43D6-8C60-D6FDBF476DDA@oracle.com>
Message-ID: 

Hi Sangheon, Kim,

On 10.02.20 20:59, Kim Barrett wrote:
>> On Jan 30, 2020, at 6:08 AM, Thomas Schatzl wrote:
>>
>> Hi all,
>>
>> can I have reviews for this small change that moves some global typedefs used only by Parallel GC from taskqueue.hpp to parallel gc files, and further makes naming of instances of these more uniform?
>>
>> CR:
>> https://bugs.openjdk.java.net/browse/JDK-8238160
>> Webrev:
>> http://cr.openjdk.java.net/~tschatzl/8238160/webrev/
>> Testing:
>> local compilation
>>
>> Thanks,
>>   Thomas
>
> The various "guarantee" checks that operator new didn't return NULL
> are a waste of time and space; CHeapObj's operator new exits rather
> than returning NULL. They are culturally compatible with other nearby
> code though; cleanup later?
>
> Looks good as is.
>

thanks for your reviews. I filed JDK-8238854 for looking through the new
exits - a prototype is currently running through testing.
Thanks, Thomas From ivan.walulya at oracle.com Tue Feb 11 10:47:03 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Tue, 11 Feb 2020 11:47:03 +0100 Subject: RFR: 8232686: Turn parallel gc develop tracing flags into unified logging In-Reply-To: <37724756-9D64-4DDB-9C8D-A5C4A24B23E9@oracle.com> References: <5B542A0A-3477-40F9-9DD8-AC86E3870E60@oracle.com> <37724756-9D64-4DDB-9C8D-A5C4A24B23E9@oracle.com> Message-ID: <7E333365-53C8-49C0-A0C8-5464E3C8BCDC@oracle.com> Thanks Stefan, find below patch with the suggested updates. http://cr.openjdk.java.net/~sjohanss/iwalulya/8232686/00-01/ http://cr.openjdk.java.net/~sjohanss/iwalulya/8232686/01/ //Ivan > On 11 Feb 2020, at 11:26, Stefan Johansson wrote: > > H Ivan, > >> 11 feb. 2020 kl. 08:34 skrev Ivan Walulya : >> >> Hi all, >> >> Please review a small modification to turn parallel gc develop tracing flags into unified logging >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8232686 >> Webrev: http://cr.openjdk.java.net/~sjohanss/iwalulya/8232686/00/ >> > When looking through the webrev again I realized that we can now remove the "#ifdef ASSERT? here: > 1616 #ifdef ASSERT > 1617 log_develop_debug(gc, marking)( > 1618 "add_obj_count=" SIZE_FORMAT " " > 1619 "add_obj_bytes=" SIZE_FORMAT, > 1620 add_obj_count, > 1621 add_obj_size * HeapWordSize); > 1622 log_develop_debug(gc, marking)( > 1623 "mark_bitmap_count=" SIZE_FORMAT " " > 1624 "mark_bitmap_bytes=" SIZE_FORMAT, > 1625 mark_bitmap_count, > 1626 mark_bitmap_size * HeapWordSize); > 1627 #endif // #ifdef ASSERT > > Otherwise a very nice cleanup. 
> Thanks,
> Stefan
>
>> Testing: Tier 1 - Tier 3
>>
>> //Ivan
>

From maoliang.ml at alibaba-inc.com  Tue Feb 11 11:46:21 2020
From: maoliang.ml at alibaba-inc.com (Liang Mao)
Date: Tue, 11 Feb 2020 19:46:21 +0800
Subject: Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics
In-Reply-To: 
References: <90aa2259-afce-44af-abb2-31700caea4a0.maoliang.ml@alibaba-inc.com>
 <7085d9f4-d579-2fb1-c3ba-938a01ab7576@oracle.com>
 <6a4dfc59-217c-446d-94ec-f4796d44617c.maoliang.ml@alibaba-inc.com>
 <72f8bfb6-2039-1d6b-c312-2a9dafe0b735@oracle.com>
 <97f72395-af73-41f7-98f3-8e22cce5b79b.maoliang.ml@alibaba-inc.com>,
Message-ID: 

Hi Thomas,

>
>> ....
>> size_t G1Policy::minimum_desired_bytes(size_t used_bytes) const {
>>   return _ihop_control->unrestrained_young_size() != 0 ?
>>            _ihop_control->unrestrained_young_size() :
>>            _young_list_max_length * HeapRegion::GrainBytes
>>          + _reserve_regions * HeapRegion::GrainBytes + used_bytes;
>> }

> I think G1IHOPControl::_target_occupancy (add a getter) is what you want
> to use here (untested).

I'm not looking for _target_occupancy, which is the current heap
capacity, because the minimum bytes may exceed it. Since the memory
usage is almost at peak in remark,
old_use_bytes + promoted_bytes_in_next_gc + unrestrained_young_bytes
can be the minimum desired bytes.

> There is a cleaned up version of my earlier change that implements the
> latter at http://cr.openjdk.java.net/~tschatzl/8236073/webrev.1/ .

I have a question: the heap size can be shrunk even when the commit size
is not changed, so it could cause a waste of committed free regions.

Thanks,
Liang

------------------------------------------------------------------
From:Thomas Schatzl 
Send Time:2020 Feb. 11 (Tue.) 18:42
To:"MAO, Liang" ; hotspot-gc-dev 
Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics

Hi,

On 10.02.20 12:47, Liang Mao wrote:
> Hi Thomas,
>
> In my testing, I didn't change the value of Min/MaxHeapFreeRatio.
> > The heap had already shrinked to 5GB but in remark it expand to 6644M. > The fault value of MinHeapFreeRatio is 40, so the minimal commit size > after remark is the heap size * 1.67 (3979M * 1.67 = 6644M). > 1.67 = 100/(100 - 40) > > > [1031.322s][info][gc > ] GC(741) Pause Young (Concurrent Start) (G1 Evacuation Pause) 4724M->4506M(5120M) 10.607ms > [1031.322s][info][gc,cpu ] GC(741) User=0.42s Sys=0.00s Real=0.01s > [1031.322s][info][gc ] GC(742) Concurrent Cycle > [1031.322s][info][gc,marking ] GC(742) Concurrent Clear Claimed Marks > [1031.322s][info][gc,marking ] GC(742) Concurrent Clear Claimed Marks 0.066ms > [1031.322s][info][gc,marking ] GC(742) Concurrent Scan Root Regions > [1031.322s][info][gc,stringdedup ] Concurrent String Deduplication (1031.322s) > [1031.323s][info][gc,stringdedup ] Concurrent String Deduplication 14224.0B->0.0B(14224.0B) avg 51.1% (1031.322s, 1031.323s) 0.514ms > [1031.326s][info][gc,marking ] GC(742) Concurrent Scan Root Regions 3.939ms > [1031.326s][info][gc,marking ] GC(742) Concurrent Mark (1031.326s) > [1031.326s][info][gc,marking ] GC(742) Concurrent Mark From Roots > [1031.326s][info][gc,task ] GC(742) Using 16 workers of 16 for marking > [1031.483s][info][gc,marking ] GC(742) Concurrent Mark From Roots 157.144ms > [1031.483s][info][gc,marking ] GC(742) Concurrent Preclean > [1031.484s][info][gc,marking ] GC(742) Concurrent Preclean 0.404ms > [1031.484s][info][gc,marking ] GC(742) Concurrent Mark (1031.326s, 1031.484s) 157.587ms > [1031.485s][info][gc,start ] GC(742) Pause Remark > [1031.496s][info][gc ] GC(742) Pause Remark 4625M->3979M(6644M) 10.953ms > [1031.496s][info][gc,cpu ] GC(742) User=0.22s Sys=0.04s Real=0.01s > > > In our production environment, we never use JEP 346 mainly because of > JDK version. > So I cannot tell how if it would work. I agree the "idle" issue is not > our main focus for now. 
> > Using SoftMaxHeapSize to guide adaptive IHOP to make desicion of concurrent > mark GC cycle can work well with JEP 346 and the resize logic in remark. > I don't stick to shrink the heap in every GC. > > The capacity in resize_heap_if_necessary will be > Max2(min_desire_capacity_by_MinHeapFreeRatio, Min2(soft_max_capacity(), > max_desire_capacity_by_MaxHeapFreeRatio)) > > But both 2 approaches have the problem that default MinHeapFreeRatio is > too large > in remark comparing to full gc. As resize_heap_if_necessary > will keep a minimal heap size as 1.667X of used heap size. After remark, > the used size could be large that not only include those old regions > with garbages but > also the used young regions. > > ############################# > void G1CollectedHeap::resize_heap_if_necessary() { > ... > const size_t capacity_after_gc = capacity(); > const size_t used_after_gc = capacity_after_gc - unused_committed_regions_in_bytes(); > ############################# > > The used_after_gc is reasonable for full gc but it can contains young > regions in remark. > Do you think it should be changed like this? > ############################# > const size_t used_after_gc = capacity_after_gc - unused_committed_regions_in_bytes() > - young_regions_count() * HeapRegion::GrainWords; > // young_regions_count is 0 after full GC > ############################# Apart from naming ("used_after_gc") which has been wrong since that method has been in use for Remark, this seems reasonable. Maybe "old_used_after_gc"? I think the comments need changes to reflect that we apply the Min/MaxHeapFreeRatio on the old gen occupancy now (which is the same as total occupancy after full gc) because it may be called with young regions active. I also think the whole code that calculates the expansion and shrinking amount should be moved to the policy (and g1collectedheap code just calling that and then only react on the return value), but that can be done separately. 
> > Besides this, as you suggested, a lower MinHeapFreeRatio would be good. > But arbitrarily setting a fixed number seems is not a good way that the > small number may not meet pause time goal in later young GC. I tried to use > following number in resize_heap_if_necessary: > > ############################## > void G1CollectedHeap::resize_heap_if_necessary() { > ... > // We can now safely turn them into size_t's. > size_t minimum_desired_capacity = (size_t) minimum_desired_capacity_d; > size_t maximum_desired_capacity = (size_t) maximum_desired_capacity_d; > > if (!collector_state()->in_full_gc()) { > minimum_desired_capacity = MIN2(minimum_desired_capacity, policy()->minimum_desired_bytes(used_after_gc)); > } That looks a bit hacky... :) But I do not have a better policy for sizing after full gc either. Did you try always using the minimum_desired_bytes()? > > .... > size_t G1Policy::minimum_desired_bytes(size_t used_bytes) const { > return _ihop_control->unrestrained_young_size() != 0 ? > _ihop_control->unrestrained_young_size() : > _young_list_max_length * HeapRegion::GrainBytes > + _reserve_regions * HeapRegion::GrainBytes + used_bytes; > } I think G1IHOPControl::_target_occupancy (add a getter) is what you want to use here (untested). > ############################# > > I made the minimum_desired_capacity small enough based on adaptive IHOP's > _last_unrestrained_young_size. Even without SoftMaxHeapSize, the test can > keep the memory under 3GB. It's a rough example and I didn't predict the > promotion bytes of next young gc yet. Do you think > a proper value of minimum_desired_capacity in remark resize > + > G1AdaptiveIHOPControl::actual_target_threshold according to > soft_max_capacity> is enough? Yes, both fixing the resizing logic and changing the IHOP target (and young gen size) according to SoftMaxHeapSize should be sufficient to let G1 keep that goal without too many commit activity. 
The resizing logic change could be handled under JDK-8238686, although this change does not modify the use of MaxHeapFreeRatio. There is a cleaned up version of my earlier change that implements the latter at http://cr.openjdk.java.net/~tschatzl/8236073/webrev.1/ . I will test your suggested changes and see its impact on our perf suite. Thanks a lot, Thomas P.S: it would be nice to send diffs of suggested changes for easier application too. From thomas.schatzl at oracle.com Tue Feb 11 12:08:02 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 11 Feb 2020 13:08:02 +0100 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: References: <90aa2259-afce-44af-abb2-31700caea4a0.maoliang.ml@alibaba-inc.com> <7085d9f4-d579-2fb1-c3ba-938a01ab7576@oracle.com> <6a4dfc59-217c-446d-94ec-f4796d44617c.maoliang.ml@alibaba-inc.com> <72f8bfb6-2039-1d6b-c312-2a9dafe0b735@oracle.com> <97f72395-af73-41f7-98f3-8e22cce5b79b.maoliang.ml@alibaba-inc.com> Message-ID: <0cf69702-549b-9ef6-f13b-33a735536873@oracle.com> Hi, On 11.02.20 12:46, Liang Mao wrote: > Hi Thomas, > > >> >>>?.... >>>?size_t?G1Policy::minimum_desired_bytes(size_t?used_bytes)?const?{ >>>????return?_ihop_control->unrestrained_young_size()?!=?0?? >>>?????????????_ihop_control->unrestrained_young_size()?: >>>?????????????_young_list_max_length?*?HeapRegion::GrainBytes >>>???????????+?_reserve_regions?*?HeapRegion::GrainBytes?+?used_bytes; >>>?} > >> I?think?G1IHOPControl::_target_occupancy?(add?a?getter)?is?what?you?want >> to?use?here?(untested). > > I'm not looking for _target_occupancy which is current heap capacity > because the minimum bytes may exceed it. Since the memory > usage is almost at peak in remark, > old_use_bytes + promoted_bytes_in_next_gc + unrestrained_young_bytes > can be?the minimum desired bytes. You are right, I need to think about this some more. 
I think the calculation assumes that the next gc is the first mixed gc, which isn't true, there's that "Prepare Mixed" GC too. But as an initial approximation it should work. > > >> There?is?a?cleaned?up?version?of?my?earlier?change?that?implements?the >> latter?at http://cr.openjdk.java.net/~tschatzl/8236073/webrev.1/?. > > I have a question that heap size can be shrinked even commit size is not > changed so it could cause a waste of committed free regions. You mean because of regions being larger than the commit size or other reasons? I.e. you have 2M large pages but only 1M regions, so you may end up with G1 not being able to actually uncommit as only half of that 2M page is free? Thanks, Thomas From thomas.schatzl at oracle.com Tue Feb 11 12:28:28 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 11 Feb 2020 13:28:28 +0100 Subject: RFR: 8232686: Turn parallel gc develop tracing flags into unified logging In-Reply-To: <7E333365-53C8-49C0-A0C8-5464E3C8BCDC@oracle.com> References: <5B542A0A-3477-40F9-9DD8-AC86E3870E60@oracle.com> <37724756-9D64-4DDB-9C8D-A5C4A24B23E9@oracle.com> <7E333365-53C8-49C0-A0C8-5464E3C8BCDC@oracle.com> Message-ID: Hi, On 11.02.20 11:47, Ivan Walulya wrote: > Thanks Stefan, find below patch with the suggested updates. > > http://cr.openjdk.java.net/~sjohanss/iwalulya/8232686/00-01/ > > http://cr.openjdk.java.net/~sjohanss/iwalulya/8232686/01/ lgtm. 
Thanks,
  Thomas

From maoliang.ml at alibaba-inc.com  Tue Feb 11 12:52:25 2020
From: maoliang.ml at alibaba-inc.com (Liang Mao)
Date: Tue, 11 Feb 2020 20:52:25 +0800
Subject: Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics
In-Reply-To: <0cf69702-549b-9ef6-f13b-33a735536873@oracle.com>
References: <90aa2259-afce-44af-abb2-31700caea4a0.maoliang.ml@alibaba-inc.com>
 <7085d9f4-d579-2fb1-c3ba-938a01ab7576@oracle.com>
 <6a4dfc59-217c-446d-94ec-f4796d44617c.maoliang.ml@alibaba-inc.com>
 <72f8bfb6-2039-1d6b-c312-2a9dafe0b735@oracle.com>
 <97f72395-af73-41f7-98f3-8e22cce5b79b.maoliang.ml@alibaba-inc.com>,
 <0cf69702-549b-9ef6-f13b-33a735536873@oracle.com>
Message-ID: 

Hi Thomas,

> I think the calculation assumes that the next gc is the first mixed gc,
> which isn't true, there's that "Prepare Mixed" GC too. But as an initial
> approximation it should work.

I assumed the prepare mixed GC. But technically we need the promotion
bytes in the 1st mixed GC too, right? After I took a look at a gc log,
I found there could be several normal young GCs between remark and the
"Prepare Mixed" GC because it costs time to do some cleanup.
So do you think resizing in "Pause Cleanup" is a better way?

> You mean because of regions being larger than the commit size or other
> reasons? I.e. you have 2M large pages but only 1M regions, so you may
> end up with G1 not being able to actually uncommit as only half of that
> 2M page is free?

No. Sorry for my unclear description. My point is update_heap_target_size
can happen in every normal GC but in remark a real shrink may never happen
(the large MaxHeapFreeRatio will prevent the shrinking).
So we may use a smaller heap size but no regions are uncommitted.

Thanks,
Liang

------------------------------------------------------------------
From:Thomas Schatzl 
Send Time:2020 Feb. 11 (Tue.)
20:08 To:"MAO, Liang" ; hotspot-gc-dev Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Hi, On 11.02.20 12:46, Liang Mao wrote: > Hi Thomas, > > >> >>> .... >>> size_t G1Policy::minimum_desired_bytes(size_t used_bytes) const { >>> return _ihop_control->unrestrained_young_size() != 0 ? >>> _ihop_control->unrestrained_young_size() : >>> _young_list_max_length * HeapRegion::GrainBytes >>> + _reserve_regions * HeapRegion::GrainBytes + used_bytes; >>> } > >> I think G1IHOPControl::_target_occupancy (add a getter) is what you want >> to use here (untested). > > I'm not looking for _target_occupancy which is current heap capacity > because the minimum bytes may exceed it. Since the memory > usage is almost at peak in remark, > old_use_bytes + promoted_bytes_in_next_gc + unrestrained_young_bytes > can be the minimum desired bytes. You are right, I need to think about this some more. I think the calculation assumes that the next gc is the first mixed gc, which isn't true, there's that "Prepare Mixed" GC too. But as an initial approximation it should work. > > >> There is a cleaned up version of my earlier change that implements the >> latter at http://cr.openjdk.java.net/~tschatzl/8236073/webrev.1/ . > > I have a question that heap size can be shrinked even commit size is not > changed so it could cause a waste of committed free regions. You mean because of regions being larger than the commit size or other reasons? I.e. you have 2M large pages but only 1M regions, so you may end up with G1 not being able to actually uncommit as only half of that 2M page is free? 
Thanks, Thomas From ivan.walulya at oracle.com Tue Feb 11 13:10:23 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Tue, 11 Feb 2020 14:10:23 +0100 Subject: RFR: 8232686: Turn parallel gc develop tracing flags into unified logging In-Reply-To: References: <5B542A0A-3477-40F9-9DD8-AC86E3870E60@oracle.com> <37724756-9D64-4DDB-9C8D-A5C4A24B23E9@oracle.com> <7E333365-53C8-49C0-A0C8-5464E3C8BCDC@oracle.com> Message-ID: <3840FA83-22C6-4BA9-A7D3-7F3027653BD5@oracle.com> Thanks Thomas! > On 11 Feb 2020, at 13:28, Thomas Schatzl wrote: > > Hi, > > On 11.02.20 11:47, Ivan Walulya wrote: >> Thanks Stefan, find below patch with the suggested updates. >> http://cr.openjdk.java.net/~sjohanss/iwalulya/8232686/00-01/ >> http://cr.openjdk.java.net/~sjohanss/iwalulya/8232686/01/ > > lgtm. > > Thanks, > Thomas From thomas.schatzl at oracle.com Tue Feb 11 13:27:36 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 11 Feb 2020 14:27:36 +0100 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: References: <90aa2259-afce-44af-abb2-31700caea4a0.maoliang.ml@alibaba-inc.com> <7085d9f4-d579-2fb1-c3ba-938a01ab7576@oracle.com> <6a4dfc59-217c-446d-94ec-f4796d44617c.maoliang.ml@alibaba-inc.com> <72f8bfb6-2039-1d6b-c312-2a9dafe0b735@oracle.com> <97f72395-af73-41f7-98f3-8e22cce5b79b.maoliang.ml@alibaba-inc.com> <0cf69702-549b-9ef6-f13b-33a735536873@oracle.com> Message-ID: <8bf05fec-3f74-f070-28f7-1c61335a3715@oracle.com> Hi, On 11.02.20 13:52, Liang Mao wrote: > Hi Thomas, > >> I think the calculation assumes that the next gc is the first mixed gc, >> which isn't true, there's that "Prepare Mixed" GC too. But as an initial >> approximation it should work. > > I assumed the prepare mixed GC. But technically we need the promotion > bytes in 1st mixed GC too, right? After I took a look at gc > log, I found there could be several normal young GC between remark > and "Prepare Mixed" GC because it costs time to do some cleanup.
Actually, building the remembered sets. > So do you think resize in "Pause Cleanup" is a better way? I am certainly not opposed to moving resizing to the cleanup pause or anywhere else (last mixed gc?) where it makes most sense. Moving to Cleanup would likely make prediction about the "needed" memory easier. >> You mean because of regions being larger than the commit size or other >> reasons? I.e. you have 2M large pages but only 1M regions, so you may >> end up with G1 not being able to actually uncommit as only half of that >> 2M page is free? > No. Sorry for my unclear description. My point is update_heap_target_size > can happen in every normal GC but in remark real shrink may never happen > (the large MaxHeapFreeRatio will prevent the shrinking). > So we may use smaller heap size but no regions are uncommitted. that is true, that is the same issue I wanted to point out with my earlier remark about "The resizing logic change could be handled under JDK-8238686, although this change does not modify the use of MaxHeapFreeRatio. " - i.e. does not uncommit space due to MaxHeapFreeRatio. That may be handled separately if it is easier; JDK-8238686 suggests to think about removing the use of Min/MaxHeapFreeRatio altogether (or maybe use only during full gc). > > Thanks, > Liang > Hth, Thomas From thomas.schatzl at oracle.com Tue Feb 11 13:30:09 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 11 Feb 2020 14:30:09 +0100 Subject: RFR (M): 8238854: Remove superfluous C heap allocation failure checks Message-ID: <51546c32-2d17-ed33-0b84-a56ca16a1227@oracle.com> Hi all, can I have reviews for this change that removes superfluous C heap allocation failure checks (basically hard-exiting the VM) because by default C allocation already exits the VM.
CR: https://bugs.openjdk.java.net/browse/JDK-8238854 Webrev: http://cr.openjdk.java.net/~tschatzl/8238854/webrev/ Testing: hs-tier1-5 without differences Thanks, Thomas From rkennke at redhat.com Tue Feb 11 15:38:38 2020 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 11 Feb 2020 16:38:38 +0100 Subject: RFR: JDK-8238851: Shenandoah: C1: Resolve into registers of correct type Message-ID: In ShBSC1::ensure_in_register() we are blindly creating registers of type T_OBJECT, even though in some cases we actually need T_ADDRESS. This blows up when we verify oop registers: when the argument is of type T_OBJECT we perform extra checks that fail when the value in register is not actually an object. Bug: https://bugs.openjdk.java.net/browse/JDK-8238851 Webrev: http://cr.openjdk.java.net/~rkennke/JDK-8238851/webrev.01/ Testing: the provided testcase passes now. hotspot_gc_shenandoah Ok? Thanks, Roman From shade at redhat.com Tue Feb 11 16:18:48 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 11 Feb 2020 17:18:48 +0100 Subject: RFR: JDK-8238851: Shenandoah: C1: Resolve into registers of correct type In-Reply-To: References: Message-ID: On 2/11/20 4:38 PM, Roman Kennke wrote: > In ShBSC1::ensure_in_register() we are blindly creating registers of > type T_OBJECT, even though in some cases we actually need T_ADDRESS. > This blows up when we verify oop registers: when the argument is of type > T_OBJECT we perform extra checks that fail when the value in register is > not actually an object. > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8238851 > Webrev: > http://cr.openjdk.java.net/~rkennke/JDK-8238851/webrev.01/ > Testing: the provided testcase passes now. hotspot_gc_shenandoah This path probably needs adjustment too: 167 #ifdef AARCH64 168 // AArch64 expects double-size register. 
169 obj_reg = gen->new_pointer_register(); 170 #else -- Thanks, -Aleksey From rkennke at redhat.com Tue Feb 11 19:09:58 2020 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 11 Feb 2020 20:09:58 +0100 Subject: RFR: JDK-8238851: Shenandoah: C1: Resolve into registers of correct type In-Reply-To: References: Message-ID: >> In ShBSC1::ensure_in_register() we are blindly creating registers of >> type T_OBJECT, even though in some cases we actually need T_ADDRESS. >> This blows up when we verify oop registers: when the argument is of type >> T_OBJECT we perform extra checks that fail when the value in register is >> not actually an object. >> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8238851 >> Webrev: >> http://cr.openjdk.java.net/~rkennke/JDK-8238851/webrev.01/ >> Testing: the provided testcase passes now. hotspot_gc_shenandoah > > This path probably needs adjustment too: > > 167 #ifdef AARCH64 > 168 // AArch64 expects double-size register. > 169 obj_reg = gen->new_pointer_register(); > 170 #else The provided test passes on aarch64 without any additional changes. I tried removing the block, hoping that the suggested change does perhaps make it unnecessary, but no. It's still needed. Further suggestions welcome. This whole thing kinda smells. Roman From kim.barrett at oracle.com Wed Feb 12 00:45:33 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 11 Feb 2020 19:45:33 -0500 Subject: RFR: 8238867: Improve G1DirtyCardQueueSet::Queue::pop Message-ID: <192C6AD3-E241-44B5-874A-3E4D6CF93A41@oracle.com> Please review this change to G1DirtyCardQueueSet::Queue::pop. Previously, if there was exactly one element in the queue, a pop operation could not return it, because doing so could break invariants for concurrent operations. Now, if there is one element and there are concurrent pop operations, one of those operations will win. Note that there are still races between pop and push/append that may prevent the pop operation from obtaining an element. 
CR: https://bugs.openjdk.java.net/browse/JDK-8238867 Webrev: https://cr.openjdk.java.net/~kbarrett/8238867/open.00/ Testing: mach5 tier1-3. mach5 tier1-5 (only linux-x64) in conjunction with other changes. Some performance testing didn't find any unexpected differences. From leo.korinth at oracle.com Wed Feb 12 08:12:48 2020 From: leo.korinth at oracle.com (Leo Korinth) Date: Wed, 12 Feb 2020 09:12:48 +0100 Subject: RFR: 8232686: Turn parallel gc develop tracing flags into unified logging In-Reply-To: <7E333365-53C8-49C0-A0C8-5464E3C8BCDC@oracle.com> References: <5B542A0A-3477-40F9-9DD8-AC86E3870E60@oracle.com> <37724756-9D64-4DDB-9C8D-A5C4A24B23E9@oracle.com> <7E333365-53C8-49C0-A0C8-5464E3C8BCDC@oracle.com> Message-ID: <9b402df8-7e9d-0bc3-dee8-99709d82117e@oracle.com> Hi Ivan, On 11/02/2020 11:47, Ivan Walulya wrote: > Thanks Stefan, find below patch with the suggested updates. > > http://cr.openjdk.java.net/~sjohanss/iwalulya/8232686/00-01/ > > http://cr.openjdk.java.net/~sjohanss/iwalulya/8232686/01/ Looks good, I will help you push it. Thanks, Leo > > //Ivan > >> On 11 Feb 2020, at 11:26, Stefan Johansson wrote: >> >> Hi Ivan, >> >>> 11 feb. 2020 kl. 08:34 skrev Ivan Walulya : >>> >>> Hi all, >>> >>> Please review a small modification to turn parallel gc develop tracing flags into unified logging >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8232686 >>> Webrev: http://cr.openjdk.java.net/~sjohanss/iwalulya/8232686/00/ >>> >> When looking through the webrev again I realized that we can now remove the "#ifdef ASSERT"
here: >> 1616 #ifdef ASSERT >> 1617 log_develop_debug(gc, marking)( >> 1618 "add_obj_count=" SIZE_FORMAT " " >> 1619 "add_obj_bytes=" SIZE_FORMAT, >> 1620 add_obj_count, >> 1621 add_obj_size * HeapWordSize); >> 1622 log_develop_debug(gc, marking)( >> 1623 "mark_bitmap_count=" SIZE_FORMAT " " >> 1624 "mark_bitmap_bytes=" SIZE_FORMAT, >> 1625 mark_bitmap_count, >> 1626 mark_bitmap_size * HeapWordSize); >> 1627 #endif // #ifdef ASSERT >> >> Otherwise a very nice cleanup. >> >> Thanks, >> Stefan >> >>> Testing: Tier 1 - Tier 3 >>> >>> //Ivan >> > From shade at redhat.com Wed Feb 12 09:10:23 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 12 Feb 2020 10:10:23 +0100 Subject: RFR: JDK-8238851: Shenandoah: C1: Resolve into registers of correct type In-Reply-To: References: Message-ID: <80502580-be2e-7cc8-c0e8-e9a9d11adffc@redhat.com> On 2/11/20 8:09 PM, Roman Kennke wrote: >>> In ShBSC1::ensure_in_register() we are blindly creating registers of >>> type T_OBJECT, even though in some cases we actually need T_ADDRESS. >>> This blows up when we verify oop registers: when the argument is of type >>> T_OBJECT we perform extra checks that fail when the value in register is >>> not actually an object. >>> >>> Bug: >>> https://bugs.openjdk.java.net/browse/JDK-8238851 >>> Webrev: >>> http://cr.openjdk.java.net/~rkennke/JDK-8238851/webrev.01/ >>> Testing: the provided testcase passes now. hotspot_gc_shenandoah >> >> This path probably needs adjustment too: >> >> 167 #ifdef AARCH64 >> 168 // AArch64 expects double-size register. >> 169 obj_reg = gen->new_pointer_register(); >> 170 #else > > The provided test passes on aarch64 without any additional changes. > > I tried removing the block, hoping that the suggested change does > perhaps make it unnecessary, but no. It's still needed. Gaawh. The non-AARCH64 path still looks good, so we can push it in current form. We really need to figure out AARCH64 thingie, please file the follow-up RFR? 
-- Thanks, -Aleksey From ivan.walulya at oracle.com Wed Feb 12 09:31:16 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Wed, 12 Feb 2020 10:31:16 +0100 Subject: RFR: 8232686: Turn parallel gc develop tracing flags into unified logging In-Reply-To: <9b402df8-7e9d-0bc3-dee8-99709d82117e@oracle.com> References: <5B542A0A-3477-40F9-9DD8-AC86E3870E60@oracle.com> <37724756-9D64-4DDB-9C8D-A5C4A24B23E9@oracle.com> <7E333365-53C8-49C0-A0C8-5464E3C8BCDC@oracle.com> <9b402df8-7e9d-0bc3-dee8-99709d82117e@oracle.com> Message-ID: <88DBCBBC-C355-4BD5-8DAA-1ECDD487D17A@oracle.com> Thanks Leo! //Ivan > On 12 Feb 2020, at 09:12, Leo Korinth wrote: > > Hi Ivan, > > On 11/02/2020 11:47, Ivan Walulya wrote: >> Thanks Stefan, find below patch with the suggested updates. >> http://cr.openjdk.java.net/~sjohanss/iwalulya/8232686/00-01/ >> http://cr.openjdk.java.net/~sjohanss/iwalulya/8232686/01/ > > > Looks good, I will help you push it. > > Thanks, > Leo > > >> //Ivan >>> On 11 Feb 2020, at 11:26, Stefan Johansson wrote: >>> >>> Hi Ivan, >>> >>>> 11 feb. 2020 kl. 08:34 skrev Ivan Walulya : >>>> >>>> Hi all, >>>> >>>> Please review a small modification to turn parallel gc develop tracing flags into unified logging >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8232686 >>>> Webrev: http://cr.openjdk.java.net/~sjohanss/iwalulya/8232686/00/ >>>> >>> When looking through the webrev again I realized that we can now remove the "#ifdef ASSERT" here: >>> 1616 #ifdef ASSERT >>> 1617 log_develop_debug(gc, marking)( >>> 1618 "add_obj_count=" SIZE_FORMAT " " >>> 1619 "add_obj_bytes=" SIZE_FORMAT, >>> 1620 add_obj_count, >>> 1621 add_obj_size * HeapWordSize); >>> 1622 log_develop_debug(gc, marking)( >>> 1623 "mark_bitmap_count=" SIZE_FORMAT " " >>> 1624 "mark_bitmap_bytes=" SIZE_FORMAT, >>> 1625 mark_bitmap_count, >>> 1626 mark_bitmap_size * HeapWordSize); >>> 1627 #endif // #ifdef ASSERT >>> >>> Otherwise a very nice cleanup.
>>> >>> Thanks, >>> Stefan >>> >>>> Testing: Tier 1 - Tier 3 >>>> >>>> //Ivan >>> From maoliang.ml at alibaba-inc.com Wed Feb 12 10:17:15 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Wed, 12 Feb 2020 18:17:15 +0800 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Message-ID: <7107c9f6-ba8e-48a0-830c-5383e2c17ef3.maoliang.ml@alibaba-inc.com> Hi Thomas, I made a new patch for the issues we listed in JDK-8238686 and JDK-8236073: http://cr.openjdk.java.net/~luchsh/8236073.webrev.3/ Main changes are: 1) Don't use MinHeapFreeRatio in concurrent mark stage to guarantee the minimal commit size as we discussed. I use the IHOP prediction instead. 2) Remove resize_heap_if_necessary in remark. Heap expansion will be based on 1) in concurrent mark cleanup pause but there is no shrink at that time 3) Heap shrink will happen after mixed GC(s). I use 3 numbers to determine the target capacity: a) maximum_desired_capacity by MaxHeapFreeRatio(here I still use MaxHeapFreeRatio because it is unified with full gc, 30% of live objects makes sense); b) minimum_desired_bytes predicted in 1) in cleanup pause(to make sure we will not do a shrink just after an expansion); c) soft_max_capacity 4) expand/shrink logic are moved into sizing policy. Apparently, it solves the issues in both JDK-8238686 and JDK-8236073. I have run the original SPECjbb2015 test and it looks fine. The test will not commit memory as aggressively as the original in remark and it is able to shrink heap after changing SoftMaxHeapSize(2500M) via jinfo. The heap capacity will drop from ~3G to 2500m. The new flow can work with JEP 346 and benefit it for better memory saving. The only remaining problem is shrinking heap after mixed GCs may not happen on time if application is in "idle". We may still need a timer to make sure mixed GC can happen?
Thanks, Liang ------------------------------------------------------------------ From:Thomas Schatzl Send Time:2020 Feb. 11 (Tue.) 21:27 To:"MAO, Liang" ; hotspot-gc-dev Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Hi, On 11.02.20 13:52, Liang Mao wrote: > Hi Thomas, > >> I think the calculation assumes that the next gc is the first mixed gc, >> which isn't true, there's that "Prepare Mixed" GC too. But as an initial >> approximation it should work. > > I assumed the prepare mixed GC. But technically we need the promotion > bytes in 1st mixed GC too, right? After I took a look at gc > log, I found there could be several normal young GC between remark > and "Prepare Mixed" GC because it costs time to do some cleanup. Actually, building the remembered sets. > So do you think resize in "Pause Cleanup" is a better way? I am certainly not opposed to moving resizing to the cleanup pause or anywhere else (last mixed gc?) where it makes most sense. Moving to Cleanup would likely make prediction about the "needed" memory easier. >> You mean because of regions being larger than the commit size or other >> reasons? I.e. you have 2M large pages but only 1M regions, so you may >> end up with G1 not being able to actually uncommit as only half of that >> 2M page is free? > > No. Sorry for my unclear description. My point is update_heap_target_size > can happen in every normal GC but in remark real shrink may never happen > (the large MaxHeapFreeRatio will prevent the shrinking). > So we may use smaller heap size but no regions are uncommitted. that is true, that is the same issue I wanted to point out with my earlier remark about "The resizing logic change could be handled under JDK-8238686, although this change does not modify the use of MaxHeapFreeRatio. " - i.e. does not uncommit space due to MaxHeapFreeRatio.
That may be handled separately if it is easier; JDK-8238686 suggests to think about removing the use of Min/MaxHeapFreeRatio altogether (or maybe use only during full gc). > > Thanks, > Liang > Hth, Thomas From richard.reingruber at sap.com Wed Feb 12 10:23:27 2020 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Wed, 12 Feb 2020 10:23:27 +0000 Subject: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant Message-ID: // Repost including hotspot runtime and gc lists. // Dean Long suggested to do so, because the enhancement replaces a vm operation // with a handshake. // Original thread: http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-February/030359.html Hi, could I please get reviews for this small enhancement in hotspot's jvmti implementation: Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ Bug: https://bugs.openjdk.java.net/browse/JDK-8238585 The change avoids making all compiled methods on stack not_entrant when switching a java thread to interpreter only execution for jvmti purposes. It is sufficient to deoptimize the compiled frames on stack. Additionally a handshake is used instead of a vm operation to walk the stack and do the deoptimizations. Testing: JCK and JTREG tests, also in Xcomp mode with fastdebug and release builds on all platforms. Thanks, Richard.
See also my question if anyone knows a reason for making the compiled methods not_entrant: http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030339.html From thomas.schatzl at oracle.com Wed Feb 12 11:16:50 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 12 Feb 2020 12:16:50 +0100 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: <7107c9f6-ba8e-48a0-830c-5383e2c17ef3.maoliang.ml@alibaba-inc.com> References: <7107c9f6-ba8e-48a0-830c-5383e2c17ef3.maoliang.ml@alibaba-inc.com> Message-ID: <9fdac7ff-bbef-1451-c951-a40dd6f216af@oracle.com> Hi Liang, On 12.02.20 11:17, Liang Mao wrote: > Hi Thomas, > > I made a new patch for the issues we listed in?JDK-8238686 and > JDK-8236073: > http://cr.openjdk.java.net/~luchsh/8236073.webrev.3/ thanks. I only had time to quickly browse the change, and started building and testing it internally. I will run it through our perf benchmarks to look for regressions of out-of-box behavior. I will need a day or two until I can get back to looking at the change in detail. There is currently something else I need to look at. Sorry. Thanks, Thomas From stefan.johansson at oracle.com Wed Feb 12 11:38:47 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Wed, 12 Feb 2020 12:38:47 +0100 Subject: RFR (M): 8238854: Remove superfluous C heap allocation failure checks In-Reply-To: <51546c32-2d17-ed33-0b84-a56ca16a1227@oracle.com> References: <51546c32-2d17-ed33-0b84-a56ca16a1227@oracle.com> Message-ID: <155C9D38-1F5C-452E-88A7-C24C9F41CF57@oracle.com> Hi Thomas, > 11 feb. 2020 kl. 14:30 skrev Thomas Schatzl : > > Hi all, > > can I have reviews for this change that removes superfluous C heap allocation failure checks (basically hard-exiting the VM) because by default C allocation already exits the VM. 
> > CR: > https://bugs.openjdk.java.net/browse/JDK-8238854 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8238854/webrev/ Nice cleanup in general, but MemRegion doesn't derive from CHeapObj and will return NULL on failure: void* MemRegion::operator new(size_t size) throw() { return (address)AllocateHeap(size, mtGC, CURRENT_PC, AllocFailStrategy::RETURN_NULL); } So we should either change this to use the default AllocFailStrategy or keep the checks. Otherwise it looks good, Stefan > Testing: > hs-tier1-5 without differences > > Thanks, > Thomas From thomas.schatzl at oracle.com Wed Feb 12 12:10:46 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 12 Feb 2020 13:10:46 +0100 Subject: RFR (M): 8238854: Remove superfluous C heap allocation failure checks In-Reply-To: <155C9D38-1F5C-452E-88A7-C24C9F41CF57@oracle.com> References: <51546c32-2d17-ed33-0b84-a56ca16a1227@oracle.com> <155C9D38-1F5C-452E-88A7-C24C9F41CF57@oracle.com> Message-ID: Hi Stefan, thanks for your review. On 12.02.20 12:38, Stefan Johansson wrote: > Hi Thomas, > >> 11 feb. 2020 kl. 14:30 skrev Thomas Schatzl : >> >> Hi all, >> >> can I have reviews for this change that removes superfluous C heap allocation failure checks (basically hard-exiting the VM) because by default C allocation already exits the VM. >> >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8238854 >> Webrev: >> http://cr.openjdk.java.net/~tschatzl/8238854/webrev/ > Nice cleanup in general, but MemRegion doesn't derive from CHeapObj and will return NULL on failure: > void* MemRegion::operator new(size_t size) throw() { > return (address)AllocateHeap(size, mtGC, CURRENT_PC, > AllocFailStrategy::RETURN_NULL); > } > > So we should either change this to use the default AllocFailStrategy or keep the checks. > Nice catch. I opted to revert the changes for MemRegion allocation.
Although I think all users of new for MemRegion expect it to fail (the only other user in filemap.cpp will crash with NPE a few lines after allocation), this needs more investigation because the change introducing the new operator talks about some clang compatibility issue (from 2003). But it also indicates that the problem occurred only on a use that is not in the code base any more (JDK-8021954 ftr, it is a closed issue that can't be opened). (Note that my testing did not reproduce the failure, but, the code is not used in the crashing component, i.e. metaspace handling, any more). http://cr.openjdk.java.net/~tschatzl/8238854/webrev.0_to_1/ (diff) http://cr.openjdk.java.net/~tschatzl/8238854/webrev.1/ (full) Thanks, Thomas From stefan.johansson at oracle.com Wed Feb 12 12:16:15 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Wed, 12 Feb 2020 13:16:15 +0100 Subject: RFR (M): 8238854: Remove superfluous C heap allocation failure checks In-Reply-To: References: <51546c32-2d17-ed33-0b84-a56ca16a1227@oracle.com> <155C9D38-1F5C-452E-88A7-C24C9F41CF57@oracle.com> Message-ID: Hi Thomas, > 12 feb. 2020 kl. 13:10 skrev Thomas Schatzl : > > Hi Stefan, > > thanks for your review. > > On 12.02.20 12:38, Stefan Johansson wrote: >> Hi Thomas, >>> 11 feb. 2020 kl. 14:30 skrev Thomas Schatzl : >>> >>> Hi all, >>> >>> can I have reviews for this change that removes superfluous C heap allocation failure checks (basically hard-exiting the VM) because by default C allocation already exits the VM. 
>>> CR: >>> https://bugs.openjdk.java.net/browse/JDK-8238854 >>> Webrev: >>> http://cr.openjdk.java.net/~tschatzl/8238854/webrev/ >> Nice cleanup in general, but MemRegion doesn't derive from CHeapObj and will return NULL on failure: >> void* MemRegion::operator new(size_t size) throw() { >> return (address)AllocateHeap(size, mtGC, CURRENT_PC, >> AllocFailStrategy::RETURN_NULL); >> } >> So we should either change this to use the default AllocFailStrategy or keep the checks. > > Nice catch. I opted to revert the changes for MemRegion allocation. > > Although I think all users of new for MemRegion expect it to fail (the only other user in filemap.cpp will crash with NPE a few lines after allocation), this needs more investigation because the change introducing the new operator talks about some clang compatibility issue (from 2003). But it also indicates that the problem occurred only on a use that is not in the code base any more (JDK-8021954 ftr, it is a closed issue that can't be opened). > > (Note that my testing did not reproduce the failure, but, the code is not used in the crashing component, i.e. metaspace handling, any more). > > http://cr.openjdk.java.net/~tschatzl/8238854/webrev.0_to_1/ (diff) > http://cr.openjdk.java.net/~tschatzl/8238854/webrev.1/ (full) I agree with your reasoning above and think this is good. Thanks, Stefan > > Thanks, > Thomas From rkennke at redhat.com Wed Feb 12 14:21:12 2020 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 12 Feb 2020 15:21:12 +0100 Subject: RFR: JDK-8238851: Shenandoah: C1: Resolve into registers of correct type In-Reply-To: <80502580-be2e-7cc8-c0e8-e9a9d11adffc@redhat.com> References: <80502580-be2e-7cc8-c0e8-e9a9d11adffc@redhat.com> Message-ID: >>>> In ShBSC1::ensure_in_register() we are blindly creating registers of >>>> type T_OBJECT, even though in some cases we actually need T_ADDRESS.
>>>> This blows up when we verify oop registers: when the argument is of type >>>> T_OBJECT we perform extra checks that fail when the value in register is >>>> not actually an object. >>>> >>>> Bug: >>>> https://bugs.openjdk.java.net/browse/JDK-8238851 >>>> Webrev: >>>> http://cr.openjdk.java.net/~rkennke/JDK-8238851/webrev.01/ >>>> Testing: the provided testcase passes now. hotspot_gc_shenandoah >>> >>> This path probably needs adjustment too: >>> >>> 167 #ifdef AARCH64 >>> 168 // AArch64 expects double-size register. >>> 169 obj_reg = gen->new_pointer_register(); >>> 170 #else >> >> The provided test passes on aarch64 without any additional changes. >> >> I tried removing the block, hoping that the suggested change does >> perhaps make it unnecessary, but no. It's still needed. > > Gaawh. The non-AARCH64 path still looks good, so we can push it in current form. We really need to > figure out AARCH64 thingie, please file the follow-up RFR? Turns out that we can fix this rather easily. It is the non-aarch64 path that is wrong though: Differential: http://cr.openjdk.java.net/~rkennke/JDK-8238851/webrev.01.diff/ Full: http://cr.openjdk.java.net/~rkennke/JDK-8238851/webrev.01/ This is also consistent with the implementations of LIRAssembler::leal() both x86 and aarch64. Testing: passes hotspot_gc_shenandoah both aarch64 and x86 Good? Roman From shade at redhat.com Wed Feb 12 14:25:23 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 12 Feb 2020 15:25:23 +0100 Subject: RFR: JDK-8238851: Shenandoah: C1: Resolve into registers of correct type In-Reply-To: References: <80502580-be2e-7cc8-c0e8-e9a9d11adffc@redhat.com> Message-ID: On 2/12/20 3:21 PM, Roman Kennke wrote: > Differential: > http://cr.openjdk.java.net/~rkennke/JDK-8238851/webrev.01.diff/ > Full: > http://cr.openjdk.java.net/~rkennke/JDK-8238851/webrev.01/ > > This is also consistent with the implementations of LIRAssembler::leal() > both x86 and aarch64. Looks good. 
-- Thanks, -Aleksey From mikael.vidstedt at oracle.com Wed Feb 12 17:27:33 2020 From: mikael.vidstedt at oracle.com (Mikael Vidstedt) Date: Wed, 12 Feb 2020 09:27:33 -0800 Subject: RFR(XS): 8238932: Invalid tier1_gc_1 test group definition Message-ID: Please review this small change which fixes the definition of the tier1_gc_1 jtreg test group. JBS: https://bugs.openjdk.java.net/browse/JDK-8238932 Webrev: http://cr.openjdk.java.net/~mikael/webrevs/8238932/webrev.00/open/webrev/ The issue was introduced as part of JDK-8212657[1] "Promptly Return Unused Committed Memory from G1". The missing backslash means "-gc/g1/TestTimelyCompaction.java" will actually be interpreted as a test group name by jtreg, resulting in an empty test group. There is no TestTimelyCompaction.java test/file, so either it was not added, or it's really supposed to be test/hotspot/jtreg/gc/g1/TestPeriodicCollection.java (which was added as part of the same change). In either case, since the test group definition has been used successfully for more than a year now it seems like simply removing the faulty line should do the trick... Cheers, Mikael [1] https://bugs.openjdk.java.net/browse/JDK-8212657 From kim.barrett at oracle.com Wed Feb 12 18:47:36 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 12 Feb 2020 13:47:36 -0500 Subject: RFR(XS): 8238932: Invalid tier1_gc_1 test group definition In-Reply-To: References: Message-ID: <71EAB7E4-2B66-4BCD-90D5-CF6AC2C814A5@oracle.com> > On Feb 12, 2020, at 12:27 PM, Mikael Vidstedt wrote: > > > Please review this small change which fixes the definition of the tier1_gc_1 jtreg test group. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8238932 > Webrev: http://cr.openjdk.java.net/~mikael/webrevs/8238932/webrev.00/open/webrev/ > > The issue was introduced as part of JDK-8212657[1] "Promptly Return Unused Committed Memory from G1". The missing backslash means "-gc/g1/TestTimelyCompaction.java"
will actually be interpreted as a test group name by jtreg, resulting in an empty test group. There is no TestTimelyCompaction.java test/file, so either it was not added, or it's really supposed to be test/hotspot/jtreg/gc/g1/TestPeriodicCollection.java (which was added as part of the same change). > > In either case, since the test group definition has been used successfully for more than a year now it seems like simply removing the faulty line should do the trick... > > Cheers, > Mikael > > [1] https://bugs.openjdk.java.net/browse/JDK-8212657 Looks good, and trivial. Maybe someone should contact the original authors of the change regarding the possibly missing test. From kim.barrett at oracle.com Thu Feb 13 00:51:18 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 12 Feb 2020 19:51:18 -0500 Subject: RFR (M): 8238854: Remove superfluous C heap allocation failure checks In-Reply-To: References: <51546c32-2d17-ed33-0b84-a56ca16a1227@oracle.com> <155C9D38-1F5C-452E-88A7-C24C9F41CF57@oracle.com> Message-ID: > On Feb 12, 2020, at 7:16 AM, Stefan Johansson wrote: > > Hi Thomas, > >> 12 feb. 2020 kl. 13:10 skrev Thomas Schatzl : >> >> Hi Stefan, >> >> thanks for your review. >> >> On 12.02.20 12:38, Stefan Johansson wrote: >>> Hi Thomas, >>>> 11 feb. 2020 kl. 14:30 skrev Thomas Schatzl : >>>> >>>> Hi all, >>>> >>>> can I have reviews for this change that removes superfluous C heap allocation failure checks (basically hard-exiting the VM) because by default C allocation already exits the VM.
>>>> CR: >>>> https://bugs.openjdk.java.net/browse/JDK-8238854 >>>> Webrev: >>>> http://cr.openjdk.java.net/~tschatzl/8238854/webrev/ >>> Nice cleanup in general, but MemRegion doesn't derive from CHeapObj and will return NULL on failure: >>> void* MemRegion::operator new(size_t size) throw() { >>> return (address)AllocateHeap(size, mtGC, CURRENT_PC, >>> AllocFailStrategy::RETURN_NULL); >>> } >>> So we should either change this to use the default AllocFailStrategy or keep the checks. >> >> Nice catch. I opted to revert the changes for MemRegion allocation. >> >> Although I think all users of new for MemRegion expect it to fail (the only other user in filemap.cpp will crash with NPE a few lines after allocation), this needs more investigation because the change introducing the new operator talks about some clang compatibility issue (from 2003). But it also indicates that the problem occurred only on a use that is not in the code base any more (JDK-8021954 ftr, it is a closed issue that can't be opened). >> >> (Note that my testing did not reproduce the failure, but, the code is not used in the crashing component, i.e. metaspace handling, any more). >> >> http://cr.openjdk.java.net/~tschatzl/8238854/webrev.0_to_1/ (diff) >> http://cr.openjdk.java.net/~tschatzl/8238854/webrev.1/ (full) > > I agree with your reasoning above and think this is good. > > Thanks, > Stefan I agree too. Looks good. Maybe file an RFE to look at this? MemRegion allocator functions are declared throw(), which is atypical and definitely strange for us. When building with gcc we use -fcheck-new. I'm not sure
From manc at google.com Thu Feb 13 01:34:43 2020 From: manc at google.com (Man Cao) Date: Wed, 12 Feb 2020 17:34:43 -0800 Subject: RFR (XS): 8234608: [TESTBUG] Memory leak in gc/g1/unloading/libdefine.cpp In-Reply-To: <5fb5f27c-7b4e-f72e-a01f-aebb619c9558@oracle.com> References: <5fb5f27c-7b4e-f72e-a01f-aebb619c9558@oracle.com> Message-ID: Could I have a second review? -Man From ivan.walulya at oracle.com Thu Feb 13 09:40:34 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Thu, 13 Feb 2020 10:40:34 +0100 Subject: RFR: 8238867: Improve G1DirtyCardQueueSet::Queue::pop In-Reply-To: <192C6AD3-E241-44B5-874A-3E4D6CF93A41@oracle.com> References: <192C6AD3-E241-44B5-874A-3E4D6CF93A41@oracle.com> Message-ID: <070328D8-808D-4F74-ACAD-2CD9DCA1C9FF@oracle.com> This is a good fix to blocking on the last element. (Not a reviewer). > On 12 Feb 2020, at 01:45, Kim Barrett wrote: > > Please review this change to G1DirtyCardQueueSet::Queue::pop. > Previously, if there was exactly one element in the queue, a pop > operation could not return it, because doing so could break invariants > for concurrent operations. Now, if there is one element and there are > concurrent pop operations, one of those operations will win. Note > that there are still races between pop and push/append that may > prevent the pop operation from obtaining an element. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8238867 > > Webrev: > https://cr.openjdk.java.net/~kbarrett/8238867/open.00/ > > Testing: > mach5 tier1-3. > mach5 tier1-5 (only linux-x64) in conjunction with other changes. > Some performance testing didn't find any unexpected differences. 
> From thomas.schatzl at oracle.com Thu Feb 13 09:59:24 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 13 Feb 2020 10:59:24 +0100 Subject: RFR (M): 8238854: Remove superfluous C heap allocation failure checks In-Reply-To: References: <51546c32-2d17-ed33-0b84-a56ca16a1227@oracle.com> <155C9D38-1F5C-452E-88A7-C24C9F41CF57@oracle.com> Message-ID: <4864d2fb-933b-7be3-14c5-7903a11c7ff0@oracle.com> Hi Kim, Stefan, On 13.02.20 01:51, Kim Barrett wrote: >> On Feb 12, 2020, at 7:16 AM, Stefan Johansson wrote: >> >> Hi Thomas, >> >>> On 12 Feb 2020, at 13:10, Thomas Schatzl wrote: [...] >>> Although I think all users of new for MemRegion expect it to fail (the only other user in filemap.cpp will crash with NPE a few lines after allocation), this needs more investigation because the change introducing the new operator talks about some clang compatibility issue (from 2003). But it also indicates that the problem occurred only on a use that is not in the code base any more (JDK-8021954 ftr, it is a closed issue that can't be opened). >>> >>> (Note that my testing did not reproduce the failure, but, the code is not used in the crashing component, i.e. metaspace handling, any more). >>> >>> http://cr.openjdk.java.net/~tschatzl/8238854/webrev.0_to_1/ (diff) >>> http://cr.openjdk.java.net/~tschatzl/8238854/webrev.1/ (full) >> >> I agree with your reasoning above and think this is good. >> >> Thanks, >> Stefan > > I agree too. Looks good. > > Maybe file an RFE to look at this? MemRegion allocator functions are declared throw(), which is > atypical and definitely strange for us. When building with gcc we use -fcheck-new. I'm not sure > how those interact, or exactly what -fcheck-new does, or whether we actually need -fcheck-new. > Filed JDK-8238999; thanks for your reviews. 
Thomas From stefan.johansson at oracle.com Thu Feb 13 11:23:52 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Thu, 13 Feb 2020 12:23:52 +0100 Subject: RFR: 8238867: Improve G1DirtyCardQueueSet::Queue::pop In-Reply-To: <192C6AD3-E241-44B5-874A-3E4D6CF93A41@oracle.com> References: <192C6AD3-E241-44B5-874A-3E4D6CF93A41@oracle.com> Message-ID: <6363B6D8-FECD-4F0B-B86B-0B493692D84B@oracle.com> Hi Kim, > On 12 Feb 2020, at 01:45, Kim Barrett wrote: > > Please review this change to G1DirtyCardQueueSet::Queue::pop. > Previously, if there was exactly one element in the queue, a pop > operation could not return it, because doing so could break invariants > for concurrent operations. Now, if there is one element and there are > concurrent pop operations, one of those operations will win. Note > that there are still races between pop and push/append that may > prevent the pop operation from obtaining an element. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8238867 > > Webrev: > https://cr.openjdk.java.net/~kbarrett/8238867/open.00/ Looks good, thanks for all the comments. Makes it easier to follow. Thanks, Stefan > > Testing: > mach5 tier1-3. > mach5 tier1-5 (only linux-x64) in conjunction with other changes. > Some performance testing didn't find any unexpected differences. From stefan.johansson at oracle.com Thu Feb 13 12:01:12 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Thu, 13 Feb 2020 13:01:12 +0100 Subject: RFR (XS): 8234608: [TESTBUG] Memory leak in gc/g1/unloading/libdefine.cpp In-Reply-To: References: <5fb5f27c-7b4e-f72e-a01f-aebb619c9558@oracle.com> Message-ID: <27E85C8A-5223-4A00-B5B3-71212087AE10@oracle.com> Looks good, Stefan > On 13 Feb 2020, at 02:34, Man Cao wrote: > > Could I have a second review? 
> > -Man From manc at google.com Thu Feb 13 18:57:58 2020 From: manc at google.com (Man Cao) Date: Thu, 13 Feb 2020 10:57:58 -0800 Subject: RFR (XS): 8234608: [TESTBUG] Memory leak in gc/g1/unloading/libdefine.cpp In-Reply-To: <27E85C8A-5223-4A00-B5B3-71212087AE10@oracle.com> References: <5fb5f27c-7b4e-f72e-a01f-aebb619c9558@oracle.com> <27E85C8A-5223-4A00-B5B3-71212087AE10@oracle.com> Message-ID: Thanks for the reviews! -Man From kim.barrett at oracle.com Thu Feb 13 19:53:12 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 13 Feb 2020 14:53:12 -0500 Subject: RFR: 8238867: Improve G1DirtyCardQueueSet::Queue::pop In-Reply-To: <070328D8-808D-4F74-ACAD-2CD9DCA1C9FF@oracle.com> References: <192C6AD3-E241-44B5-874A-3E4D6CF93A41@oracle.com> <070328D8-808D-4F74-ACAD-2CD9DCA1C9FF@oracle.com> Message-ID: <5BFAB89A-6476-42D4-AE04-92B053615E4C@oracle.com> > On Feb 13, 2020, at 4:40 AM, Ivan Walulya wrote: > > This is a good fix to blocking on the last element. (Not a reviewer). Thanks. > >> On 12 Feb 2020, at 01:45, Kim Barrett wrote: >> >> Please review this change to G1DirtyCardQueueSet::Queue::pop. >> Previously, if there was exactly one element in the queue, a pop >> operation could not return it, because doing so could break invariants >> for concurrent operations. Now, if there is one element and there are >> concurrent pop operations, one of those operations will win. Note >> that there are still races between pop and push/append that may >> prevent the pop operation from obtaining an element. >> >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8238867 >> >> Webrev: >> https://cr.openjdk.java.net/~kbarrett/8238867/open.00/ >> >> Testing: >> mach5 tier1-3. >> mach5 tier1-5 (only linux-x64) in conjunction with other changes. >> Some performance testing didn't find any unexpected differences. 
From kim.barrett at oracle.com Thu Feb 13 19:53:26 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 13 Feb 2020 14:53:26 -0500 Subject: RFR: 8238867: Improve G1DirtyCardQueueSet::Queue::pop In-Reply-To: <6363B6D8-FECD-4F0B-B86B-0B493692D84B@oracle.com> References: <192C6AD3-E241-44B5-874A-3E4D6CF93A41@oracle.com> <6363B6D8-FECD-4F0B-B86B-0B493692D84B@oracle.com> Message-ID: > On Feb 13, 2020, at 6:23 AM, Stefan Johansson wrote: > > Hi Kim, > >> 12 feb. 2020 kl. 01:45 skrev Kim Barrett : >> >> Please review this change to G1DirtyCardQueueSet::Queue::pop. >> Previously, if there was exactly one element in the queue, a pop >> operation could not return it, because doing so could break invariants >> for concurrent operations. Now, if there is one element and there are >> concurrent pop operations, one of those operations will win. Note >> that there are still races between pop and push/append that may >> prevent the pop operation from obtaining an element. >> >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8238867 >> >> Webrev: >> https://cr.openjdk.java.net/~kbarrett/8238867/open.00/ > Looks good, thanks for all the comments. Makes it easier to follow. Thanks. > > Thanks, > Stefan > >> >> Testing: >> mach5 tier1-3. >> mach5 tier1-5 (only linux-x64) in conjunction with other changes. >> Some performance testing didn't find any unexpected differences. From kim.barrett at oracle.com Fri Feb 14 01:46:46 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 13 Feb 2020 20:46:46 -0500 Subject: RFR: 8238979: Improve G1DirtyCardQueueSet handling of previously paused buffers Message-ID: Please review this simplification of the handling of previously paused buffers by G1DirtyCardQueueSet. This change moves the call to enqueue_previous_paused_buffers() into record_paused_buffer(). This ensures any paused buffers from a previous safepoint have been flushed out before recording a buffer for the next safepoint. 
This move eliminates the former precondition that the enqueue had to have been performed before recording. This move also permits the enqueue_previous_paused_buffers in get_completed_buffer() to be moved to a point where it will be called much more rarely, slightly improving the normal performance of get_dirtied_buffer. The old location of the call was in support of the call order invariant needed by record_paused_buffer(). As a consequence of the changed enqueue locations, the fast path check in enqueue_previous_paused_buffers() will now only rarely succeed, and is no longer worth the (very small) performance cost and (much more importantly) the largish block comment arguing its correctness. So that fast path is removed. And since the raison d'etre for PausedBuffers::is_empty() was to support that fast path, that function is also removed. CR: https://bugs.openjdk.java.net/browse/JDK-8238979 Webrev: https://cr.openjdk.java.net/~kbarrett/8238979/open.00/ Testing: mach5 tier1-5 in conjunction with other in-development changes. Local (linux-x64) hotspot:tier1 for this change in isolation. From suenaga at oss.nttdata.com Fri Feb 14 09:07:59 2020 From: suenaga at oss.nttdata.com (Yasumasa Suenaga) Date: Fri, 14 Feb 2020 18:07:59 +0900 Subject: Use DAX in ZGC Message-ID: <64207ef5-fabb-748a-15c9-e96e4bc612d8@oss.nttdata.com> Hi all, I tried to allocate the heap on DAX on Linux with -XX:AllocateHeapAt, but I couldn't. It seems to be allowed only when the filesystem is hugetlbfs or tmpfs. According to the kernel documentation [1], DAX is supported in ext2, ext4, and xfs. Also we need to mount it with "-o dax". I want to use ZGC on DAX, so I want to introduce a new option -XX:ZAllowHeapOnFileSystem to allow any filesystem to be used as backing storage. What do you think of this change? http://cr.openjdk.java.net/~ysuenaga/dax-z/ If it can be accepted, I will file it in JBS and will propose a CSR. 
Thanks, Yasumasa [1] https://www.kernel.org/doc/Documentation/filesystems/dax.txt From per.liden at oracle.com Fri Feb 14 11:52:42 2020 From: per.liden at oracle.com (Per Liden) Date: Fri, 14 Feb 2020 12:52:42 +0100 Subject: Use DAX in ZGC In-Reply-To: <64207ef5-fabb-748a-15c9-e96e4bc612d8@oss.nttdata.com> References: <64207ef5-fabb-748a-15c9-e96e4bc612d8@oss.nttdata.com> Message-ID: <07354697-3758-02b9-0cc2-5fe887449e2a@oracle.com> Hi Yasumasa, On 2/14/20 10:07 AM, Yasumasa Suenaga wrote: > Hi all, > > I tried to allocate the heap on DAX on Linux with -XX:AllocateHeapAt, but I > couldn't. > It seems to be allowed only when the filesystem is hugetlbfs or tmpfs. > > According to the kernel documentation [1], DAX is supported in ext2, ext4, and > xfs. > Also we need to mount it with "-o dax". > > I want to use ZGC on DAX, so I want to introduce a new option > -XX:ZAllowHeapOnFileSystem to allow any filesystem to be used as backing > storage. > What do you think of this change? + experimental(bool, ZAllowHeapOnFileSystem, false, \ + "Allow to use filesystem as Java heap backing storage " \ + "specified by -XX:AllocateHeapAt") \ + \ Instead of adding a new option it would be preferable to automatically detect that it's a dax mounted filesystem. But I haven't had a chance to look into the best way of doing that. const size_t expected_block_size = is_tmpfs() ? os::vm_page_size() : os::large_page_size(); - if (expected_block_size != _block_size) { + if (!ZAllowHeapOnFileSystem && (expected_block_size != _block_size)) { log_error(gc)("%s filesystem has unexpected block size " SIZE_FORMAT " (expected " SIZE_FORMAT ")", is_tmpfs() ? ZFILESYSTEM_TMPFS : ZFILESYSTEM_HUGETLBFS, _block_size, expected_block_size); return; } This part looks potentially dangerous, since we might then be working with an incorrect _block_size. 
int ZPhysicalMemoryBacking::create_file_fd(const char* name) const { + if (ZAllowHeapOnFileSystem && (AllocateHeapAt == NULL)) { + log_error(gc)("-XX:AllocateHeapAt is needed when ZAllowHeapOnFileSystem is specified"); + return -1; + } + const char* const filesystem = ZLargePages::is_explicit() ? ZFILESYSTEM_HUGETLBFS : ZFILESYSTEM_TMPFS; This part looks unnecessary, no? cheers, Per > > http://cr.openjdk.java.net/~ysuenaga/dax-z/ > > If it can be accepted, I will file it to JBS and will propose CSR. > > > Thanks, > > Yasumasa > > > [1] https://www.kernel.org/doc/Documentation/filesystems/dax.txt From richard.reingruber at sap.com Fri Feb 14 12:58:41 2020 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Fri, 14 Feb 2020 12:58:41 +0000 Subject: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant In-Reply-To: <3c59b9f9-ec38-18c9-8f24-e1186a08a04a@oracle.com> References: <3c59b9f9-ec38-18c9-8f24-e1186a08a04a@oracle.com> Message-ID: Hi Patricio, thanks for having a look. > I'm only commenting on the handshake changes. > I see that operation VM_EnterInterpOnlyMode can be called inside > operation VM_SetFramePop which also allows nested operations. Here is a > comment in VM_SetFramePop definition: > > // Nested operation must be allowed for the VM_EnterInterpOnlyMode that is > // called from the JvmtiEventControllerPrivate::recompute_thread_enabled. > > So if we change VM_EnterInterpOnlyMode to be a handshake, then now we > could have a handshake inside a safepoint operation. The issue I see > there is that at the end of the handshake the polling page of the target > thread could be disarmed. So if the target thread happens to be in a > blocked state just transiently and wakes up then it will not stop for > the ongoing safepoint. Maybe I can file an RFE to assert that the > polling page is armed at the beginning of disarm_safepoint(). 
I'm really glad you noticed the problematic nesting. This seems to be a general issue: currently a handshake cannot be nested in a vm operation. Maybe it should be asserted in the Handshake::execute() methods that they are not called by the vm thread evaluating a vm operation? > Alternatively I think you could do something similar to what we do in > Deoptimization::deoptimize_all_marked(): > > EnterInterpOnlyModeClosure hs; > if (SafepointSynchronize::is_at_safepoint()) { > hs.do_thread(state->get_thread()); > } else { > Handshake::execute(&hs, state->get_thread()); > } > (you could pass "EnterInterpOnlyModeClosure" directly to the > HandshakeClosure() constructor) Maybe this could be used also in the Handshake::execute() methods as a general solution? > I don't know JVMTI code so I'm not sure if VM_EnterInterpOnlyMode is > always called in a nested operation or just sometimes. At least one execution path without vm operation exists: JvmtiEventControllerPrivate::enter_interp_only_mode(JvmtiThreadState *) : void JvmtiEventControllerPrivate::recompute_thread_enabled(JvmtiThreadState *) : jlong JvmtiEventControllerPrivate::recompute_enabled() : void JvmtiEventControllerPrivate::change_field_watch(jvmtiEvent, bool) : void (2 matches) JvmtiEventController::change_field_watch(jvmtiEvent, bool) : void JvmtiEnv::SetFieldAccessWatch(fieldDescriptor *) : jvmtiError jvmti_SetFieldAccessWatch(jvmtiEnv *, jclass, jfieldID) : jvmtiError I tend to revert back to VM_EnterInterpOnlyMode as it wasn't my main intent to replace it with a handshake, but to avoid making the compiled methods on stack not_entrant.... unless I'm further encouraged to do it with a handshake :) Thanks again, Richard. -----Original Message----- From: Patricio Chilano Sent: Donnerstag, 13. 
Februar 2020 18:47 To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net Subject: Re: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant Hi Richard, I'm only commenting on the handshake changes. I see that operation VM_EnterInterpOnlyMode can be called inside operation VM_SetFramePop which also allows nested operations. Here is a comment in VM_SetFramePop definition: // Nested operation must be allowed for the VM_EnterInterpOnlyMode that is // called from the JvmtiEventControllerPrivate::recompute_thread_enabled. So if we change VM_EnterInterpOnlyMode to be a handshake, then now we could have a handshake inside a safepoint operation. The issue I see there is that at the end of the handshake the polling page of the target thread could be disarmed. So if the target thread happens to be in a blocked state just transiently and wakes up then it will not stop for the ongoing safepoint. Maybe I can file an RFE to assert that the polling page is armed at the beginning of disarm_safepoint(). I think one option could be to remove SafepointMechanism::disarm_if_needed() in HandshakeState::clear_handshake() and let each JavaThread disarm itself for the handshake case. Alternatively I think you could do something similar to what we do in Deoptimization::deoptimize_all_marked(): EnterInterpOnlyModeClosure hs; if (SafepointSynchronize::is_at_safepoint()) { hs.do_thread(state->get_thread()); } else { Handshake::execute(&hs, state->get_thread()); } (you could pass "EnterInterpOnlyModeClosure" directly to the HandshakeClosure() constructor) I don't know JVMTI code so I'm not sure if VM_EnterInterpOnlyMode is always called in a nested operation or just sometimes. 
Thanks, Patricio On 2/12/20 7:23 AM, Reingruber, Richard wrote: > // Repost including hotspot runtime and gc lists. > // Dean Long suggested to do so, because the enhancement replaces a vm operation > // with a handshake. > // Original thread: http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-February/030359.html > > Hi, > > could I please get reviews for this small enhancement in hotspot's jvmti implementation: > > Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8238585 > > The change avoids making all compiled methods on stack not_entrant when switching a java thread to > interpreter only execution for jvmti purposes. It is sufficient to deoptimize the compiled frames on stack. > > Additionally a handshake is used instead of a vm operation to walk the stack and do the deoptimizations. > > Testing: JCK and JTREG tests, also in Xcomp mode with fastdebug and release builds on all platforms. > > Thanks, Richard. > > See also my question if anyone knows a reason for making the compiled methods not_entrant: > http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030339.html From suenaga at oss.nttdata.com Fri Feb 14 13:31:29 2020 From: suenaga at oss.nttdata.com (Yasumasa Suenaga) Date: Fri, 14 Feb 2020 22:31:29 +0900 Subject: Use DAX in ZGC In-Reply-To: <07354697-3758-02b9-0cc2-5fe887449e2a@oracle.com> References: <64207ef5-fabb-748a-15c9-e96e4bc612d8@oss.nttdata.com> <07354697-3758-02b9-0cc2-5fe887449e2a@oracle.com> Message-ID: Hi Per, On 2020/02/14 20:52, Per Liden wrote: > Hi Yasumasa, > > On 2/14/20 10:07 AM, Yasumasa Suenaga wrote: >> Hi all, >> >> I tried to allocate heap to DAX on Linux with -XX:AllocateHeapAt, but it couldn't. >> It seems to allow when filesystem is hugetlbfs or tmpfs. >> >> According to kernel document [1], DAX is supported in ext2, ext4, and xfs. >> Also we need to mount it with "-o dax". 
>> >> I want to use ZGC on DAX, so I want to introduce a new option -XX:ZAllowHeapOnFileSystem to allow any filesystem to be used as backing storage. >> What do you think of this change? > > +  experimental(bool, ZAllowHeapOnFileSystem, false, \ > +          "Allow to use filesystem as Java heap backing storage " \ > +          "specified by -XX:AllocateHeapAt") \ > + \ > > Instead of adding a new option it would be preferable to automatically detect that it's a dax mounted filesystem. But I haven't had a chance to look into the best way of doing that. I thought so, but I guess it is difficult. PMDK also does not check it automatically. https://github.com/pmem/pmdk/blob/master/src/libpmem2/pmem2_utils_linux.c#L18 In addition, we don't seem to be able to get the mount option ("-o dax") via syscall. I strace'ed `mount -o dax ...`, and I saw "-o dax" was passed to the 5th argument (const void *data). It would be handled in each filesystem, so I could not get it. Another solution, we can use /proc/mounts, but it might be complex. >   const size_t expected_block_size = is_tmpfs() ? os::vm_page_size() : os::large_page_size(); > -  if (expected_block_size != _block_size) { > +  if (!ZAllowHeapOnFileSystem && (expected_block_size != _block_size)) { >     log_error(gc)("%s filesystem has unexpected block size " SIZE_FORMAT " (expected " SIZE_FORMAT ")", >                   is_tmpfs() ? ZFILESYSTEM_TMPFS : ZFILESYSTEM_HUGETLBFS, _block_size, expected_block_size); >     return; >   } > > This part looks potentially dangerous, since we might then be working with an incorrect _block_size. I guess the block size in almost all filesystems is 4KB, even with DAX. (XFS allows variable block sizes...) https://nvdimm.wiki.kernel.org/2mib_fs_dax So I think we can limit _block_size to the OS page size (4KB). >  int ZPhysicalMemoryBacking::create_file_fd(const char* name) const { > +  if (ZAllowHeapOnFileSystem && (AllocateHeapAt == NULL)) { > +    
log_error(gc)("-XX:AllocateHeapAt is needed when ZAllowHeapOnFileSystem is specified"); > +    return -1; > +  } > + >   const char* const filesystem = ZLargePages::is_explicit() >                                    ? ZFILESYSTEM_HUGETLBFS >                                    : ZFILESYSTEM_TMPFS; > > This part looks unnecessary, no? I added ZAllowHeapOnFileSystem to use with AllocateHeapAt. So I want to warn if AllocateHeapAt == NULL. Thanks, Yasumasa > cheers, > Per > >> >> http://cr.openjdk.java.net/~ysuenaga/dax-z/ >> >> If it can be accepted, I will file it to JBS and will propose CSR. >> >> >> Thanks, >> >> Yasumasa >> >> >> [1] https://www.kernel.org/doc/Documentation/filesystems/dax.txt From per.liden at oracle.com Fri Feb 14 14:08:55 2020 From: per.liden at oracle.com (Per Liden) Date: Fri, 14 Feb 2020 15:08:55 +0100 Subject: Use DAX in ZGC In-Reply-To: References: <64207ef5-fabb-748a-15c9-e96e4bc612d8@oss.nttdata.com> <07354697-3758-02b9-0cc2-5fe887449e2a@oracle.com> Message-ID: <2a781b6a-0277-3bd1-3d0a-f3b2ac8a93c6@oracle.com> Hi, On 2/14/20 2:31 PM, Yasumasa Suenaga wrote: > Hi Per, > > On 2020/02/14 20:52, Per Liden wrote: >> Hi Yasumasa, >> >> On 2/14/20 10:07 AM, Yasumasa Suenaga wrote: >>> Hi all, >>> >>> I tried to allocate the heap on DAX on Linux with -XX:AllocateHeapAt, but >>> I couldn't. >>> It seems to be allowed only when the filesystem is hugetlbfs or tmpfs. >>> >>> According to the kernel documentation [1], DAX is supported in ext2, ext4, and >>> xfs. >>> Also we need to mount it with "-o dax". >>> >>> I want to use ZGC on DAX, so I want to introduce a new option >>> -XX:ZAllowHeapOnFileSystem to allow any filesystem to be used as backing >>> storage. >>> What do you think of this change? >> >> >> +  experimental(bool, ZAllowHeapOnFileSystem, false, \ >> +          "Allow to use filesystem as Java heap backing storage " \ >> +          "specified by -XX:AllocateHeapAt") \ >> + 
\ >> >> Instead of adding a new option it would be preferable to automatically >> detect that it's a dax mounted filesystem. But I haven't had a chance >> to look into the best way of doing that. > > I thought so, but I guess it is difficult. > PMDK also does not check it automatically. > > https://github.com/pmem/pmdk/blob/master/src/libpmem2/pmem2_utils_linux.c#L18 > > In addition, we don't seem to be able to get the mount option ("-o dax") via > syscall. > I strace'ed `mount -o dax ...`, and I saw "-o dax" was passed to the 5th > argument (const void *data). It would be handled in each filesystem, so > I could not get it. > > Another solution, we can use /proc/mounts, but it might be complex. I was maybe hoping you could get this information through some ioctl() command on the file descriptor? > > >>   const size_t expected_block_size = is_tmpfs() ? os::vm_page_size() >> : os::large_page_size(); >> -  if (expected_block_size != _block_size) { >> +  if (!ZAllowHeapOnFileSystem && (expected_block_size != _block_size)) { >>     log_error(gc)("%s filesystem has unexpected block size " >> SIZE_FORMAT " (expected " SIZE_FORMAT ")", >>                   is_tmpfs() ? ZFILESYSTEM_TMPFS : >> ZFILESYSTEM_HUGETLBFS, _block_size, expected_block_size); >>     return; >>   } >> >> This part looks potentially dangerous, since we might then be working >> with an incorrect _block_size. > > I guess the block size in almost all filesystems is 4KB, even with DAX. > (XFS allows variable block sizes...) With your current patch, a user could use -XX:AllocateHeapAt to point to any kind of file system, which (at least in theory) could have any block size. For things to work down the road we must ensure that ZGranuleSize is a multiple of _block_size. 
> > https://nvdimm.wiki.kernel.org/2mib_fs_dax > > So I think we can limit _block_size to the OS page size (4KB). > > >>  int ZPhysicalMemoryBacking::create_file_fd(const char* name) const { >> +  if (ZAllowHeapOnFileSystem && (AllocateHeapAt == NULL)) { >> +    log_error(gc)("-XX:AllocateHeapAt is needed when >> ZAllowHeapOnFileSystem is specified"); >> +    return -1; >> +  } >> + >>   const char* const filesystem = ZLargePages::is_explicit() >>                                    ? ZFILESYSTEM_HUGETLBFS >>                                    : ZFILESYSTEM_TMPFS; >> >> This part looks unnecessary, no? > > I added ZAllowHeapOnFileSystem to use with AllocateHeapAt. > So I want to warn if AllocateHeapAt == NULL. Yes, but that seems unnecessary, and I suggest it's removed. cheers, /Per > > > Thanks, > > Yasumasa > > >> cheers, >> Per >> >>> >>> http://cr.openjdk.java.net/~ysuenaga/dax-z/ >>> >>> If it can be accepted, I will file it to JBS and will propose CSR. 
>>> >>> Thanks, >>> Yasumasa >>> [1] https://www.kernel.org/doc/Documentation/filesystems/dax.txt 
From suenaga at oss.nttdata.com Fri Feb 14 14:23:04 2020 From: suenaga at oss.nttdata.com (Yasumasa Suenaga) Date: Fri, 14 Feb 2020 23:23:04 +0900 Subject: Use DAX in ZGC In-Reply-To: <2a781b6a-0277-3bd1-3d0a-f3b2ac8a93c6@oracle.com> References: <64207ef5-fabb-748a-15c9-e96e4bc612d8@oss.nttdata.com> <07354697-3758-02b9-0cc2-5fe887449e2a@oracle.com> <2a781b6a-0277-3bd1-3d0a-f3b2ac8a93c6@oracle.com> Message-ID: <0ae8d397-99c4-a2b6-93bb-5ab59861e25f@oss.nttdata.com> On 2020/02/14 23:08, Per Liden wrote: > Hi, ... >>>> What do you think of this change? >>> >>> >>> +  experimental(bool, ZAllowHeapOnFileSystem, false, \ >>> +          "Allow to use filesystem as Java heap backing storage " \ >>> +          "specified by -XX:AllocateHeapAt") \ >>> + \ >>> Instead of adding a new option it would be preferable to automatically detect that it's a dax mounted filesystem. But I haven't had a chance to look into the best way of doing that. >> >> I thought so, but I guess it is difficult. >> PMDK also does not check it automatically. 
>> >> https://github.com/pmem/pmdk/blob/master/src/libpmem2/pmem2_utils_linux.c#L18 >> In addition, we don't seem to be able to get the mount option ("-o dax") via syscall. >> I strace'ed `mount -o dax ...`, and I saw "-o dax" was passed to the 5th argument (const void *data). It would be handled in each filesystem, so I could not get it. >> >> Another solution, we can use /proc/mounts, but it might be complex. > > I was maybe hoping you could get this information through some ioctl() command on the file descriptor? I tried the FS_IOC_FSGETXATTR ioctl (FS_XFLAG_DAX might be set in fsx_xflags), but I couldn't get it. (I use ext4 with "-o dax") >>>   const size_t expected_block_size = is_tmpfs() ? os::vm_page_size() : os::large_page_size(); >>> -  if (expected_block_size != _block_size) { >>> +  if (!ZAllowHeapOnFileSystem && (expected_block_size != _block_size)) { >>>     log_error(gc)("%s filesystem has unexpected block size " SIZE_FORMAT " (expected " SIZE_FORMAT ")", >>>                   is_tmpfs() ? ZFILESYSTEM_TMPFS : ZFILESYSTEM_HUGETLBFS, _block_size, expected_block_size); >>>     return; >>>   } >>> >>> This part looks potentially dangerous, since we might then be working with an incorrect _block_size. >> >> I guess the block size in almost all filesystems is 4KB, even with DAX. >> (XFS allows variable block sizes...) > > With your current patch, a user could use -XX:AllocateHeapAt to point to any kind of file system, which (at least in theory) could have any block size. For things to work down the road we must ensure that ZGranuleSize is a multiple of _block_size. Ok. 
>> >> https://nvdimm.wiki.kernel.org/2mib_fs_dax >> So I think we can limit _block_size to the OS page size (4KB). 
>> >> >>>  int ZPhysicalMemoryBacking::create_file_fd(const char* name) const { >>> +  if (ZAllowHeapOnFileSystem && (AllocateHeapAt == NULL)) { >>> +    log_error(gc)("-XX:AllocateHeapAt is needed when ZAllowHeapOnFileSystem is specified"); >>> +    return -1; >>> +  } >>> + >>>   const char* const filesystem = ZLargePages::is_explicit() >>>                                    ? ZFILESYSTEM_HUGETLBFS >>>                                    : ZFILESYSTEM_TMPFS; >>> >>> This part looks unnecessary, no? >> I added ZAllowHeapOnFileSystem to use with AllocateHeapAt. >> So I want to warn if AllocateHeapAt == NULL. > Yes, but that seems unnecessary, and I suggest it's removed. Ok. BTW, is it worth filing this in JBS? Cheers, Yasumasa > cheers, > /Per ... From patricio.chilano.mateo at oracle.com Fri Feb 14 14:53:52 2020 From: patricio.chilano.mateo at oracle.com (Patricio Chilano) Date: Fri, 14 Feb 2020 11:53:52 -0300 Subject: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant In-Reply-To: References: <3c59b9f9-ec38-18c9-8f24-e1186a08a04a@oracle.com> Message-ID: <410eed04-e2ef-0f4f-1c56-19e6734a10f6@oracle.com> Hi Richard, On 2/14/20 9:58 AM, Reingruber, Richard wrote: > Hi Patricio, > thanks for having a look. > > I'm only commenting on the handshake changes. > > I see that operation VM_EnterInterpOnlyMode can be called inside > > operation VM_SetFramePop which also allows nested operations. 
Here is a > > comment in VM_SetFramePop definition: > > > > // Nested operation must be allowed for the VM_EnterInterpOnlyMode that is > > // called from the JvmtiEventControllerPrivate::recompute_thread_enabled. > > > > So if we change VM_EnterInterpOnlyMode to be a handshake, then now we > > could have a handshake inside a safepoint operation. The issue I see > > there is that at the end of the handshake the polling page of the target > > thread could be disarmed. So if the target thread happens to be in a > > blocked state just transiently and wakes up then it will not stop for > > the ongoing safepoint. Maybe I can file an RFE to assert that the > > polling page is armed at the beginning of disarm_safepoint(). > > I'm really glad you noticed the problematic nesting. This seems to be a general issue: currently a > handshake cannot be nested in a vm operation. Maybe it should be asserted in the > Handshake::execute() methods that they are not called by the vm thread evaluating a vm operation? > > > Alternatively I think you could do something similar to what we do in > > Deoptimization::deoptimize_all_marked(): > > > > EnterInterpOnlyModeClosure hs; > > if (SafepointSynchronize::is_at_safepoint()) { > > hs.do_thread(state->get_thread()); > > } else { > > Handshake::execute(&hs, state->get_thread()); > > } > > (you could pass 'EnterInterpOnlyModeClosure' directly to the > > HandshakeClosure() constructor) > > Maybe this could be used also in the Handshake::execute() methods as a general solution? Right, we could also do that. Avoiding clearing the polling page in HandshakeState::clear_handshake() should be enough to fix this issue and execute a handshake inside a safepoint, but adding that "if" statement in Handshake::execute() sounds good to avoid all the extra code that we go through when executing a handshake. I filed 8239084 to make that change.
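As a generic illustration of the dispatch pattern being discussed (run the operation directly when already inside a stop-the-world pause, otherwise go through the handshake mechanism), here is a sketch in plain C++. All types and globals below are stand-ins for HotSpot's real SafepointSynchronize/Handshake machinery, not the actual implementation:

```cpp
#include <functional>

struct Thread { int id; };

// Stand-in for SafepointSynchronize::is_at_safepoint().
bool g_at_safepoint = false;

// Counters so the two paths can be observed.
int g_direct_runs = 0;
int g_handshake_runs = 0;

// Stand-in for Handshake::execute(): a real handshake would arm the
// target thread's polling page and coordinate with it to run the closure.
void handshake_execute(const std::function<void(Thread*)>& op, Thread* target) {
  ++g_handshake_runs;
  op(target);
}

// The pattern from Deoptimization::deoptimize_all_marked(): when the VM
// is already at a safepoint, run the closure in place instead of
// starting a nested handshake.
void execute_or_handshake(const std::function<void(Thread*)>& op, Thread* target) {
  if (g_at_safepoint) {
    ++g_direct_runs;
    op(target);
  } else {
    handshake_execute(op, target);
  }
}
```

Folding this if-else into Handshake::execute() itself, as proposed for 8239084, means every caller gets the safepoint-safe behavior without repeating the check.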
> > I don't know JVMTI code so I'm not sure if VM_EnterInterpOnlyMode is > > always called in a nested operation or just sometimes. > > At least one execution path without vm operation exists: > > JvmtiEventControllerPrivate::enter_interp_only_mode(JvmtiThreadState *) : void > JvmtiEventControllerPrivate::recompute_thread_enabled(JvmtiThreadState *) : jlong > JvmtiEventControllerPrivate::recompute_enabled() : void > JvmtiEventControllerPrivate::change_field_watch(jvmtiEvent, bool) : void (2 matches) > JvmtiEventController::change_field_watch(jvmtiEvent, bool) : void > JvmtiEnv::SetFieldAccessWatch(fieldDescriptor *) : jvmtiError > jvmti_SetFieldAccessWatch(jvmtiEnv *, jclass, jfieldID) : jvmtiError > > I tend to revert back to VM_EnterInterpOnlyMode as it wasn't my main intent to replace it with a > handshake, but to avoid making the compiled methods on stack not_entrant.... unless I'm further > encouraged to do it with a handshake :) Ah! I think you can still do it with a handshake with the Deoptimization::deoptimize_all_marked() like solution. I can change the if-else statement with just the Handshake::execute() call in 8239084. But up to you. :) Thanks, Patricio > Thanks again, > Richard. > > -----Original Message----- > From: Patricio Chilano > Sent: Donnerstag, 13. Februar 2020 18:47 > To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net > Subject: Re: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant > > Hi Richard, > > I'm only commenting on the handshake changes. > I see that operation VM_EnterInterpOnlyMode can be called inside > operation VM_SetFramePop which also allows nested operations.
Here is a > comment in VM_SetFramePop definition: > > // Nested operation must be allowed for the VM_EnterInterpOnlyMode that is > // called from the JvmtiEventControllerPrivate::recompute_thread_enabled. > > So if we change VM_EnterInterpOnlyMode to be a handshake, then now we > could have a handshake inside a safepoint operation. The issue I see > there is that at the end of the handshake the polling page of the target > thread could be disarmed. So if the target thread happens to be in a > blocked state just transiently and wakes up then it will not stop for > the ongoing safepoint. Maybe I can file an RFE to assert that the > polling page is armed at the beginning of disarm_safepoint(). > > I think one option could be to remove > SafepointMechanism::disarm_if_needed() in > HandshakeState::clear_handshake() and let each JavaThread disarm itself > for the handshake case. > > Alternatively I think you could do something similar to what we do in > Deoptimization::deoptimize_all_marked(): > >   EnterInterpOnlyModeClosure hs; >   if (SafepointSynchronize::is_at_safepoint()) { >     hs.do_thread(state->get_thread()); >   } else { >     Handshake::execute(&hs, state->get_thread()); >   } > (you could pass 'EnterInterpOnlyModeClosure' directly to the > HandshakeClosure() constructor) > > I don't know JVMTI code so I'm not sure if VM_EnterInterpOnlyMode is > always called in a nested operation or just sometimes. > > Thanks, > Patricio > > On 2/12/20 7:23 AM, Reingruber, Richard wrote: >> // Repost including hotspot runtime and gc lists. >> // Dean Long suggested to do so, because the enhancement replaces a vm operation >> // with a handshake.
>> // Original thread: http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-February/030359.html >> >> Hi, >> >> could I please get reviews for this small enhancement in hotspot's jvmti implementation: >> >> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ >> Bug: https://bugs.openjdk.java.net/browse/JDK-8238585 >> >> The change avoids making all compiled methods on stack not_entrant when switching a java thread to >> interpreter only execution for jvmti purposes. It is sufficient to deoptimize the compiled frames on stack. >> >> Additionally a handshake is used instead of a vm operation to walk the stack and do the deoptimizations. >> >> Testing: JCK and JTREG tests, also in Xcomp mode with fastdebug and release builds on all platforms. >> >> Thanks, Richard. >> >> See also my question if anyone knows a reason for making the compiled methods not_entrant: >> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030339.html From thomas.schatzl at oracle.com Fri Feb 14 15:05:22 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 14 Feb 2020 16:05:22 +0100 Subject: RFR (S): 8238999: Remove MemRegion custom new/delete operator overloads Message-ID: <9a383ba5-68f1-7ed2-5ea9-97b236d2d9a1@oracle.com> Hi all, can I have reviews for this small change to the MemRegion class to remove unnecessary new/delete overloads from MemRegion. They return NULL if there is not enough memory. This is uncommon to do in Hotspot code. All uses in the code either checks whether the allocation is non-NULL and then terminates the VM, or will just crash too. It is easier to just replace the new[] calls with NEW_C_HEAP_ARRAY allocations and do the initialization manually. cc'ing runtime because Coleen added the new operator for working around a Metaspace issue in JDK-8021954 years ago. 
CR: https://bugs.openjdk.java.net/browse/JDK-8238999 Webrev: http://cr.openjdk.java.net/~tschatzl/8238999/webrev/ Testing: hs-tier1-4 Thanks, Thomas From thomas.schatzl at oracle.com Fri Feb 14 15:09:20 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 14 Feb 2020 16:09:20 +0100 Subject: RFR (S): 8239070: Memory leak when unsuccessfully mapping in archive regions Message-ID: <8f935fd7-3cb3-d5ae-5352-804e51e410b5@oracle.com> Hi all, can I have reviews for this change that plugs a (tiny) memory leak when we unsuccessfully map CDS archives into the Java heap? The FileMapInfo::map_heap_data() method allocates some array of MemRegions, and in case we fail to map the archive, we return from that method without assigning it to something or deallocating that memory. Found while working on JDK-8238999, also out for review, and depending on it. CR: https://bugs.openjdk.java.net/browse/JDK-8239070 Webrev: http://cr.openjdk.java.net/~tschatzl/8239070/webrev/ Testing: hs-tier1-4 Thanks, Thomas From ioi.lam at oracle.com Fri Feb 14 16:08:45 2020 From: ioi.lam at oracle.com (Ioi Lam) Date: Fri, 14 Feb 2020 08:08:45 -0800 Subject: RFR (S): 8239070: Memory leak when unsuccessfully mapping in archive regions In-Reply-To: <8f935fd7-3cb3-d5ae-5352-804e51e410b5@oracle.com> References: <8f935fd7-3cb3-d5ae-5352-804e51e410b5@oracle.com> Message-ID: <605ad3e6-2d09-70ac-1aad-705b2b26ee6a@oracle.com> Hi Thomas, Thanks for fixing this issue. Freeing the array at each exit point seems error prone. How about: refactoring the function to a FileMapInfo::map_heap_data_impl function, allocate inside FileMapInfo::map_heap_data(), call map_heap_data_impl() and if it returns false, free the array in a single place. Thanks - Ioi On 2/14/20 7:09 AM, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this change that plugs a (tiny) memory leak > when we unsuccessfully map CDS archives into the Java heap?
> > The FileMapInfo::map_heap_data() method allocates some array of > MemRegions, and in case we fail to map the archive, we return from that > method without assigning it to something or deallocating that memory. > > Found while working on JDK-8238999, also out for review, and depending > on it. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8239070 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8239070/webrev/ > Testing: > hs-tier1-4 > > Thanks, > Thomas From ioi.lam at oracle.com Fri Feb 14 16:12:37 2020 From: ioi.lam at oracle.com (Ioi Lam) Date: Fri, 14 Feb 2020 08:12:37 -0800 Subject: RFR (S): 8238999: Remove MemRegion custom new/delete operator overloads In-Reply-To: <9a383ba5-68f1-7ed2-5ea9-97b236d2d9a1@oracle.com> References: <9a383ba5-68f1-7ed2-5ea9-97b236d2d9a1@oracle.com> Message-ID: <702ed73a-216c-a9b2-f19c-5f75f3d408c1@oracle.com> Hi Thomas, Maybe we can fold this into a MemRegion::create(int size) function? 1750   MemRegion* regions = NEW_C_HEAP_ARRAY(MemRegion, max, mtInternal); 1751   for (int i = 0; i < max; i++) { 1752     ::new (&regions[i]) MemRegion(); 1753   } Thanks - Ioi On 2/14/20 7:05 AM, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this small change to the MemRegion class to > remove unnecessary new/delete overloads from MemRegion. > > They return NULL if there is not enough memory. This is uncommon to do > in Hotspot code. > > All uses in the code either checks whether the allocation is non-NULL > and then terminates the VM, or will just crash too. > > It is easier to just replace the new[] calls with NEW_C_HEAP_ARRAY > allocations and do the initialization manually. > > cc'ing runtime because Coleen added the new operator for working > around a Metaspace issue in JDK-8021954 years ago. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8238999 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8238999/webrev/ > Testing: > hs-tier1-4 > > Thanks, >
Thomas From thomas.schatzl at oracle.com Fri Feb 14 17:06:10 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 14 Feb 2020 18:06:10 +0100 Subject: RFR (S): 8238999: Remove MemRegion custom new/delete operator overloads In-Reply-To: <702ed73a-216c-a9b2-f19c-5f75f3d408c1@oracle.com> References: <9a383ba5-68f1-7ed2-5ea9-97b236d2d9a1@oracle.com> <702ed73a-216c-a9b2-f19c-5f75f3d408c1@oracle.com> Message-ID: <4f585bb8-2d17-8eb1-2db0-6fff177389e6@oracle.com> Hi, On 14.02.20 17:12, Ioi Lam wrote: > Hi Thomas, > > Maybe we can fold this into a MemRegion::create(int size) function? > > 1750   MemRegion* regions = NEW_C_HEAP_ARRAY(MemRegion, max, mtInternal); > 1751   for (int i = 0; i < max; i++) { > 1752     ::new (&regions[i]) MemRegion(); > 1753   } > http://cr.openjdk.java.net/~tschatzl/8238999/webrev.0_to_1 http://cr.openjdk.java.net/~tschatzl/8238999/webrev.1 Thanks, Thomas :) From per.liden at oracle.com Fri Feb 14 17:08:59 2020 From: per.liden at oracle.com (Per Liden) Date: Fri, 14 Feb 2020 18:08:59 +0100 Subject: Use DAX in ZGC In-Reply-To: <0ae8d397-99c4-a2b6-93bb-5ab59861e25f@oss.nttdata.com> References: <64207ef5-fabb-748a-15c9-e96e4bc612d8@oss.nttdata.com> <07354697-3758-02b9-0cc2-5fe887449e2a@oracle.com> <2a781b6a-0277-3bd1-3d0a-f3b2ac8a93c6@oracle.com> <0ae8d397-99c4-a2b6-93bb-5ab59861e25f@oss.nttdata.com> Message-ID: Hi, On 2/14/20 3:23 PM, Yasumasa Suenaga wrote: > On 2020/02/14 23:08, Per Liden wrote: >> Hi, >> >> On 2/14/20 2:31 PM, Yasumasa Suenaga wrote: >>> Hi Per, >>> >>> On 2020/02/14 20:52, Per Liden wrote: >>>> Hi Yasumasa, >>>> >>>> On 2/14/20 10:07 AM, Yasumasa Suenaga wrote: >>>>> Hi all, >>>>> >>>>> I tried to allocate heap to DAX on Linux with -XX:AllocateHeapAt, >>>>> but it couldn't. >>>>> It seems to allow when filesystem is hugetlbfs or tmpfs. >>>>> >>>>> According to kernel document [1], DAX is supported in ext2, ext4, >>>>> and xfs. >>>>> Also we need to mount it with "-o dax".
>>>>> >>>>> I want to use ZGC on DAX, so I want to introduce new option >>>>> -XX:ZAllowHeapOnFileSystem to allow to use all filesystem as >>>>> backing storage. >>>>> What do you think about this change? >>>> >>>> +  experimental(bool, ZAllowHeapOnFileSystem, false,   \ >>>> +          "Allow to use filesystem as Java heap backing storage "   \ >>>> +          "specified by -XX:AllocateHeapAt")   \ >>>> +   \ >>>> Instead of adding a new option it would be preferable to automatically detect that it's a dax mounted filesystem. But I haven't had a chance to look into the best way of doing that. >>> I thought so, but I guess it is difficult. >>> PMDK also does not check it automatically. >>> https://urldefense.com/v3/__https://github.com/pmem/pmdk/blob/master/src/libpmem2/pmem2_utils_linux.c*L18__;Iw!!GqivPVa7Brio!PlQs19bQVBJF7PDA9RLZ9JLbXOQ2KYocNW6DJH-eOUqXZcYwl-cSvSjpfC316y0$ >>> In addition, we don't seem to be able to get the mount option ("-o dax") via syscall. >>> I strace'ed `mount -o dax ...`, I saw "-o dax" was passed to the 5th argument (const void *data). It would be handled in each filesystem, so I could not get it. >>> >>> Another solution, we can use /proc/mounts, but it might be complex. >> >> I was maybe hoping you could get this information through some ioctl() command on the file descriptor? > I tried the FS_IOC_FSGETXATTR ioctl (FS_XFLAG_DAX might be set in > fsx_xflags), but I couldn't get it. > (I use ext4 with "-o dax") Ok. It would be good to get to the bottom of why it's not set. cheers, Per From jianglizhou at google.com Fri Feb 14 17:15:23 2020 From: jianglizhou at google.com (Jiangli Zhou) Date: Fri, 14 Feb 2020 09:15:23 -0800 Subject: RFR (S): 8239070: Memory leak when unsuccessfully mapping in archive regions In-Reply-To: <8f935fd7-3cb3-d5ae-5352-804e51e410b5@oracle.com> References: <8f935fd7-3cb3-d5ae-5352-804e51e410b5@oracle.com> Message-ID: Hi Thomas, Thanks for finding the memory leak.
The leak fix probably should be applied to JDK 11 as well (as a modified backport). I'll try to request it. Ioi's suggestion of refactoring region mapping code into a FileMapInfo::map_heap_data_impl sounds okay to me. Best regards, Jiangli On Fri, Feb 14, 2020 at 7:09 AM Thomas Schatzl wrote: > > Hi all, > > can I have reviews for this change that plugs a (tiny) memory leak > when we unsuccessfully map CDS archives into the Java heap? > > The FileMapInfo::map_heap_data() method allocates some array of > MemRegions, and in case we fail to map the archive, we return from that > method without assigning it to something or deallocating that memory. > > Found while working on JDK-8238999, also out for review, and depending > on it. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8239070 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8239070/webrev/ > Testing: > hs-tier1-4 > > Thanks, > Thomas From ioi.lam at oracle.com Fri Feb 14 18:46:31 2020 From: ioi.lam at oracle.com (Ioi Lam) Date: Fri, 14 Feb 2020 10:46:31 -0800 Subject: RFR (S): 8238999: Remove MemRegion custom new/delete operator overloads In-Reply-To: <4f585bb8-2d17-8eb1-2db0-6fff177389e6@oracle.com> References: <9a383ba5-68f1-7ed2-5ea9-97b236d2d9a1@oracle.com> <702ed73a-216c-a9b2-f19c-5f75f3d408c1@oracle.com> <4f585bb8-2d17-8eb1-2db0-6fff177389e6@oracle.com> Message-ID: <25db6783-0a9c-b544-34ee-59d40f3e7f6c@oracle.com> Looks good to me. Thanks - Ioi On 2/14/20 9:06 AM, Thomas Schatzl wrote: > Hi, > > On 14.02.20 17:12, Ioi Lam wrote: >> Hi Thomas, >> >> Maybe we can fold this into a MemRegion::create(int size) function? >> >> 1750   MemRegion* regions = NEW_C_HEAP_ARRAY(MemRegion, max, >> mtInternal); >> 1751   for (int i = 0; i < max; i++) { >> 1752     ::new (&regions[i]) MemRegion(); >> 1753   } >> > > http://cr.openjdk.java.net/~tschatzl/8238999/webrev.0_to_1 > http://cr.openjdk.java.net/~tschatzl/8238999/webrev.1 > > Thanks, >
Thomas :) From richard.reingruber at sap.com Fri Feb 14 18:47:20 2020 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Fri, 14 Feb 2020 18:47:20 +0000 Subject: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant In-Reply-To: <410eed04-e2ef-0f4f-1c56-19e6734a10f6@oracle.com> References: <3c59b9f9-ec38-18c9-8f24-e1186a08a04a@oracle.com> <410eed04-e2ef-0f4f-1c56-19e6734a10f6@oracle.com> Message-ID: Hi Patricio, > > I'm really glad you noticed the problematic nesting. This seems to be a general issue: currently a > > handshake cannot be nested in a vm operation. Maybe it should be asserted in the > > Handshake::execute() methods that they are not called by the vm thread evaluating a vm operation? > > > > > Alternatively I think you could do something similar to what we do in > > > Deoptimization::deoptimize_all_marked(): > > > > > > EnterInterpOnlyModeClosure hs; > > > if (SafepointSynchronize::is_at_safepoint()) { > > > hs.do_thread(state->get_thread()); > > > } else { > > > Handshake::execute(&hs, state->get_thread()); > > > } > > > (you could pass 'EnterInterpOnlyModeClosure' directly to the > > > HandshakeClosure() constructor) > > > > Maybe this could be used also in the Handshake::execute() methods as a general solution? > Right, we could also do that. Avoiding clearing the polling page in > HandshakeState::clear_handshake() should be enough to fix this issue and > execute a handshake inside a safepoint, but adding that "if" statement > in Handshake::execute() sounds good to avoid all the extra code that we > go through when executing a handshake. I filed 8239084 to make that change. Thanks for taking care of this and creating the RFE. > > > > I don't know JVMTI code so I'm not sure if VM_EnterInterpOnlyMode is > > > always called in a nested operation or just sometimes.
> > > > At least one execution path without vm operation exists: > > > > JvmtiEventControllerPrivate::enter_interp_only_mode(JvmtiThreadState *) : void > > JvmtiEventControllerPrivate::recompute_thread_enabled(JvmtiThreadState *) : jlong > > JvmtiEventControllerPrivate::recompute_enabled() : void > > JvmtiEventControllerPrivate::change_field_watch(jvmtiEvent, bool) : void (2 matches) > > JvmtiEventController::change_field_watch(jvmtiEvent, bool) : void > > JvmtiEnv::SetFieldAccessWatch(fieldDescriptor *) : jvmtiError > > jvmti_SetFieldAccessWatch(jvmtiEnv *, jclass, jfieldID) : jvmtiError > > > > I tend to revert back to VM_EnterInterpOnlyMode as it wasn't my main intent to replace it with a > > handshake, but to avoid making the compiled methods on stack not_entrant.... unless I'm further > > encouraged to do it with a handshake :) > Ah! I think you can still do it with a handshake with the > Deoptimization::deoptimize_all_marked() like solution. I can change the > if-else statement with just the Handshake::execute() call in 8239084. > But up to you. : ) Well, I think that's enough encouragement :) I'll wait for 8239084 and try then again. (no urgency and all) Thanks, Richard. -----Original Message----- From: Patricio Chilano Sent: Freitag, 14. Februar 2020 15:54 To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net Subject: Re: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant Hi Richard, On 2/14/20 9:58 AM, Reingruber, Richard wrote: > Hi Patricio, > > thanks for having a look. > > > I?m only commenting on the handshake changes. > > I see that operation VM_EnterInterpOnlyMode can be called inside > > operation VM_SetFramePop which also allows nested operations. 
Here is a > > comment in VM_SetFramePop definition: > > > > // Nested operation must be allowed for the VM_EnterInterpOnlyMode that is > > // called from the JvmtiEventControllerPrivate::recompute_thread_enabled. > > > > So if we change VM_EnterInterpOnlyMode to be a handshake, then now we > > could have a handshake inside a safepoint operation. The issue I see > > there is that at the end of the handshake the polling page of the target > > thread could be disarmed. So if the target thread happens to be in a > > blocked state just transiently and wakes up then it will not stop for > > the ongoing safepoint. Maybe I can file an RFE to assert that the > > polling page is armed at the beginning of disarm_safepoint(). > > I'm really glad you noticed the problematic nesting. This seems to be a general issue: currently a > handshake cannot be nested in a vm operation. Maybe it should be asserted in the > Handshake::execute() methods that they are not called by the vm thread evaluating a vm operation? > > > Alternatively I think you could do something similar to what we do in > > Deoptimization::deoptimize_all_marked(): > > > > EnterInterpOnlyModeClosure hs; > > if (SafepointSynchronize::is_at_safepoint()) { > > hs.do_thread(state->get_thread()); > > } else { > > Handshake::execute(&hs, state->get_thread()); > > } > > (you could pass 'EnterInterpOnlyModeClosure' directly to the > > HandshakeClosure() constructor) > > Maybe this could be used also in the Handshake::execute() methods as a general solution? Right, we could also do that. Avoiding clearing the polling page in HandshakeState::clear_handshake() should be enough to fix this issue and execute a handshake inside a safepoint, but adding that "if" statement in Handshake::execute() sounds good to avoid all the extra code that we go through when executing a handshake. I filed 8239084 to make that change.
> > I don't know JVMTI code so I'm not sure if VM_EnterInterpOnlyMode is > > always called in a nested operation or just sometimes. > > At least one execution path without vm operation exists: > > JvmtiEventControllerPrivate::enter_interp_only_mode(JvmtiThreadState *) : void > JvmtiEventControllerPrivate::recompute_thread_enabled(JvmtiThreadState *) : jlong > JvmtiEventControllerPrivate::recompute_enabled() : void > JvmtiEventControllerPrivate::change_field_watch(jvmtiEvent, bool) : void (2 matches) > JvmtiEventController::change_field_watch(jvmtiEvent, bool) : void > JvmtiEnv::SetFieldAccessWatch(fieldDescriptor *) : jvmtiError > jvmti_SetFieldAccessWatch(jvmtiEnv *, jclass, jfieldID) : jvmtiError > > I tend to revert back to VM_EnterInterpOnlyMode as it wasn't my main intent to replace it with a > handshake, but to avoid making the compiled methods on stack not_entrant.... unless I'm further > encouraged to do it with a handshake :) Ah! I think you can still do it with a handshake with the Deoptimization::deoptimize_all_marked() like solution. I can change the if-else statement with just the Handshake::execute() call in 8239084. But up to you. :) Thanks, Patricio > Thanks again, > Richard. > > -----Original Message----- > From: Patricio Chilano > Sent: Donnerstag, 13. Februar 2020 18:47 > To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net > Subject: Re: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant > > Hi Richard, > > I'm only commenting on the handshake changes. > I see that operation VM_EnterInterpOnlyMode can be called inside > operation VM_SetFramePop which also allows nested operations.
Here is a > comment in VM_SetFramePop definition: > > // Nested operation must be allowed for the VM_EnterInterpOnlyMode that is > // called from the JvmtiEventControllerPrivate::recompute_thread_enabled. > > So if we change VM_EnterInterpOnlyMode to be a handshake, then now we > could have a handshake inside a safepoint operation. The issue I see > there is that at the end of the handshake the polling page of the target > thread could be disarmed. So if the target thread happens to be in a > blocked state just transiently and wakes up then it will not stop for > the ongoing safepoint. Maybe I can file an RFE to assert that the > polling page is armed at the beginning of disarm_safepoint(). > > I think one option could be to remove > SafepointMechanism::disarm_if_needed() in > HandshakeState::clear_handshake() and let each JavaThread disarm itself > for the handshake case. > > Alternatively I think you could do something similar to what we do in > Deoptimization::deoptimize_all_marked(): > >   EnterInterpOnlyModeClosure hs; >   if (SafepointSynchronize::is_at_safepoint()) { >     hs.do_thread(state->get_thread()); >   } else { >     Handshake::execute(&hs, state->get_thread()); >   } > (you could pass 'EnterInterpOnlyModeClosure' directly to the > HandshakeClosure() constructor) > > I don't know JVMTI code so I'm not sure if VM_EnterInterpOnlyMode is > always called in a nested operation or just sometimes. > > Thanks, > Patricio > > On 2/12/20 7:23 AM, Reingruber, Richard wrote: >> // Repost including hotspot runtime and gc lists. >> // Dean Long suggested to do so, because the enhancement replaces a vm operation >> // with a handshake.
>> // Original thread: http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-February/030359.html >> >> Hi, >> >> could I please get reviews for this small enhancement in hotspot's jvmti implementation: >> >> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ >> Bug: https://bugs.openjdk.java.net/browse/JDK-8238585 >> >> The change avoids making all compiled methods on stack not_entrant when switching a java thread to >> interpreter only execution for jvmti purposes. It is sufficient to deoptimize the compiled frames on stack. >> >> Additionally a handshake is used instead of a vm operation to walk the stack and do the deoptimizations. >> >> Testing: JCK and JTREG tests, also in Xcomp mode with fastdebug and release builds on all platforms. >> >> Thanks, Richard. >> >> See also my question if anyone knows a reason for making the compiled methods not_entrant: >> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030339.html From rkennke at redhat.com Fri Feb 14 19:00:41 2020 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 14 Feb 2020 20:00:41 +0100 Subject: RFR: JDK-8239081: Shenandoah: Consolidate C1 LRB and native barriers Message-ID: This is a fall-out from the recent Lucene debugging session. Currently, when emitting IN_NATIVE LRB in C1, we generate a simple runtime call directly in LIR. It'd arguably be more straightforward and maintainable to simply re-use what we do for regular LRB, with the only exception to call into a different runtime endpoint from the stub. It might also be more efficient because it checks heap-stable before calling into runtime. If we ever have to backport C1 IN_NATIVE barriers (JDK-8226695) to 11u (although, we should not strictly need native barriers there right now), it also means we can skip backporting JDK-8226822, that would no longer be needed. 
Bug: https://bugs.openjdk.java.net/browse/JDK-8239081 Webrev: http://cr.openjdk.java.net/~rkennke/JDK-8239081/webrev.00/ Testing: hotspot_gc_shenandoah (x86_64, x86_32 and aarch64) Can I please get a review? Thanks, Roman From rkennke at redhat.com Fri Feb 14 19:29:34 2020 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 14 Feb 2020 20:29:34 +0100 Subject: RFR: JDK-8237780: Shenandoah: More reliable nmethod verification Message-ID: <751cd87f-38e3-cd2f-43fc-f2ef95b41a50@redhat.com> This is another fallout from the Lucene debugging sessions :-) Our nmethod verification has a number of problems: - the assert(oops->length() == oop_count(), "Must match") is too strict. Weirdly, while we are registering an nmethod in one thread (under CodeCache_lock), another thread can already patch the same nmethod (under Patching_lock), which can throw off the counts. - We need to skip Universe::non_oop_word() because that is what the standard oop iterator would do too. It's fixed by: 1. counting actual oops, skipping Universe::non_oop_word() instead of comparing with oop_count() 2. relaxing the assert from == to >= I've also added a sanity check: + assert(nm == data->nm(), "must be same nmethod"); I've also left in some debug-output but under #ifdef ASSERT_DISABLED. I found that very useful and wouldn't want to throw it away. All of this has proven to be useful (if only to exclude the possibility that we mess up something in handling nmethods). Bug: https://bugs.openjdk.java.net/browse/JDK-8237780 Webrev: http://cr.openjdk.java.net/~rkennke/JDK-8237780/webrev.00/ Testing: provided testcase passes now (failed before). hotspot_gc_shenandoah is fine Can I please get a review?
Thanks, Roman From thomas.schatzl at oracle.com Fri Feb 14 18:49:05 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 14 Feb 2020 19:49:05 +0100 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: <9fdac7ff-bbef-1451-c951-a40dd6f216af@oracle.com> References: <7107c9f6-ba8e-48a0-830c-5383e2c17ef3.maoliang.ml@alibaba-inc.com> <9fdac7ff-bbef-1451-c951-a40dd6f216af@oracle.com> Message-ID: <51a95aba-5d31-0b99-87a7-485987f49f54@oracle.com> Hi, On 12.02.20 12:16, Thomas Schatzl wrote: > Hi Liang, > > On 12.02.20 11:17, Liang Mao wrote: >> Hi Thomas, >> >> I made a new patch for the issues we listed in JDK-8238686 and >> JDK-8236073: >> http://cr.openjdk.java.net/~luchsh/8236073.webrev.3/ > > thanks. I only had time to quickly browse the change, and started > building and testing it internally. I will run it through our perf > benchmarks to look for regressions of out-of-box behavior. > > I will need a day or two until I can get back to looking at the change > in detail. There is currently something else I need to look at. Sorry. Initial results from testing: - gc/g1/TestPeriodicCollection.java fails consistently because the heap does not shrink as expected (but probably this is a test bug as it may expect that uncommit occurs at remark). - memory usage tends to be significantly higher with the change without improving scores. E.g. I have been running specjvm2008 out-of-box with no settings on different machine(s) (32gb ram min), and the build with the changes almost consistently uses more heap (i.e. committed size) than without, in the range of 10% without any performance increase. Specjvm2008 benchmarks are pretty simple applications in terms of behavior, i.e. they do the same things all the time. This also means that very likely the current sizing is already way beyond the point of diminishing returns (actually, this is a known issue :)); I would prefer if we did not add to that.
;) Unfortunately I lost the graphs I had generated (manually), and I do not have more time available right now so I can't show you right now. I started some dacapo 2009 runs (running them for 30 iterations each). Did not have time to look at the changes themselves any further or investigate the reasons for this memory usage increase beyond what I already did earlier; will continue on Tuesday as I'm taking the day off Monday. Thanks, Thomas From shade at redhat.com Fri Feb 14 21:18:11 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 14 Feb 2020 22:18:11 +0100 Subject: RFR: JDK-8237780: Shenandoah: More reliable nmethod verification In-Reply-To: <751cd87f-38e3-cd2f-43fc-f2ef95b41a50@redhat.com> References: <751cd87f-38e3-cd2f-43fc-f2ef95b41a50@redhat.com> Message-ID: <958d35c7-1dfd-39aa-6139-80c794af5791@redhat.com> On 2/14/20 8:29 PM, Roman Kennke wrote: > I've also left in some debug-output but under #ifdef ASSERT_DISABLED. I > found that very useful and wouldn't want to throw it away. I think the proper way to do this is: #if 0 // Helpful for debugging ...but then I wonder, why not turn it into the actual fastdebug diagnostics? Our verifier/asserts very helpfully include a lot of debugging info into hs_err when asserts fail. Surely if we are chasing a very rare bug, it would be more convenient for hs_err to include that right away, not require us to recompile the VM. > Webrev: > http://cr.openjdk.java.net/~rkennke/JDK-8237780/webrev.00/ *) Looks like you can just initialize "int count = _oop_count" and skip increments in the first loop. *) Capitalization in "Must", to match the style of other asserts: 305 assert(nm == data->nm(), "must be same nmethod"); *) assert(false, ...) is probably just fatal(...)
-- Thanks, -Aleksey From shade at redhat.com Fri Feb 14 21:23:30 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 14 Feb 2020 22:23:30 +0100 Subject: RFR: JDK-8239081: Shenandoah: Consolidate C1 LRB and native barriers In-Reply-To: References: Message-ID: <83a0a264-f958-6921-0ed7-7859bfe9505f@redhat.com> On 2/14/20 8:00 PM, Roman Kennke wrote: > https://bugs.openjdk.java.net/browse/JDK-8239081 > Webrev: > http://cr.openjdk.java.net/~rkennke/JDK-8239081/webrev.00/ Only some stylistic nits: *) I believe the convention is to name these boolean arguments "is_native"? *) C1ShenandoahLoadReferenceBarrierCodeGenClosure::_native should probably be const? -- Thanks, -Aleksey From kim.barrett at oracle.com Fri Feb 14 23:05:03 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Fri, 14 Feb 2020 18:05:03 -0500 Subject: RFR (S): 8238999: Remove MemRegion custom new/delete operator overloads In-Reply-To: <9a383ba5-68f1-7ed2-5ea9-97b236d2d9a1@oracle.com> References: <9a383ba5-68f1-7ed2-5ea9-97b236d2d9a1@oracle.com> Message-ID: <23D0C1DC-E59E-43D9-A54A-467F49385429@oracle.com> > On Feb 14, 2020, at 10:05 AM, Thomas Schatzl wrote: > > Hi all, > > can I have reviews for this small change to the MemRegion class to remove unnecessary new/delete overloads from MemRegion. > > They return NULL if there is not enough memory. This is uncommon to do in Hotspot code. > > All uses in the code either checks whether the allocation is non-NULL and then terminates the VM, or will just crash too. > > It is easier to just replace the new[] calls with NEW_C_HEAP_ARRAY allocations and do the initialization manually. > > cc'ing runtime because Coleen added the new operator for working around a Metaspace issue in JDK-8021954 years ago. 
> > CR: > https://bugs.openjdk.java.net/browse/JDK-8238999 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8238999/webrev/ > Testing: > hs-tier1-4 > > Thanks, > Thomas ------------------------------------------------------------------------------ src/hotspot/share/memory/memRegion.hpp 96 // Creates and initializes an array of MemRegions of the given length. 97 static MemRegion* create(uint length, MEMFLAGS flags); A function named "create" suggests to me creating a single object, not an array. Perhaps "make_array" or "create_array" or "new_array"? ------------------------------------------------------------------------------ Other than that, looks good. I don't need a new webrev for using any of the suggested names. I noticed the memory leak in map_heap_data, but see that you filed a separate bug for that, and already have a reviewed fix for it. From jianglizhou at google.com Fri Feb 14 23:14:17 2020 From: jianglizhou at google.com (Jiangli Zhou) Date: Fri, 14 Feb 2020 15:14:17 -0800 Subject: RFR (S): 8238999: Remove MemRegion custom new/delete operator overloads In-Reply-To: <23D0C1DC-E59E-43D9-A54A-467F49385429@oracle.com> References: <9a383ba5-68f1-7ed2-5ea9-97b236d2d9a1@oracle.com> <23D0C1DC-E59E-43D9-A54A-467F49385429@oracle.com> Message-ID: On Fri, Feb 14, 2020 at 3:05 PM Kim Barrett wrote: > > > On Feb 14, 2020, at 10:05 AM, Thomas Schatzl wrote: > > > > Hi all, > > > > can I have reviews for this small change to the MemRegion class to remove unnecessary new/delete overloads from MemRegion. > > > > They return NULL if there is not enough memory. This is uncommon to do in Hotspot code. > > > > All uses in the code either checks whether the allocation is non-NULL and then terminates the VM, or will just crash too. > > > > It is easier to just replace the new[] calls with NEW_C_HEAP_ARRAY allocations and do the initialization manually. > > > > cc'ing runtime because Coleen added the new operator for working around a Metaspace issue in JDK-8021954 years ago. 
> > > > CR: > > https://bugs.openjdk.java.net/browse/JDK-8238999 > > Webrev: > > http://cr.openjdk.java.net/~tschatzl/8238999/webrev/ > > Testing: > > hs-tier1-4 > > > > Thanks, > > Thomas > > ------------------------------------------------------------------------------ > src/hotspot/share/memory/memRegion.hpp > 96 // Creates and initializes an array of MemRegions of the given length. > 97 static MemRegion* create(uint length, MEMFLAGS flags); > > A function named "create" suggests to me creating a single object, not > an array. Perhaps "make_array" or "create_array" or "new_array"? +1. I had the same thoughts when looking at the webrev.1. Best regards, Jiangli > > ------------------------------------------------------------------------------ > > Other than that, looks good. I don't need a new webrev for using any > of the suggested names. > > I noticed the memory leak in map_heap_data, but see that you filed a > separate bug for that, and already have a reviewed fix for it. > From kim.barrett at oracle.com Sat Feb 15 08:20:38 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Sat, 15 Feb 2020 03:20:38 -0500 Subject: RFR (S): 8239070: Memory leak when unsuccessfully mapping in archive regions In-Reply-To: <605ad3e6-2d09-70ac-1aad-705b2b26ee6a@oracle.com> References: <8f935fd7-3cb3-d5ae-5352-804e51e410b5@oracle.com> <605ad3e6-2d09-70ac-1aad-705b2b26ee6a@oracle.com> Message-ID: > On Feb 14, 2020, at 11:08 AM, Ioi Lam wrote: > > Hi Thomas, > > Thanks for fixing this issue. Freeing the array at each exit point seems error prone. How about: refactoring the function to a FileMapInfo::map_heap_data_impl function, allocate inside FileMapInfo::map_heap_data(), call map_heap_data() and if it returns false, free the array in a single place. Rather than splitting up the function, one could add a local cleanup handler: ... create and initialize regions object ... 
struct Cleanup {
  MemRegion* _regions;
  bool _aborted;

  Cleanup(MemRegion* regions) : _regions(regions), _aborted(true) {}
  ~Cleanup() {
    if (_aborted) FREE_C_HEAP_ARRAY(MemRegion, _regions);
  }
} cleanup(regions);

...

cleanup._aborted = false;
return true;
}

or use std::unique_ptr :(
From rkennke at redhat.com Sat Feb 15 12:35:10 2020 From: rkennke at redhat.com (Roman Kennke) Date: Sat, 15 Feb 2020 13:35:10 +0100 Subject: RFR: JDK-8239081: Shenandoah: Consolidate C1 LRB and native barriers In-Reply-To: <83a0a264-f958-6921-0ed7-7859bfe9505f@redhat.com> References: <83a0a264-f958-6921-0ed7-7859bfe9505f@redhat.com> Message-ID: <3b4e78f3-a82c-2801-9d35-292ac18e6907@redhat.com> >> https://bugs.openjdk.java.net/browse/JDK-8239081 >> Webrev: >> http://cr.openjdk.java.net/~rkennke/JDK-8239081/webrev.00/ > > Only some stylistic nits: > > *) I believe the convention is to name these boolean arguments "is_native"? > > *) C1ShenandoahLoadReferenceBarrierCodeGenClosure::_native should probably be const? Right, good points! Both fixed here: http://cr.openjdk.java.net/~rkennke/JDK-8239081/webrev.01/ Good now? Thanks for reviewing! Roman From suenaga at oss.nttdata.com Mon Feb 17 04:05:45 2020 From: suenaga at oss.nttdata.com (Yasumasa Suenaga) Date: Mon, 17 Feb 2020 13:05:45 +0900 Subject: RFR: 8239129: Use DAX in ZGC In-Reply-To: References: <64207ef5-fabb-748a-15c9-e96e4bc612d8@oss.nttdata.com> <07354697-3758-02b9-0cc2-5fe887449e2a@oracle.com> <2a781b6a-0277-3bd1-3d0a-f3b2ac8a93c6@oracle.com> <0ae8d397-99c4-a2b6-93bb-5ab59861e25f@oss.nttdata.com> Message-ID: <64f25d5e-e352-2210-718f-667d2c547de7@oss.nttdata.com> Hi, I filed this enhancement to JBS: JBS: https://bugs.openjdk.java.net/browse/JDK-8239129 CSR: https://bugs.openjdk.java.net/browse/JDK-8239130 webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8239129/webrev.00/ Could you review this change and CSR? It passed tests on submit repo (mach5-one-ysuenaga-JDK-8239129-20200217-0213-8777205).
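[Editorial note: Kim's Cleanup guard and his std::unique_ptr aside above are the same RAII idea. A standalone sketch, with plain new[]/delete[] standing in for HotSpot's NEW_C_HEAP_ARRAY/FREE_C_HEAP_ARRAY and a hypothetical map_regions() in the role of map_heap_data(), might look like:]

```cpp
#include <cassert>
#include <cstddef>

struct Region { void* start; std::size_t size; };

// Guard that frees the array on every early (failed) exit and is
// disarmed once ownership is handed off. delete[] stands in for
// HotSpot's FREE_C_HEAP_ARRAY in this sketch.
struct Cleanup {
  Region* _regions;
  bool _aborted;
  explicit Cleanup(Region* regions) : _regions(regions), _aborted(true) {}
  ~Cleanup() {
    if (_aborted) delete[] _regions;
  }
};

// Hypothetical caller in the shape of map_heap_data(): any "return false"
// before the disarm line releases the array automatically.
bool map_regions(std::size_t count, bool fail_early) {
  Region* regions = new Region[count]();
  Cleanup cleanup(regions);
  if (fail_early) {
    return false;  // destructor frees regions here
  }
  // ... on success the array stays alive for a longer-lived owner
  // (deliberately not freed in this sketch) ...
  cleanup._aborted = false;
  return true;
}
```

The disarm flag is what distinguishes this from an unconditional destructor: every failure path is covered without repeating the free at each return, which was the error-proneness Ioi objected to.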
Thanks, Yasumasa On 2020/02/15 2:08, Per Liden wrote: > Hi, > > On 2/14/20 3:23 PM, Yasumasa Suenaga wrote: >> On 2020/02/14 23:08, Per Liden wrote: >>> Hi, >>> >>> On 2/14/20 2:31 PM, Yasumasa Suenaga wrote: >>>> Hi Per, >>>> >>>> On 2020/02/14 20:52, Per Liden wrote: >>>>> Hi Yasumasa, >>>>> >>>>> On 2/14/20 10:07 AM, Yasumasa Suenaga wrote: >>>>>> Hi all, >>>>>> >>>>>> I tried to allocate heap to DAX on Linux with -XX:AllocateHeapAt, but it couldn't. >>>>>> It seems to allow when filesystem is hugetlbfs or tmpfs. >>>>>> >>>>>> According to kernel document [1], DAX is supported in ext2, ext4, and xfs. >>>>>> Also we need to mount it with "-o dax". >>>>>> >>>>>> I want to use ZGC on DAX, so I want to introduce new option -XX:ZAllowHeapOnFileSystem to allow to use all filesystem as backing storage. >>>>>> What do you think this change? >>>>> >>>>> >>>>> +  experimental(bool, ZAllowHeapOnFileSystem, false, \ >>>>> +          "Allow to use filesystem as Java heap backing storage " \ >>>>> +          "specified by -XX:AllocateHeapAt") \ >>>>> + \ >>>>> >>>>> Instead of adding a new option it would be preferable to automatically detect that it's a dax mounted filesystem. But I haven't has a chance to look into the best way of doing that. >>>> >>>> I thought so, but I guess it is difficult. >>>> PMDK also does not check it automatically. >>>> >>>> https://urldefense.com/v3/__https://github.com/pmem/pmdk/blob/master/src/libpmem2/pmem2_utils_linux.c*L18__;Iw!!GqivPVa7Brio!PlQs19bQVBJF7PDA9RLZ9JLbXOQ2KYocNW6DJH-eOUqXZcYwl-cSvSjpfC316y0$ >>>> In addition, we don't seem to be able to get mount option ("-o dax") via syscall. >>>> I strace'ed `mount -o dax ...`, I saw "-o dax" was passed to 5th argument (const void *data). It would be handled in each filesystem, so I could not get it. >>>> >>>> Another solution, we can use /proc/mounts, but it might be complex.
>>> I was maybe hoping you could get this information through some ioctl() command on the file descriptor? >> >> I tried to FS_IOC_FSGETXATTR ioctl (FS_XFLAG_DAX might be set in fsx_xflags), but I couldn't get. >> (I use ext4 with "-o dax") > > > Ok. It would be good to get to the bottom of why it's not set. > > cheers, > Per From maoliang.ml at alibaba-inc.com Mon Feb 17 06:03:05 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Mon, 17 Feb 2020 14:03:05 +0800 Subject: Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: <51a95aba-5d31-0b99-87a7-485987f49f54@oracle.com> References: <7107c9f6-ba8e-48a0-830c-5383e2c17ef3.maoliang.ml@alibaba-inc.com> <9fdac7ff-bbef-1451-c951-a40dd6f216af@oracle.com>, <51a95aba-5d31-0b99-87a7-485987f49f54@oracle.com> Message-ID: Hi Thomas, > - gc/g1/TestPeriodicCollection.java fails consistently because the heap > does not shrink as expected (but probably this is a test bug as it may > expect that uncommit occurs at remark). The reason should be that the patch makes shrinking after mixed GC but the mixed gc doesn't happen. It's the only issue I listed for the change. > - memory usage tends to be significantly higher with the change without > improving scores. > E.g. I have been running specjvm2008 out-of-box with no settings on > different machine(s) (32gb ram min), and the build with the changes > almost consistently uses more heap (i.e. committed size) than without, > in the range of 10% without any performance increase. > Specjvm2008 benchmarks are pretty simple application in terms of > behavior, i.e. does the same things all the time. This also means that > very likely the current sizing is already way beyond the point of > diminishing returns (actually, this is a known issue :)); I would prefer > if we did not add to that. ;) I have 2 questions here.
1) specjvm2008 cannot run with jdk9+: https://bugs.openjdk.java.net/browse/JDK-8202460 I face the same problem. Do you have any way to perform the test in JDK15? 2) I didn't understand : "This also means that > very likely the current sizing is already way beyond the point of > diminishing returns (actually, this is a known issue :));" Could you please explain more about this? Thanks, Liang ------------------------------------------------------------------ From:Thomas Schatzl Send Time:2020 Feb. 15 (Sat.) 03:51 To:hotspot-gc-dev Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Hi, On 12.02.20 12:16, Thomas Schatzl wrote: > Hi Liang, > > On 12.02.20 11:17, Liang Mao wrote: >> Hi Thomas, >> >> I made a new patch for the issues we listed in JDK-8238686 and >> JDK-8236073: >> http://cr.openjdk.java.net/~luchsh/8236073.webrev.3/ > > thanks. I only had time to quickly browse the change, and started > building and testing it internally. I will run it through our perf > benchmarks to look for regressions of out-of-box behavior. > > I will need a day or two until I can get back to looking at the change > in detail. There is currently something else I need to look at. Sorry. initial results from testing: - gc/g1/TestPeriodicCollection.java fails consistently because the heap does not shrink as expected (but probably this is a test bug as it may expect that uncommit occurs at remark). - memory usage tends to be significantly higher with the change without improving scores. E.g. I have been running specjvm2008 out-of-box with no settings on different machine(s) (32gb ram min), and the build with the changes almost consistently uses more heap (i.e. committed size) than without, in the range of 10% without any performance increase. Specjvm2008 benchmarks are pretty simple application in terms of behavior, i.e. does the same things all the time. 
This also means that very likely the current sizing is already way beyond the point of diminishing returns (actually, this is a known issue :)); I would prefer if we did not add to that. ;) Unfortunately I lost the graphs I had generated (manually), and I do not have more time available right now so can't show you right now. I started some dacapo 2009 runs (running them for 30 iterations each). Did not have time to look at the changes themselves any further or investigate the reasons for this memory usage increase than I already did earlier; will continue on Tuesday as I'm taking the day off Monday. Thanks, Thomas From per.liden at oracle.com Mon Feb 17 06:50:32 2020 From: per.liden at oracle.com (Per Liden) Date: Mon, 17 Feb 2020 07:50:32 +0100 Subject: RFR: 8239129: Use DAX in ZGC In-Reply-To: <64f25d5e-e352-2210-718f-667d2c547de7@oss.nttdata.com> References: <64207ef5-fabb-748a-15c9-e96e4bc612d8@oss.nttdata.com> <07354697-3758-02b9-0cc2-5fe887449e2a@oracle.com> <2a781b6a-0277-3bd1-3d0a-f3b2ac8a93c6@oracle.com> <0ae8d397-99c4-a2b6-93bb-5ab59861e25f@oss.nttdata.com> <64f25d5e-e352-2210-718f-667d2c547de7@oss.nttdata.com> Message-ID: <5af0f20e-3909-c656-e1c0-276d0e3c72c3@oracle.com> Hi, On 2/17/20 5:05 AM, Yasumasa Suenaga wrote: > Hi, > > I filed this enhancement to JBS: > > JBS: https://bugs.openjdk.java.net/browse/JDK-8239129 > CSR: https://bugs.openjdk.java.net/browse/JDK-8239130 We will not introduce a new option like this, so please withdraw the CSR (you also don't need a CSR for adding an experimental options). > webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8239129/webrev.00/ Before this patch can go forward, you need to get to the bottom of how to get that ioctl command to work. If it's not possible, you need to explain why and propose alternatives that we can discuss. cheers, Per > > Could you review this change and CSR? > It passed tests on submit repo (mach5-one-ysuenaga-JDK-8239129-20200217-0213-8777205).
> > > Thanks, > > Yasumasa > > > On 2020/02/15 2:08, Per Liden wrote: >> Hi, >> >> On 2/14/20 3:23 PM, Yasumasa Suenaga wrote: >>> On 2020/02/14 23:08, Per Liden wrote: >>>> Hi, >>>> >>>> On 2/14/20 2:31 PM, Yasumasa Suenaga wrote: >>>>> Hi Per, >>>>> >>>>> On 2020/02/14 20:52, Per Liden wrote: >>>>>> Hi Yasumasa, >>>>>> >>>>>> On 2/14/20 10:07 AM, Yasumasa Suenaga wrote: >>>>>>> Hi all, >>>>>>> >>>>>>> I tried to allocate heap to DAX on Linux with -XX:AllocateHeapAt, >>>>>>> but it couldn't. >>>>>>> It seems to allow when filesystem is hugetlbfs or tmpfs. >>>>>>> >>>>>>> According to kernel document [1], DAX is supported in ext2, ext4, >>>>>>> and xfs. >>>>>>> Also we need to mount it with "-o dax". >>>>>>> >>>>>>> I want to use ZGC on DAX, so I want to introduce new option >>>>>>> -XX:ZAllowHeapOnFileSystem to allow to use all filesystem as >>>>>>> backing storage. >>>>>>> What do you think this change? >>>>>> >>>>>> >>>>>> +? experimental(bool, ZAllowHeapOnFileSystem, false, ??? \ >>>>>> +????????? "Allow to use filesystem as Java heap backing storage " >>>>>> ??? \ >>>>>> +????????? "specified by -XX:AllocateHeapAt") ??? \ >>>>>> + ??? \ >>>>>> >>>>>> Instead of adding a new option it would be preferable to >>>>>> automatically detect that it's a dax mounted filesystem. But I >>>>>> haven't has a chance to look into the best way of doing that. >>>>> >>>>> I thought so, but I guess it is difficult. >>>>> PMDK also does not check it automatically. >>>>> >>>>> https://urldefense.com/v3/__https://github.com/pmem/pmdk/blob/master/src/libpmem2/pmem2_utils_linux.c*L18__;Iw!!GqivPVa7Brio!PlQs19bQVBJF7PDA9RLZ9JLbXOQ2KYocNW6DJH-eOUqXZcYwl-cSvSjpfC316y0$ >>>>> >>>>> In addition, we don't seem to be able to get mount option ("-o >>>>> dax") via syscall. >>>>> I strace'ed `mount -o dax ...`, I saw "-o dax" was passed to 5th >>>>> argument (const void *data). It would be handled in each >>>>> filesystem, so I could not get it. 
>>>>> >>>>> Another solution, we can use /proc/mounts, but it might be complex. >>>> >>>> I was maybe hoping you could get this information through some >>>> ioctl() command on the file descriptor? >>> >>> I tried to FS_IOC_FSGETXATTR ioctl (FS_XFLAG_DAX might be set in >>> fsx_xflags), but I couldn't get. >>> (I use ext4 with "-o dax") >> >> >> Ok. It would be good to get to the bottom of why it's not set. >> >> cheers, >> Per From suenaga at oss.nttdata.com Mon Feb 17 07:58:41 2020 From: suenaga at oss.nttdata.com (Yasumasa Suenaga) Date: Mon, 17 Feb 2020 16:58:41 +0900 Subject: RFR: 8239129: Use DAX in ZGC In-Reply-To: <5af0f20e-3909-c656-e1c0-276d0e3c72c3@oracle.com> References: <64207ef5-fabb-748a-15c9-e96e4bc612d8@oss.nttdata.com> <07354697-3758-02b9-0cc2-5fe887449e2a@oracle.com> <2a781b6a-0277-3bd1-3d0a-f3b2ac8a93c6@oracle.com> <0ae8d397-99c4-a2b6-93bb-5ab59861e25f@oss.nttdata.com> <64f25d5e-e352-2210-718f-667d2c547de7@oss.nttdata.com> <5af0f20e-3909-c656-e1c0-276d0e3c72c3@oracle.com> Message-ID: Hi Per, On 2020/02/17 15:50, Per Liden wrote: > Hi, > > On 2/17/20 5:05 AM, Yasumasa Suenaga wrote: >> Hi, >> >> I filed this enhancement to JBS: >> >> ?? JBS: https://bugs.openjdk.java.net/browse/JDK-8239129 >> ?? CSR: https://bugs.openjdk.java.net/browse/JDK-8239130 > > We will not introduce a new option like this, so please withdraw the CSR (you also don't need a CSR for adding an experimental options). I withdrew it. >> ?? webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8239129/webrev.00/ > > Before this patch can go forward, you need to get to the bottom of how to get that ioctl command to work. If it's not possible, you need to explain why and propose alternatives that we can discuss. I guess it is caused by Linux kernel. In case of ext4, `ext4_iflags_to_xflags()` would set filesystem flags to `struct FS_IOC_FSGETXATTR`. However `FS_XFLAG_DAX` is not handled in it. 
https://github.com/torvalds/linux/blob/master/fs/ext4/ioctl.c#L525 Cheers, Yasumasa > cheers, > Per > >> >> Could you review this change and CSR? >> It passed tests on submit repo (mach5-one-ysuenaga-JDK-8239129-20200217-0213-8777205). >> >> >> Thanks, >> >> Yasumasa >> >> >> On 2020/02/15 2:08, Per Liden wrote: >>> Hi, >>> >>> On 2/14/20 3:23 PM, Yasumasa Suenaga wrote: >>>> On 2020/02/14 23:08, Per Liden wrote: >>>>> Hi, >>>>> >>>>> On 2/14/20 2:31 PM, Yasumasa Suenaga wrote: >>>>>> Hi Per, >>>>>> >>>>>> On 2020/02/14 20:52, Per Liden wrote: >>>>>>> Hi Yasumasa, >>>>>>> >>>>>>> On 2/14/20 10:07 AM, Yasumasa Suenaga wrote: >>>>>>>> Hi all, >>>>>>>> >>>>>>>> I tried to allocate heap to DAX on Linux with -XX:AllocateHeapAt, but it couldn't. >>>>>>>> It seems to allow when filesystem is hugetlbfs or tmpfs. >>>>>>>> >>>>>>>> According to kernel document [1], DAX is supported in ext2, ext4, and xfs. >>>>>>>> Also we need to mount it with "-o dax". >>>>>>>> >>>>>>>> I want to use ZGC on DAX, so I want to introduce new option -XX:ZAllowHeapOnFileSystem to allow to use all filesystem as backing storage. >>>>>>>> What do you think this change? >>>>>>> >>>>>>> >>>>>>> +? experimental(bool, ZAllowHeapOnFileSystem, false, ??? \ >>>>>>> +????????? "Allow to use filesystem as Java heap backing storage " ??? \ >>>>>>> +????????? "specified by -XX:AllocateHeapAt") ??? \ >>>>>>> + ??? \ >>>>>>> >>>>>>> Instead of adding a new option it would be preferable to automatically detect that it's a dax mounted filesystem. But I haven't has a chance to look into the best way of doing that. >>>>>> >>>>>> I thought so, but I guess it is difficult. >>>>>> PMDK also does not check it automatically. >>>>>> >>>>>> https://urldefense.com/v3/__https://github.com/pmem/pmdk/blob/master/src/libpmem2/pmem2_utils_linux.c*L18__;Iw!!GqivPVa7Brio!PlQs19bQVBJF7PDA9RLZ9JLbXOQ2KYocNW6DJH-eOUqXZcYwl-cSvSjpfC316y0$ >>>>>> In addition, we don't seem to be able to get mount option ("-o dax") via syscall. 
>>>>>> I strace'ed `mount -o dax ...`, I saw "-o dax" was passed to 5th argument (const void *data). It would be handled in each filesystem, so I could not get it. >>>>>> >>>>>> Another solution, we can use /proc/mounts, but it might be complex. >>>>> >>>>> I was maybe hoping you could get this information through some ioctl() command on the file descriptor? >>>> >>>> I tried to FS_IOC_FSGETXATTR ioctl (FS_XFLAG_DAX might be set in fsx_xflags), but I couldn't get. >>>> (I use ext4 with "-o dax") >>> >>> >>> Ok. It would be good to get to the bottom of why it's not set. >>> >>> cheers, >>> Per From shade at redhat.com Mon Feb 17 08:12:18 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 17 Feb 2020 09:12:18 +0100 Subject: RFR: JDK-8239081: Shenandoah: Consolidate C1 LRB and native barriers In-Reply-To: <3b4e78f3-a82c-2801-9d35-292ac18e6907@redhat.com> References: <83a0a264-f958-6921-0ed7-7859bfe9505f@redhat.com> <3b4e78f3-a82c-2801-9d35-292ac18e6907@redhat.com> Message-ID: <3c5f8917-41f8-b312-4f30-6a9c137ed83b@redhat.com> On 2/15/20 1:35 PM, Roman Kennke wrote: >>> https://bugs.openjdk.java.net/browse/JDK-8239081 >>> Webrev: >>> http://cr.openjdk.java.net/~rkennke/JDK-8239081/webrev.00/ >> >> Only some stylistic nits: >> >> *) I believe the convention is to name these boolean arguments "is_native"? >> >> *) C1ShenandoahLoadReferenceBarrierCodeGenClosure::_native should probably be const? > > Right, good points! Both fixed here: > > http://cr.openjdk.java.net/~rkennke/JDK-8239081/webrev.01/ I think variables and fields should be "is_native" too. Here: 216 bool native = ShenandoahBarrierSet::use_load_reference_barrier_native(decorators, type); 217 tmp = load_reference_barrier(gen, tmp, access.resolved_addr(), native); ...and here: 255 class C1ShenandoahLoadReferenceBarrierCodeGenClosure : public StubAssemblerCodeGenClosure { 256 private: 257 const bool _native; ...and here: 89 class ShenandoahLoadReferenceBarrierStub: public CodeStub { ... 
97 bool _native; ...and probably somewhere else too? -- Thanks, -Aleksey From maoliang.ml at alibaba-inc.com Mon Feb 17 09:56:08 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Mon, 17 Feb 2020 17:56:08 +0800 Subject: Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: <51a95aba-5d31-0b99-87a7-485987f49f54@oracle.com> References: <7107c9f6-ba8e-48a0-830c-5383e2c17ef3.maoliang.ml@alibaba-inc.com> <9fdac7ff-bbef-1451-c951-a40dd6f216af@oracle.com>, <51a95aba-5d31-0b99-87a7-485987f49f54@oracle.com> Message-ID: Hi Thomas, I am able to run specjvm2008 by excluding the compiler subtests and reproduce the issue that the change commits more memory. The main cause is addressed that the tests have a lot of humongous objects which affect the evaluation of adaptive IHOP. _last_unrestrained_young_size and _last_allocated_bytes used in G1AdaptiveIHOPControl::predict_unstrained_buffer_size are very large. So the expansion after concurrent mark is rather aggressive. I made an enhancement to restrict this uncommon expansion with MinHeapFreeRatio: http://cr.openjdk.java.net/~luchsh/8236073.webrev.4/ Thanks, Liang ------------------------------------------------------------------ From:Thomas Schatzl Send Time:2020 Feb. 15 (Sat.) 03:51 To:hotspot-gc-dev Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Hi, On 12.02.20 12:16, Thomas Schatzl wrote: > Hi Liang, > > On 12.02.20 11:17, Liang Mao wrote: >> Hi Thomas, >> >> I made a new patch for the issues we listed in JDK-8238686 and >> JDK-8236073: >> http://cr.openjdk.java.net/~luchsh/8236073.webrev.3/ > > thanks. I only had time to quickly browse the change, and started > building and testing it internally. I will run it through our perf > benchmarks to look for regressions of out-of-box behavior. > > I will need a day or two until I can get back to looking at the change > in detail.
There is currently something else I need to look at. Sorry. initial results from testing: - gc/g1/TestPeriodicCollection.java fails consistently because the heap does not shrink as expected (but probably this is a test bug as it may expect that uncommit occurs at remark). - memory usage tends to be significantly higher with the change without improving scores. E.g. I have been running specjvm2008 out-of-box with no settings on different machine(s) (32gb ram min), and the build with the changes almost consistently uses more heap (i.e. committed size) than without, in the range of 10% without any performance increase. Specjvm2008 benchmarks are pretty simple application in terms of behavior, i.e. does the same things all the time. This also means that very likely the current sizing is already way beyond the point of diminishing returns (actually, this is a known issue :)); I would prefer if we did not add to that. ;) Unfortunately I lost the graphs I had generated (manually), and I do not have more time available right now so can't show you right now. I started some dacapo 2009 runs (running them for 30 iterations each). Did not have time to look at the changes themselves any further or investigate the reasons for this memory usage increase than I already did earlier; will continue on Tuesday as I'm taking the day off Monday. 
Thanks, Thomas From per.liden at oracle.com Mon Feb 17 10:06:51 2020 From: per.liden at oracle.com (Per Liden) Date: Mon, 17 Feb 2020 11:06:51 +0100 Subject: RFR: 8239129: Use DAX in ZGC In-Reply-To: References: <64207ef5-fabb-748a-15c9-e96e4bc612d8@oss.nttdata.com> <07354697-3758-02b9-0cc2-5fe887449e2a@oracle.com> <2a781b6a-0277-3bd1-3d0a-f3b2ac8a93c6@oracle.com> <0ae8d397-99c4-a2b6-93bb-5ab59861e25f@oss.nttdata.com> <64f25d5e-e352-2210-718f-667d2c547de7@oss.nttdata.com> <5af0f20e-3909-c656-e1c0-276d0e3c72c3@oracle.com> Message-ID: On 2/17/20 8:58 AM, Yasumasa Suenaga wrote: > Hi Per, > > On 2020/02/17 15:50, Per Liden wrote: >> Hi, >> >> On 2/17/20 5:05 AM, Yasumasa Suenaga wrote: >>> Hi, >>> >>> I filed this enhancement to JBS: >>> >>> ?? JBS: https://bugs.openjdk.java.net/browse/JDK-8239129 >>> ?? CSR: https://bugs.openjdk.java.net/browse/JDK-8239130 >> >> We will not introduce a new option like this, so please withdraw the >> CSR (you also don't need a CSR for adding an experimental options). > > I withdrew it. > > >>> ?? webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8239129/webrev.00/ >> >> Before this patch can go forward, you need to get to the bottom of how >> to get that ioctl command to work. If it's not possible, you need to >> explain why and propose alternatives that we can discuss. > > I guess it is caused by Linux kernel. > In case of ext4, `ext4_iflags_to_xflags()` would set filesystem flags to > `struct FS_IOC_FSGETXATTR`. > However `FS_XFLAG_DAX` is not handled in it. Did a bit of googleing and it seems the DAX flag is in a bit of flux at the moment. I guess this will be fixed down the road, when DAX in the kernel becomes a non-experimental feature. 
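[Editorial note: the FS_IOC_FSGETXATTR probe debated throughout this thread can be written as a standalone Linux-only sketch. The interface below is the real <linux/fs.h> one, but as Per and Yasumasa note, whether fsx_xflags actually carries FS_XFLAG_DAX depends on the kernel and filesystem, so a failed or negative probe is not conclusive:]

```cpp
#include <fcntl.h>
#include <cassert>
#include <linux/fs.h>
#include <sys/ioctl.h>

// Returns 1 if the file's extended attributes report DAX, 0 if they do
// not, and -1 if the query itself failed (bad descriptor, filesystem
// without FS_IOC_FSGETXATTR support, ...).
int query_dax_flag(int fd) {
  struct fsxattr fsx;
  if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) == -1) {
    return -1;
  }
  return (fsx.fsx_xflags & FS_XFLAG_DAX) ? 1 : 0;
}
```

A caller would open() the heap backing file first; in this thread such a probe returned no DAX bit even on a dax-mounted ext4, which is exactly the kernel-side gap Yasumasa points at in ext4_iflags_to_xflags().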
How about we just do like this for now: http://cr.openjdk.java.net/~pliden/8239129/webrev.0 /Per > > > https://urldefense.com/v3/__https://github.com/torvalds/linux/blob/master/fs/ext4/ioctl.c*L525__;Iw!!GqivPVa7Brio!KN3UJKZwdbjq6abJnSXLf78BAUX9742P2PJFHS6kO5_cAgG6kxQEBBBez7uFixk$ > > > Cheers, > > Yasumasa > > >> cheers, >> Per >> >>> >>> Could you review this change and CSR? >>> It passed tests on submit repo >>> (mach5-one-ysuenaga-JDK-8239129-20200217-0213-8777205). >>> >>> >>> Thanks, >>> >>> Yasumasa >>> >>> >>> On 2020/02/15 2:08, Per Liden wrote: >>>> Hi, >>>> >>>> On 2/14/20 3:23 PM, Yasumasa Suenaga wrote: >>>>> On 2020/02/14 23:08, Per Liden wrote: >>>>>> Hi, >>>>>> >>>>>> On 2/14/20 2:31 PM, Yasumasa Suenaga wrote: >>>>>>> Hi Per, >>>>>>> >>>>>>> On 2020/02/14 20:52, Per Liden wrote: >>>>>>>> Hi Yasumasa, >>>>>>>> >>>>>>>> On 2/14/20 10:07 AM, Yasumasa Suenaga wrote: >>>>>>>>> Hi all, >>>>>>>>> >>>>>>>>> I tried to allocate heap to DAX on Linux with >>>>>>>>> -XX:AllocateHeapAt, but it couldn't. >>>>>>>>> It seems to allow when filesystem is hugetlbfs or tmpfs. >>>>>>>>> >>>>>>>>> According to kernel document [1], DAX is supported in ext2, >>>>>>>>> ext4, and xfs. >>>>>>>>> Also we need to mount it with "-o dax". >>>>>>>>> >>>>>>>>> I want to use ZGC on DAX, so I want to introduce new option >>>>>>>>> -XX:ZAllowHeapOnFileSystem to allow to use all filesystem as >>>>>>>>> backing storage. >>>>>>>>> What do you think this change? >>>>>>>> >>>>>>>> >>>>>>>> +? experimental(bool, ZAllowHeapOnFileSystem, false, ??? \ >>>>>>>> +????????? "Allow to use filesystem as Java heap backing storage >>>>>>>> " ??? \ >>>>>>>> +????????? "specified by -XX:AllocateHeapAt") ??? \ >>>>>>>> + ??? \ >>>>>>>> >>>>>>>> Instead of adding a new option it would be preferable to >>>>>>>> automatically detect that it's a dax mounted filesystem. But I >>>>>>>> haven't has a chance to look into the best way of doing that. >>>>>>> >>>>>>> I thought so, but I guess it is difficult. 
>>>>>>> PMDK also does not check it automatically. >>>>>>> >>>>>>> https://urldefense.com/v3/__https://github.com/pmem/pmdk/blob/master/src/libpmem2/pmem2_utils_linux.c*L18__;Iw!!GqivPVa7Brio!PlQs19bQVBJF7PDA9RLZ9JLbXOQ2KYocNW6DJH-eOUqXZcYwl-cSvSjpfC316y0$ >>>>>>> >>>>>>> In addition, we don't seem to be able to get mount option ("-o >>>>>>> dax") via syscall. >>>>>>> I strace'ed `mount -o dax ...`, I saw "-o dax" was passed to 5th >>>>>>> argument (const void *data). It would be handled in each >>>>>>> filesystem, so I could not get it. >>>>>>> >>>>>>> Another solution, we can use /proc/mounts, but it might be complex. >>>>>> >>>>>> I was maybe hoping you could get this information through some >>>>>> ioctl() command on the file descriptor? >>>>> >>>>> I tried to FS_IOC_FSGETXATTR ioctl (FS_XFLAG_DAX might be set in >>>>> fsx_xflags), but I couldn't get. >>>>> (I use ext4 with "-o dax") >>>> >>>> >>>> Ok. It would be good to get to the bottom of why it's not set. >>>> >>>> cheers, >>>> Per From stefan.johansson at oracle.com Mon Feb 17 10:10:07 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Mon, 17 Feb 2020 11:10:07 +0100 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: References: <7107c9f6-ba8e-48a0-830c-5383e2c17ef3.maoliang.ml@alibaba-inc.com> <9fdac7ff-bbef-1451-c951-a40dd6f216af@oracle.com> <51a95aba-5d31-0b99-87a7-485987f49f54@oracle.com> Message-ID: <5883F67D-CE92-4A40-977B-947B413ABF5F@oracle.com> Hi Liang, I've started looking at this patch as well and I have a question regarding the change to not allow shrinking after concurrent mark? Before we could shrink the heap at Remark, but now we only check to expand the heap after the concurrent cycle, why is that? I get that we will be able to shrink even more after the mixed collections but if a lot of regions are freed by the concurrent cycle why not check if we can shrink here?
Also good to hear you can run SPECjvm2008, we also avoid running any problematic benchmarks. Thanks, Stefan > 17 feb. 2020 kl. 10:56 skrev Liang Mao : > > Hi Thomas, > > I am able to run specjvm2008 by excluding the compiler subtests > and reproduce the issue that the change commits more memory. > The main cause is addressed that the tests have a lot of > humongous objects which affect the evaluation of adaptive > IHOP. _last_unrestrained_young_size and _last_allocated_bytes > used in G1AdaptiveIHOPControl::predict_unstrained_buffer_size are > very large. So the expansion after concurrent mark is rather > aggressive. I made an enhancement to restrict this uncommon > expansion with MinHeapFreeRatio: > http://cr.openjdk.java.net/~luchsh/8236073.webrev.4/ > > Thanks, > Liang > > > > > > > ------------------------------------------------------------------ > From:Thomas Schatzl > Send Time:2020 Feb. 15 (Sat.) 03:51 > To:hotspot-gc-dev > Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics > > Hi, > > On 12.02.20 12:16, Thomas Schatzl wrote: >> Hi Liang, >> >> On 12.02.20 11:17, Liang Mao wrote: >>> Hi Thomas, >>> >>> I made a new patch for the issues we listed in JDK-8238686 and >>> JDK-8236073: >>> http://cr.openjdk.java.net/~luchsh/8236073.webrev.3/ >> >> thanks. I only had time to quickly browse the change, and started >> building and testing it internally. I will run it through our perf >> benchmarks to look for regressions of out-of-box behavior. >> >> I will need a day or two until I can get back to looking at the change >> in detail. There is currently something else I need to look at. Sorry. > > initial results from testing: > > - gc/g1/TestPeriodicCollection.java fails consistently because the heap > does not shrink as expected (but probably this is a test bug as it may > expect that uncommit occurs at remark). > > - memory usage tends to be significantly higher with the change without > improving scores. > > E.g. 
I have been running specjvm2008 out-of-box with no settings on > different machine(s) (32gb ram min), and the build with the changes > almost consistently uses more heap (i.e. committed size) than without, > in the range of 10% without any performance increase. > > Specjvm2008 benchmarks are pretty simple application in terms of > behavior, i.e. does the same things all the time. This also means that > very likely the current sizing is already way beyond the point of > diminishing returns (actually, this is a known issue :)); I would prefer > if we did not add to that. ;) > > Unfortunately I lost the graphs I had generated (manually), and I do not > have more time available right now so can't show you right now. > > I started some dacapo 2009 runs (running them for 30 iterations each). > > Did not have time to look at the changes themselves any further or > investigate the reasons for this memory usage increase than I already > did earlier; will continue on Tuesday as I'm taking the day off Monday. > > Thanks, > Thomas > From rkennke at redhat.com Mon Feb 17 11:49:48 2020 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 17 Feb 2020 12:49:48 +0100 Subject: RFR: JDK-8239081: Shenandoah: Consolidate C1 LRB and native barriers In-Reply-To: <3c5f8917-41f8-b312-4f30-6a9c137ed83b@redhat.com> References: <83a0a264-f958-6921-0ed7-7859bfe9505f@redhat.com> <3b4e78f3-a82c-2801-9d35-292ac18e6907@redhat.com> <3c5f8917-41f8-b312-4f30-6a9c137ed83b@redhat.com> Message-ID: <50043d72-86ff-df50-18e0-48b9f0a0bf0e@redhat.com> Hi Aleksey, >>>> https://bugs.openjdk.java.net/browse/JDK-8239081 >>>> Webrev: >>>> http://cr.openjdk.java.net/~rkennke/JDK-8239081/webrev.00/ >>> >>> Only some stylistic nits: >>> >>> *) I believe the convention is to name these boolean arguments "is_native"? >>> >>> *) C1ShenandoahLoadReferenceBarrierCodeGenClosure::_native should probably be const? >> >> Right, good points! 
Both fixed here: >> >> http://cr.openjdk.java.net/~rkennke/JDK-8239081/webrev.01/ > > I think variables and fields should be "is_native" too. > > Here: > > 216 bool native = ShenandoahBarrierSet::use_load_reference_barrier_native(decorators, type); > 217 tmp = load_reference_barrier(gen, tmp, access.resolved_addr(), native); > > ...and here: > > 255 class C1ShenandoahLoadReferenceBarrierCodeGenClosure : public StubAssemblerCodeGenClosure { > 256 private: > 257 const bool _native; > > ...and here: > > 89 class ShenandoahLoadReferenceBarrierStub: public CodeStub { > ... > 97 bool _native; > > ...and probably somewhere else too? Riiiiight.: http://cr.openjdk.java.net/~rkennke/JDK-8239081/webrev.02/ (Note: where am I coming from? Java conventions for boolean properties where field is $property, getter is is$Property(), setter is set$Property().) Better now? Roman From shade at redhat.com Mon Feb 17 11:53:21 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 17 Feb 2020 12:53:21 +0100 Subject: RFR: JDK-8239081: Shenandoah: Consolidate C1 LRB and native barriers In-Reply-To: <50043d72-86ff-df50-18e0-48b9f0a0bf0e@redhat.com> References: <83a0a264-f958-6921-0ed7-7859bfe9505f@redhat.com> <3b4e78f3-a82c-2801-9d35-292ac18e6907@redhat.com> <3c5f8917-41f8-b312-4f30-6a9c137ed83b@redhat.com> <50043d72-86ff-df50-18e0-48b9f0a0bf0e@redhat.com> Message-ID: On 2/17/20 12:49 PM, Roman Kennke wrote: > http://cr.openjdk.java.net/~rkennke/JDK-8239081/webrev.02/ Looks good. -- Thanks, -Aleksey From rkennke at redhat.com Mon Feb 17 12:13:03 2020 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 17 Feb 2020 13:13:03 +0100 Subject: RFR: JDK-8237780: Shenandoah: More reliable nmethod verification In-Reply-To: <958d35c7-1dfd-39aa-6139-80c794af5791@redhat.com> References: <751cd87f-38e3-cd2f-43fc-f2ef95b41a50@redhat.com> <958d35c7-1dfd-39aa-6139-80c794af5791@redhat.com> Message-ID: Hi Aleksey, >> I've also left in some debug-output but under #ifdef ASSERT_DISABLED. 
I >> found that very useful and wouldn't want to throw it away. > > I think the proper way to do this is: > > #if 0 // Helpful for debugging > > ...but then I wonder, why not turn it into the actual fastdebug diagnostics? Our verifier/asserts > very helpfully include a lot of debugging info into hs_err when asserts fail. Surely if we are > chasing a very rare bug, it would be more convenient for hs_err to include that right away, not > require us recompile the VM. Right, let's leave it there for diagnostics. >> Webrev: >> http://cr.openjdk.java.net/~rkennke/JDK-8237780/webrev.00/ > > *) Looks like you can just initialize "int count = _oop_count" and skip increments in the first loop. Right. > *) Capitalization in "Must", to match the style of other asserts: > > 305 assert(nm == data->nm(), "must be same nmethod"); Ok. > *) assert(false, ...) is probably just fatal(...) Ok. http://cr.openjdk.java.net/~rkennke/JDK-8237780/webrev.01/ Good now? Thanks, Roman From shade at redhat.com Mon Feb 17 12:18:05 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 17 Feb 2020 13:18:05 +0100 Subject: RFR: JDK-8237780: Shenandoah: More reliable nmethod verification In-Reply-To: References: <751cd87f-38e3-cd2f-43fc-f2ef95b41a50@redhat.com> <958d35c7-1dfd-39aa-6139-80c794af5791@redhat.com> Message-ID: On 2/17/20 1:13 PM, Roman Kennke wrote: > http://cr.openjdk.java.net/~rkennke/JDK-8237780/webrev.01/ This is fine. Although I would probably be open for storing that diagnostics into stringStream (see how ShenandoahAsserts::print_failure does it), and putting it into the fatal message itself. Pros: customers would hand over hs_errs to us with the relevant diagnostics. Cons: we can overflow the stringStream and truncate parts of the data. Your call. 
-- Thanks, -Aleksey From suenaga at oss.nttdata.com Mon Feb 17 12:28:15 2020 From: suenaga at oss.nttdata.com (Yasumasa Suenaga) Date: Mon, 17 Feb 2020 21:28:15 +0900 Subject: RFR: 8239129: Use DAX in ZGC In-Reply-To: References: <64207ef5-fabb-748a-15c9-e96e4bc612d8@oss.nttdata.com> <07354697-3758-02b9-0cc2-5fe887449e2a@oracle.com> <2a781b6a-0277-3bd1-3d0a-f3b2ac8a93c6@oracle.com> <0ae8d397-99c4-a2b6-93bb-5ab59861e25f@oss.nttdata.com> <64f25d5e-e352-2210-718f-667d2c547de7@oss.nttdata.com> <5af0f20e-3909-c656-e1c0-276d0e3c72c3@oracle.com> Message-ID: On 2020/02/17 19:06, Per Liden wrote: > > > On 2/17/20 8:58 AM, Yasumasa Suenaga wrote: >> Hi Per, >> >> On 2020/02/17 15:50, Per Liden wrote: >>> Hi, >>> >>> On 2/17/20 5:05 AM, Yasumasa Suenaga wrote: >>>> Hi, >>>> >>>> I filed this enhancement to JBS: >>>> >>>> ?? JBS: https://bugs.openjdk.java.net/browse/JDK-8239129 >>>> ?? CSR: https://bugs.openjdk.java.net/browse/JDK-8239130 >>> >>> We will not introduce a new option like this, so please withdraw the CSR (you also don't need a CSR for adding an experimental options). >> >> I withdrew it. >> >> >>>> ?? webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8239129/webrev.00/ >>> >>> Before this patch can go forward, you need to get to the bottom of how to get that ioctl command to work. If it's not possible, you need to explain why and propose alternatives that we can discuss. >> >> I guess it is caused by Linux kernel. >> In case of ext4, `ext4_iflags_to_xflags()` would set filesystem flags to `struct FS_IOC_FSGETXATTR`. >> However `FS_XFLAG_DAX` is not handled in it. > > Did a bit of googleing and it seems the DAX flag is in a bit of flux at the moment. I guess this will be fixed down the road, when DAX in the kernel becomes a non-experimental feature. > > How about we just do like this for now: > > http://cr.openjdk.java.net/~pliden/8239129/webrev.0 I thought ZGC requires tmpfs or hugetlbfs due to performance reason. 
So I introduced new -XX option to make users aware of it. If not so, I agree with your change. Yasumasa > /Per > >> >> https://urldefense.com/v3/__https://github.com/torvalds/linux/blob/master/fs/ext4/ioctl.c*L525__;Iw!!GqivPVa7Brio!KN3UJKZwdbjq6abJnSXLf78BAUX9742P2PJFHS6kO5_cAgG6kxQEBBBez7uFixk$ >> >> Cheers, >> >> Yasumasa >> >> >>> cheers, >>> Per >>> >>>> >>>> Could you review this change and CSR? >>>> It passed tests on submit repo (mach5-one-ysuenaga-JDK-8239129-20200217-0213-8777205). >>>> >>>> >>>> Thanks, >>>> >>>> Yasumasa >>>> >>>> >>>> On 2020/02/15 2:08, Per Liden wrote: >>>>> Hi, >>>>> >>>>> On 2/14/20 3:23 PM, Yasumasa Suenaga wrote: >>>>>> On 2020/02/14 23:08, Per Liden wrote: >>>>>>> Hi, >>>>>>> >>>>>>> On 2/14/20 2:31 PM, Yasumasa Suenaga wrote: >>>>>>>> Hi Per, >>>>>>>> >>>>>>>> On 2020/02/14 20:52, Per Liden wrote: >>>>>>>>> Hi Yasumasa, >>>>>>>>> >>>>>>>>> On 2/14/20 10:07 AM, Yasumasa Suenaga wrote: >>>>>>>>>> Hi all, >>>>>>>>>> >>>>>>>>>> I tried to allocate heap to DAX on Linux with -XX:AllocateHeapAt, but it couldn't. >>>>>>>>>> It seems to allow when filesystem is hugetlbfs or tmpfs. >>>>>>>>>> >>>>>>>>>> According to kernel document [1], DAX is supported in ext2, ext4, and xfs. >>>>>>>>>> Also we need to mount it with "-o dax". >>>>>>>>>> >>>>>>>>>> I want to use ZGC on DAX, so I want to introduce new option -XX:ZAllowHeapOnFileSystem to allow to use all filesystem as backing storage. >>>>>>>>>> What do you think this change? >>>>>>>>> >>>>>>>>> >>>>>>>>> +? experimental(bool, ZAllowHeapOnFileSystem, false, ??? \ >>>>>>>>> +????????? "Allow to use filesystem as Java heap backing storage " ??? \ >>>>>>>>> +????????? "specified by -XX:AllocateHeapAt") ??? \ >>>>>>>>> + ??? \ >>>>>>>>> >>>>>>>>> Instead of adding a new option it would be preferable to automatically detect that it's a dax mounted filesystem. But I haven't has a chance to look into the best way of doing that. >>>>>>>> >>>>>>>> I thought so, but I guess it is difficult. 
>>>>>>>> PMDK also does not check it automatically. >>>>>>>> >>>>>>>> https://urldefense.com/v3/__https://github.com/pmem/pmdk/blob/master/src/libpmem2/pmem2_utils_linux.c*L18__;Iw!!GqivPVa7Brio!PlQs19bQVBJF7PDA9RLZ9JLbXOQ2KYocNW6DJH-eOUqXZcYwl-cSvSjpfC316y0$ >>>>>>>> In addition, we don't seem to be able to get mount option ("-o dax") via syscall. >>>>>>>> I strace'ed `mount -o dax ...`, I saw "-o dax" was passed to 5th argument (const void *data). It would be handled in each filesystem, so I could not get it. >>>>>>>> >>>>>>>> Another solution, we can use /proc/mounts, but it might be complex. >>>>>>> >>>>>>> I was maybe hoping you could get this information through some ioctl() command on the file descriptor? >>>>>> >>>>>> I tried to FS_IOC_FSGETXATTR ioctl (FS_XFLAG_DAX might be set in fsx_xflags), but I couldn't get. >>>>>> (I use ext4 with "-o dax") >>>>> >>>>> >>>>> Ok. It would be good to get to the bottom of why it's not set. >>>>> >>>>> cheers, >>>>> Per From rkennke at redhat.com Mon Feb 17 12:34:20 2020 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 17 Feb 2020 13:34:20 +0100 Subject: RFR: JDK-8237780: Shenandoah: More reliable nmethod verification In-Reply-To: References: <751cd87f-38e3-cd2f-43fc-f2ef95b41a50@redhat.com> <958d35c7-1dfd-39aa-6139-80c794af5791@redhat.com> Message-ID: >> http://cr.openjdk.java.net/~rkennke/JDK-8237780/webrev.01/ > > This is fine. > > Although I would probably be open for storing that diagnostics into stringStream (see how > ShenandoahAsserts::print_failure does it), and putting it into the fatal message itself. Pros: > customers would hand over hs_errs to us with the relevant diagnostics. Cons: we can overflow the > stringStream and truncate parts of the data. Your call. Ok, let's do that then: http://cr.openjdk.java.net/~rkennke/JDK-8237780/webrev.02/ Good? 
Roman From zgu at redhat.com Mon Feb 17 14:03:28 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 17 Feb 2020 09:03:28 -0500 Subject: RFR: JDK-8237780: Shenandoah: More reliable nmethod verification In-Reply-To: References: <751cd87f-38e3-cd2f-43fc-f2ef95b41a50@redhat.com> <958d35c7-1dfd-39aa-6139-80c794af5791@redhat.com> Message-ID: <7850613d-3a9f-cece-9a1a-46b4e6823c7f@redhat.com> On 2/17/20 7:34 AM, Roman Kennke wrote: >>> http://cr.openjdk.java.net/~rkennke/JDK-8237780/webrev.01/ >> >> This is fine. >> >> Although I would probably be open for storing that diagnostics into stringStream (see how >> ShenandoahAsserts::print_failure does it), and putting it into the fatal message itself. Pros: >> customers would hand over hs_errs to us with the relevant diagnostics. Cons: we can overflow the >> stringStream and truncate parts of the data. Your call. > > > Ok, let's do that then: > > http://cr.openjdk.java.net/~rkennke/JDK-8237780/webrev.02/ > > Good? assert_same_oops() itself is assert only (has NOT_DEBUG_RETURN in definition), does not need nested ifdef ASSERT ... -Zhengyu > > Roman > From rkennke at redhat.com Mon Feb 17 15:27:05 2020 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 17 Feb 2020 16:27:05 +0100 Subject: RFR: JDK-8237780: Shenandoah: More reliable nmethod verification In-Reply-To: <7850613d-3a9f-cece-9a1a-46b4e6823c7f@redhat.com> References: <751cd87f-38e3-cd2f-43fc-f2ef95b41a50@redhat.com> <958d35c7-1dfd-39aa-6139-80c794af5791@redhat.com> <7850613d-3a9f-cece-9a1a-46b4e6823c7f@redhat.com> Message-ID: <5f4bc613-bb6c-7262-934f-5ddac38d3b24@redhat.com> >>>> http://cr.openjdk.java.net/~rkennke/JDK-8237780/webrev.01/ >>> >>> This is fine. >>> >>> Although I would probably be open for storing that diagnostics into >>> stringStream (see how >>> ShenandoahAsserts::print_failure does it), and putting it into the >>> fatal message itself. Pros: >>> customers would hand over hs_errs to us with the relevant >>> diagnostics. 
Cons: we can overflow the >>> stringStream and truncate parts of the data. Your call. >> >> >> Ok, let's do that then: >> >> http://cr.openjdk.java.net/~rkennke/JDK-8237780/webrev.02/ >> >> Good? > > assert_same_oops() itself is assert only (has NOT_DEBUG_RETURN in > definition), does not need nested ifdef ASSERT ... Right! Very good catch! http://cr.openjdk.java.net/~rkennke/JDK-8237780/webrev.03/ Good now? Thanks for reviewing! Roman From david.holmes at oracle.com  Tue Feb 18 05:23:46 2020 From: david.holmes at oracle.com (David Holmes) Date: Tue, 18 Feb 2020 15:23:46 +1000 Subject: RFR: add parallel heap inspection support for jmap histo(G1) In-Reply-To: <11bca96c0e7745f5b2558cc49b42b996@tencent.com> References: <11bca96c0e7745f5b2558cc49b42b996@tencent.com> Message-ID: Hi Lin, Adding in hotspot-gc-dev as they need to see how this interacts with GC worker threads, and whether it needs to be extended beyond G1. I happened to spot one nit when browsing: src/hotspot/share/gc/shared/collectedHeap.hpp + virtual bool run_par_heap_inspect_task(KlassInfoTable* cit, +                                        BoolObjectClosure* filter, +                                        size_t* missed_count, +                                        size_t thread_num) { +   return NULL; s/NULL/false/ Cheers, David On 18/02/2020 2:15 pm, linzang(臧琳) wrote: > Dear All, > May I ask your help to review the following changes: > webrev: > http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_00/ > bug: https://bugs.openjdk.java.net/browse/JDK-8215624 > related CSR: https://bugs.openjdk.java.net/browse/JDK-8239290 > This patch enables parallel heap inspection of G1 for jmap histo. > My simple test showed it can speed up jmap -histo by 2x with > parallelThreadNum set to 2 for a heap at ~500M on a 4-core platform.
> > ------------------------------------------------------------------------ > BRs, > Lin From linzang at tencent.com  Tue Feb 18 06:29:38 2020 From: linzang at tencent.com (linzang(臧琳)) Date: Tue, 18 Feb 2020 06:29:38 +0000 Subject: RFR: add parallel heap inspection support for jmap histo(G1)(Internet mail) References: <11bca96c0e7745f5b2558cc49b42b996@tencent.com>, Message-ID: Dear David,

Thanks a lot!

I have updated the refined code to http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_01/.

IMHO the parallel heap inspection can be extended to all kinds of heap as long as the heap layout can support parallel iteration.

Maybe we can first use this webrev to discuss how to implement it, because I am not sure my current implementation is an appropriate way to communicate with collectedHeap, then we can extend the solution to other kinds of heap.

Thanks,
--------------
Lin

>Hi Lin,
>
>Adding in hotspot-gc-dev as they need to see how this interacts with GC
>worker threads, and whether it needs to be extended beyond G1.
>
>I happened to spot one nit when browsing:
>
>src/hotspot/share/gc/shared/collectedHeap.hpp
>
>+  virtual bool run_par_heap_inspect_task(KlassInfoTable* cit,
>+                                         BoolObjectClosure* filter,
>+                                         size_t* missed_count,
>+                                         size_t thread_num) {
>+    return NULL;
>
>s/NULL/false/
>
>Cheers,
>David
>
>On 18/02/2020 2:15 pm, linzang(臧琳) wrote:
>> Dear All,
>> May I ask your help to review the following changes:
>> webrev:
>> http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_00/
>> bug: https://bugs.openjdk.java.net/browse/JDK-8215624
>> related CSR: https://bugs.openjdk.java.net/browse/JDK-8239290
>> This patch enables parallel heap inspection of G1 for jmap histo.
>>
my simple test showed it can speed up jmap -histo by 2x with
>> parallelThreadNum set to 2 for a heap at ~500M on a 4-core platform.
>>
>> ------------------------------------------------------------------------
>> BRs,
>> Lin
> From maoliang.ml at alibaba-inc.com  Tue Feb 18 07:27:38 2020
From: maoliang.ml at alibaba-inc.com (Liang Mao)
Date: Tue, 18 Feb 2020 15:27:38 +0800
Subject: Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics
In-Reply-To: <5883F67D-CE92-4A40-977B-947B413ABF5F@oracle.com>
References: <7107c9f6-ba8e-48a0-830c-5383e2c17ef3.maoliang.ml@alibaba-inc.com>
 <9fdac7ff-bbef-1451-c951-a40dd6f216af@oracle.com>
 <51a95aba-5d31-0b99-87a7-485987f49f54@oracle.com>,
 <5883F67D-CE92-4A40-977B-947B413ABF5F@oracle.com>
Message-ID: <7f9be388-c177-4bb4-a6d7-fc4f989250c2.maoliang.ml@alibaba-inc.com>

Hi Stefan,

Thank you for your comments!

Based on previous discussion, the reasons are as below:
1) For the expansion after cm, I think we have the agreement that the
original MinHeapFreeRatio might be too large, and that predicting the
necessary size from adaptive IHOP for expansion sounds reasonable;
specjbb2015 showed good results.
2) About when to shrink the heap, I think a better spot is after mixed
collections. From my observation, heap use is still near its peak after
remark in most cases, e.g. Alibaba workloads and specjbb2015. There could
be some scenario with a lot of humongous regions where remark cleans up a
considerable number of regions, but why not shrink the heap once most of
the garbage has been cleaned after mixed GCs? We don't need to shrink
twice in an old GC cycle. A MaxHeapFreeRatio of 70, which keeps heap
capacity at 30% live objects, makes sense and is unified with the full GC
logic. If we only shrink the heap at remark, the maximum desired capacity
could be 100/30 times the peak heap usage, which is obviously not
efficient.
Thanks, Liang ------------------------------------------------------------------ From:Stefan Johansson Send Time:2020 Feb. 17 (Mon.) 18:10 To:"MAO, Liang" Cc:hotspot-gc-dev ; hotspot-gc-dev Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Hi Liang, I?ve started looking at this patch as well and I have a question regarding the change to not allow shrinking after concurrent mark? Before we could shrink the heap at Remark, but now we only check to expand the heap after the concurrent cycle, why is that? I get that we will be able to shrink even more after the mixed collections but if a lot of regions are freed by the concurrent cycle why not check if we can shrink here? Also good to hear you can run SPECjvm2008, we also avoid running any problematic benchmarks. Thanks, Stefan > 17 feb. 2020 kl. 10:56 skrev Liang Mao : > > Hi Thomas, > > I am able to run specjvm2008 by excluding the compiler subtests > and reproduce the issue that the change commits more memory. > The main cause is addressed that the tests have a lot of > humongous objects which affect the evaluation of adaptive > IHOP. _last_unrestrained_young_size and _last_allocated_bytes > used in G1AdaptiveIHOPControl::predict_unstrained_buffer_size are > very large. So the expansion after concurrent mark is rather > aggressive. I made an enhancement to restrict this uncommon > expansion with MinHeapFreeRatio: > http://cr.openjdk.java.net/~luchsh/8236073.webrev.4/ > > Thanks, > Liang > > > > > > > ------------------------------------------------------------------ > From:Thomas Schatzl > Send Time:2020 Feb. 15 (Sat.) 
03:51 > To:hotspot-gc-dev > Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics > > Hi, > > On 12.02.20 12:16, Thomas Schatzl wrote: >> Hi Liang, >> >> On 12.02.20 11:17, Liang Mao wrote: >>> Hi Thomas, >>> >>> I made a new patch for the issues we listed in JDK-8238686 and >>> JDK-8236073: >>> http://cr.openjdk.java.net/~luchsh/8236073.webrev.3/ >> >> thanks. I only had time to quickly browse the change, and started >> building and testing it internally. I will run it through our perf >> benchmarks to look for regressions of out-of-box behavior. >> >> I will need a day or two until I can get back to looking at the change >> in detail. There is currently something else I need to look at. Sorry. > > initial results from testing: > > - gc/g1/TestPeriodicCollection.java fails consistently because the heap > does not shrink as expected (but probably this is a test bug as it may > expect that uncommit occurs at remark). > > - memory usage tends to be significantly higher with the change without > improving scores. > > E.g. I have been running specjvm2008 out-of-box with no settings on > different machine(s) (32gb ram min), and the build with the changes > almost consistently uses more heap (i.e. committed size) than without, > in the range of 10% without any performance increase. > > Specjvm2008 benchmarks are pretty simple application in terms of > behavior, i.e. does the same things all the time. This also means that > very likely the current sizing is already way beyond the point of > diminishing returns (actually, this is a known issue :)); I would prefer > if we did not add to that. ;) > > Unfortunately I lost the graphs I had generated (manually), and I do not > have more time available right now so can't show you right now. > > I started some dacapo 2009 runs (running them for 30 iterations each). 
> > Did not have time to look at the changes themselves any further or > investigate the reasons for this memory usage increase than I already > did earlier; will continue on Tuesday as I'm taking the day off Monday. > > Thanks, > Thomas > From stefan.johansson at oracle.com  Tue Feb 18 09:16:56 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Tue, 18 Feb 2020 10:16:56 +0100 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: <7f9be388-c177-4bb4-a6d7-fc4f989250c2.maoliang.ml@alibaba-inc.com> References: <7107c9f6-ba8e-48a0-830c-5383e2c17ef3.maoliang.ml@alibaba-inc.com> <9fdac7ff-bbef-1451-c951-a40dd6f216af@oracle.com> <51a95aba-5d31-0b99-87a7-485987f49f54@oracle.com> <5883F67D-CE92-4A40-977B-947B413ABF5F@oracle.com> <7f9be388-c177-4bb4-a6d7-fc4f989250c2.maoliang.ml@alibaba-inc.com> Message-ID: <3D9F54B4-57AE-4328-B95D-12669EEBA6C7@oracle.com> Hi Liang, I've also recently been looking at shrinking the heap after the Mixed collections. I totally agree that we should try to uncommit at this point, since the usage should be the lowest. I'm however not convinced that we should only uncommit once. My findings the last time, and what I'm seeing with your patch, are some very long pauses when doing the uncommit. To try to avoid those I started looking at doing the uncommit concurrently, but didn't find enough time to really dig into the details around that. Another thing to investigate would be the suggestion in: https://bugs.openjdk.java.net/browse/JDK-8210709 My main point is that we need to ensure that uncommitting memory doesn't come with a too high cost. Thanks, Stefan > 18 feb. 2020 kl. 08:27 skrev Liang Mao : > > > Hi Stefan, > > Thank you for your comments!
> > Based on previous discussion, the reasons are as below: > 1) For the expansion after cm, I think we have the agreement that > original MinHeapFreeRatio might be too large and predicting the necessary > size from adaptive IHOP for expansion sounds reasonable and specjbb2015 > have the good result. > 2) About when to shrink the heap, I think it's a better spot after > mixed collections. From my observation, the heap use is still at nearly > peak after remark for most of cases like Alibaba workloads and specjbb2015. > There could be some senario which contains a lot of humongous regions that > remark will cleanup considerable regions. But why don't we decide to shrink > the heap size when most of garbages have been cleaned after mixed GCs. We > don't need to shrink twice in an old gc cycle. The MaxHeapFreeRatio 70 to > keep heap capacity with 30% live objects make sence and is unified with full > gc logic. If we only shrink the heap at remark, the maximum desired capacity > could be 100/30 times of peak heap usage which is obviously not efficient. > > Thanks, > Liang > > > > > ------------------------------------------------------------------ > From:Stefan Johansson > Send Time:2020 Feb. 17 (Mon.) 18:10 > To:"MAO, Liang" > Cc:hotspot-gc-dev ; hotspot-gc-dev > Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics > > Hi Liang, > > I?ve started looking at this patch as well and I have a question regarding the change to not allow shrinking after concurrent mark? Before we could shrink the heap at Remark, but now we only check to expand the heap after the concurrent cycle, why is that? I get that we will be able to shrink even more after the mixed collections but if a lot of regions are freed by the concurrent cycle why not check if we can shrink here? > > Also good to hear you can run SPECjvm2008, we also avoid running any problematic benchmarks. > > Thanks, > Stefan > > > 17 feb. 2020 kl. 
10:56 skrev Liang Mao : > > > > Hi Thomas, > > > > I am able to run specjvm2008 by excluding the compiler subtests > > and reproduce the issue that the change commits more memory. > > The main cause is addressed that the tests have a lot of > > humongous objects which affect the evaluation of adaptive > > IHOP. _last_unrestrained_young_size and _last_allocated_bytes > > used in G1AdaptiveIHOPControl::predict_unstrained_buffer_size are > > very large. So the expansion after concurrent mark is rather > > aggressive. I made an enhancement to restrict this uncommon > > expansion with MinHeapFreeRatio: > > http://cr.openjdk.java.net/~luchsh/8236073.webrev.4/ > > > > Thanks, > > Liang > > > > > > > > > > > > > > ------------------------------------------------------------------ > > From:Thomas Schatzl > > Send Time:2020 Feb. 15 (Sat.) 03:51 > > To:hotspot-gc-dev > > Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics > > > > Hi, > > > > On 12.02.20 12:16, Thomas Schatzl wrote: > >> Hi Liang, > >> > >> On 12.02.20 11:17, Liang Mao wrote: > >>> Hi Thomas, > >>> > >>> I made a new patch for the issues we listed in JDK-8238686 and > >>> JDK-8236073: > >>> http://cr.openjdk.java.net/~luchsh/8236073.webrev.3/ > >> > >> thanks. I only had time to quickly browse the change, and started > >> building and testing it internally. I will run it through our perf > >> benchmarks to look for regressions of out-of-box behavior. > >> > >> I will need a day or two until I can get back to looking at the change > >> in detail. There is currently something else I need to look at. Sorry. > > > > initial results from testing: > > > > - gc/g1/TestPeriodicCollection.java fails consistently because the heap > > does not shrink as expected (but probably this is a test bug as it may > > expect that uncommit occurs at remark). > > > > - memory usage tends to be significantly higher with the change without > > improving scores. > > > > E.g. 
I have been running specjvm2008 out-of-box with no settings on > > different machine(s) (32gb ram min), and the build with the changes > > almost consistently uses more heap (i.e. committed size) than without, > > in the range of 10% without any performance increase. > > > > Specjvm2008 benchmarks are pretty simple application in terms of > > behavior, i.e. does the same things all the time. This also means that > > very likely the current sizing is already way beyond the point of > > diminishing returns (actually, this is a known issue :)); I would prefer > > if we did not add to that. ;) > > > > Unfortunately I lost the graphs I had generated (manually), and I do not > > have more time available right now so can't show you right now. > > > > I started some dacapo 2009 runs (running them for 30 iterations each). > > > > Did not have time to look at the changes themselves any further or > > investigate the reasons for this memory usage increase than I already > > did earlier; will continue on Tuesday as I'm taking the day off Monday. > > > > Thanks, > > Thomas > > > From thomas.schatzl at oracle.com Tue Feb 18 10:01:35 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 18 Feb 2020 11:01:35 +0100 Subject: RFR (S): 8238999: Remove MemRegion custom new/delete operator overloads In-Reply-To: References: <9a383ba5-68f1-7ed2-5ea9-97b236d2d9a1@oracle.com> <23D0C1DC-E59E-43D9-A54A-467F49385429@oracle.com> Message-ID: Hi Jiangli, Kim, Ioi, thanks for your review. On 15.02.20 00:14, Jiangli Zhou wrote: > On Fri, Feb 14, 2020 at 3:05 PM Kim Barrett wrote: >> >>> On Feb 14, 2020, at 10:05 AM, Thomas Schatzl wrote: >>> >>> Hi all, >>> >>> can I have reviews for this small change to the MemRegion class to remove unnecessary new/delete overloads from MemRegion. >>>[...] 
>> ------------------------------------------------------------------------------ >> src/hotspot/share/memory/memRegion.hpp >> 96 // Creates and initializes an array of MemRegions of the given length. >> 97 static MemRegion* create(uint length, MEMFLAGS flags); >> >> A function named "create" suggests to me creating a single object, not >> an array. Perhaps "make_array" or "create_array" or "new_array"? > > +1. I had the same thoughts when looking at the webrev.1. > > Best regards, > Jiangli I pushed with "create_array"; for reference, the webrevs: http://cr.openjdk.java.net/~tschatzl/8238999/webrev.1_to_2/ (diff( http://cr.openjdk.java.net/~tschatzl/8238999/webrev.2/ (full) Thanks, Thomas From thomas.schatzl at oracle.com Tue Feb 18 10:19:07 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 18 Feb 2020 11:19:07 +0100 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: References: <7107c9f6-ba8e-48a0-830c-5383e2c17ef3.maoliang.ml@alibaba-inc.com> <9fdac7ff-bbef-1451-c951-a40dd6f216af@oracle.com> <51a95aba-5d31-0b99-87a7-485987f49f54@oracle.com> Message-ID: <796daaad-c07b-3ad0-7fd5-d8b5bd77c7c3@oracle.com> Hi Liang, On 17.02.20 07:03, Liang Mao wrote: > Hi Thomas, > >> - gc/g1/TestPeriodicCollection.java fails consistently because the heap >> does not shrink as expected (but probably this is a test bug as it may >> expect that uncommit occurs at remark). > > The reason should be that the patch makes shrinking after mixed GC > but the mixed gc doesn't happen. It's the only issue I listed for > the change. > I agree, but we still need to fix the test in some way ;) >> - memory usage tends to be significantly higher with the change without >> improving scores. > >> E.g. I have been running specjvm2008 out-of-box with no settings on >> different machine(s) (32gb ram min), and the build with the changes >> almost consistently uses more heap (i.e. 
committed size) than without, >> in the range of 10% without any performance increase. > >> Specjvm2008 benchmarks are pretty simple application in terms of >> behavior, i.e. does the same things all the time. This also means that >> very likely the current sizing is already way beyond the point of >> diminishing returns (actually, this is a known issue :)); I would prefer >> if we did not add to that. ;) > > I have 2 questions here. > 1) specjvm2008 cannot run with jdk9+: > https://bugs.openjdk.java.net/browse/JDK-8202460 > I face the same problem. Do you have any way to perform the test > in JDK15? > Just for reference for others, they (except compiler.compiler, but they do not work since jdk8) can be made working with the following options --add-exports=java.xml/com.sun.org.apache.xerces.internal.parsers=ALL-UNNAMED --add-exports=java.xml/com.sun.org.apache.xerces.internal.util=ALL-UNNAMED as described in the specjvm2008 faq q4.8 [0] > 2) I didn't understand : "This also means that >> very likely the current sizing is already way beyond the point of >> diminishing returns (actually, this is a known issue :));" > Could you please explain more about this? Although I think you already got what I meant: current heap sizing without the patch already by default sizes the heap too large, i.e. the same scores could be achieved using less heap. The change now increases the heap even more (obviously without increasing the performance either). This seems undesirable. I saw in the other email that you proposed a fix for that already, I will look into this. Thanks, Thomas [0] https://www.spec.org/jvm2008/docs/FAQ.html#Q4.8 From maoliang.ml at alibaba-inc.com Tue Feb 18 11:03:31 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Tue, 18 Feb 2020 19:03:31 +0800 Subject: CMS-concurrent-sweep spent extremely long time with 8u Message-ID: Hi, I saw a very unusual scenario where concurrent sweep took an extremely long time.
After a long time concurrent mode failure, the GC was recovered. In the previous CMS cycle at 05:32, sweep worked fine that we could see old gen occupancy continued to decrease until sweep completed. After the problematic sweep started, the old gen occupancy dropped very slowly in the early stage and then increased to promotion failure. The young GC slowed 30x at that time as well. 2020-02-18T05:32:12.740+0800: 14277958.735: [CMS-concurrent-sweep-start] 2020-02-18T05:32:13.178+0800: 14277959.173: [GC (Allocation Failure) 2020-02-18T05:32:13.178+0800: 14277959.173: [ParNew 247829K->126882K(3145728K), 0.0259766 secs] 57679525K->55581737K(82837504K), 0.0266680 secs] [Times: user=0.31 sys=0.00, real=0.03 secs] 2020-02-18T05:32:13.764+0800: 14277959.759: [GC (Allocation Failure) 2020-02-18T05:32:13.764+0800: 14277959.759: [ParNew 2224034K->100365K(3145728K), 0.0138495 secs] 56901617K->54782226K(82837504K), 0.0145248 secs] [Times: user=0.30 sys=0.00, real=0.01 secs] 2020-02-18T05:32:14.357+0800: 14277960.353: [GC (Allocation Failure) 2020-02-18T05:32:14.358+0800: 14277960.353: [ParNew 2197517K->107850K(3145728K), 0.0129100 secs] 56173691K->54088043K(82837504K), 0.0135664 secs] [Times: user=0.28 sys=0.00, real=0.01 secs] ... 2020-02-18T05:32:24.655+0800: 14277970.650: [GC (Allocation Failure) 2020-02-18T05:32:24.655+0800: 14277970.650: [ParNew 2218555K->137181K(3145728K), 0.0127018 secs] 45316426K->43236019K(82837504K), 0.0133486 secs] [Times: user=0.26 sys=0.00, real=0.01 secs] ... 2020-02-18T05:32:31.642+0800: 14277977.637: [GC (Allocation Failure) 2020-02-18T05:32:31.642+0800: 14277977.637: [ParNew 2257551K->159124K(3145728K), 0.0135751 secs] 38026885K->35932714K(82837504K), 0.0142222 secs] [Times: user=0.28 sys=0.00, real=0.02 secs] ... 
2020-02-18T05:32:36.630+0800: 14277982.625: [GC (Allocation Failure) 2020-02-18T05:32:36.630+0800: 14277982.625: [ParNew 2232867K->134026K(3145728K), 0.0132394 secs] 32928275K->30837706K(82837504K), 0.0138968 secs] [Times: user=0.27 sys=0.00, real=0.02 secs] 2020-02-18T05:32:36.812+0800: 14277982.807: [CMS-concurrent-sweep: 23.250/24.072 secs] [Times: user=137.69 sys=0.00, real=24.07 secs] ... 2020-02-18T06:31:11.378+0800: 14281497.373: [CMS-concurrent-sweep-start] 2020-02-18T06:31:14.862+0800: 14281500.857: [GC (Allocation Failure) 2020-02-18T06:31:14.862+0800: 14281500.857: [ParNew 2925648K->871894K(3145728K), 0.3871002 secs] 58747945K->56735151K(82837504K), 0.3877653 secs] [Times: user=0.84 sys=0.00, real=0.39 secs] 2020-02-18T06:31:18.103+0800: 14281504.098: [GC (Allocation Failure) 2020-02-18T06:31:18.103+0800: 14281504.098: [ParNew 2969046K->828180K(3145728K), 0.4009765 secs] 58768864K->56670502K(82837504K), 0.4016504 secs] [Times: user=0.86 sys=0.00, real=0.40 secs] ... 2020-02-18T06:31:53.530+0800: 14281539.525: [GC (Allocation Failure) 2020-02-18T06:31:53.530+0800: 14281539.525: [ParNew 2884806K->790526K(3145728K), 0.4082761 secs] 58450942K->56399496K(82837504K), 0.4089053 secs] [Times: user=0.86 sys=0.00, real=0.41 secs] 2020-02-18T06:31:56.913+0800: 14281542.908: [GC (Allocation Failure) 2020-02-18T06:31:56.914+0800: 14281542.909: [ParNew 2887678K->830520K(3145728K), 0.3762305 secs] 58449269K->56431129K(82837504K), 0.3768546 secs] [Times: user=0.80 sys=0.00, real=0.37 secs] ... 2020-02-18T06:39:10.412+0800: 14281976.407: [GC (Allocation Failure) 2020-02-18T06:39:10.412+0800: 14281976.407: [ParNew 2912121K->831832K(3145728K), 0.3765129 secs] 55456571K->53415654K(82837504K), 0.3771153 secs] [Times: user=0.90 sys=0.00, real=0.38 secs] ... 
2020-02-18T06:47:16.462+0800: 14282462.457: [GC (Allocation Failure) 2020-02-18T06:47:16.463+0800: 14282462.458: [ParNew 2931678K->872899K(3145728K), 0.3223769 secs] 56740808K->54718072K(82837504K), 0.3229753 secs] [Times: user=0.76 sys=0.00, real=0.32 secs] ... 2020-02-18T06:55:36.434+0800: 14282962.429: [GC (Allocation Failure) 2020-02-18T06:55:36.434+0800: 14282962.429: [ParNew 2932041K->837434K(3145728K), 0.3941144 secs] 59917770K->57864311K(82837504K), 0.3947332 secs] [Times: user=0.86 sys=0.00, real=0.39 secs] ... 2020-02-18T07:02:02.007+0800: 14283348.002: [GC (Allocation Failure) 2020-02-18T07:02:02.008+0800: 14283348.003: [ParNew 2871413K->847657K(3145728K), 0.3542050 secs] 68308980K->66321780K(82837504K), 0.3549298 secs] [Times: user=0.79 sys=0.00, real=0.36 secs] ... 2020-02-18T07:05:35.166+0800: 14283561.161: [GC (Allocation Failure) 2020-02-18T07:05:35.166+0800: 14283561.161: [ParNew 2859132K->870624K(3145728K), 0.3499013 secs] 75159088K->73210284K(82837504K), 0.3505328 secs] [Times: user=0.79 sys=0.00, real=0.35 secs] 2020-02-18T07:05:36.418+0800: 14283562.413: [GC (Allocation Failure) 2020-02-18T07:05:36.419+0800: 14283562.414: [ParNew (0: promotion failure size = 3286) (1: promotion failure size = 3288) (2: promotion failure size = 3285) (3: promotion failure size = 3286) (4: promotion failure size = 3286) (5: promotion failure size = 3288) (6: promotion failure size = 3290) (7: promotion failure size = 3286) (8: promotion failure size = 3286) (9: promotion failure size = 3287) (10: promotion failure size = 3292) (11: promotion failure size = 3286) (12: promotion failure size = 3285) (13: promotion failure size = 3288) (14: promotion failure size = 3286) (15: promotion failure size = 3286) (16: promotion failure size = 3287) (17: promotion failure size = 3287) (18: promotion failure size = 3285) (19: promotion failure size = 3286) (20: promotion failure size = 3285) (21: promotion failure size = 3289) (22: promotion failure size = 3289) (23: 
promotion failure size = 3286) (promotion failed): 2967776K->3077198K(3145728K), 0.8107304 secs]2020-02-18T07:05:37.229+0800: 14283563.224: [CMS2020-02-18T07:06:57.395+0800: 14283643.390: [CMS-concurrent-sweep: 1752.431/2146.017 secs] [Times: user=8483.92 sys=0.00, real=2146.02 secs] (concurrent mode failure) 72370103K->31883050K(79691776K), 95.8957007 secs] 75293380K->31883050K(82837504K), [Metaspace: 59347K->59341K(61440K)], 96.7139359 secs] [Times: user=97.25 sys=0.00, real=96.72 secs] 2020-02-18T07:08:04.605+0800: 14283710.600: [GC (Allocation Failure) 2020-02-18T07:08:04.605+0800: 14283710.600: [ParNew 2097152K->361723K(3145728K), 0.0195181 secs] 33980202K->32244773K(82837504K), 0.0198080 secs] [Times: user=0.36 sys=0.00, real=0.02 secs] There're no any suspicious GC options: -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSScavengeBeforeRemark -XX:GCLogFileSize=209715200 -XX:InitialHeapSize=85899345920 -XX:MaxHeapSize=85899345920 -XX:MaxNewSize=4294967296 -XX:MinHeapDeltaBytes=196608 -XX:NewSize=4294967296 -XX:NumberOfGCLogFiles=5 -XX:OldPLABSize=16 -XX:OldSize=81604378624 -XX:-OmitStackTraceInFastThrow -XX:OnOutOfMemoryError=kill -9 %p -XX:ParGCCardsPerStrideChunk=4096 -XX:ParallelGCThreads=24 -XX:-ParallelRefProcEnabled -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCRootsTraceTime -XX:+PrintGCTimeStamps -XX:+PrintPromotionFailure -XX:+PrintReferenceGC -XX:SurvivorRatio=2 -XX:+UnlockDiagnosticVMOptions -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseConcMarkSweepGC -XX:+UseGCLogFileRotation -XX:+UseParNewGC The instance has been running for months. Does anybody know if there is a specific bug? Or it's just because of the fragment issue of CMS? 
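For reference, the sweep durations quoted above can be pulled out of such logs mechanically. The sketch below is only an illustration based on the -XX:+PrintGCDetails lines shown (the function name and the assumption that every completed sweep prints a "[CMS-concurrent-sweep: cpu/wall secs]" entry are mine, not from any HotSpot tool); it extracts the cpu/wall seconds of each completed concurrent sweep phase:

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Returns (cpu_seconds, wall_seconds) for each completed sweep phase,
// matching entries like "[CMS-concurrent-sweep: 23.250/24.072 secs]".
std::vector<std::pair<double, double>>
sweep_times(const std::vector<std::string>& log_lines) {
  static const std::regex re(
      R"(\[CMS-concurrent-sweep: ([0-9.]+)/([0-9.]+) secs\])");
  std::vector<std::pair<double, double>> out;
  for (const std::string& line : log_lines) {
    std::smatch m;
    if (std::regex_search(line, m, re)) {
      out.emplace_back(std::stod(m[1].str()), std::stod(m[2].str()));
    }
  }
  return out;
}
```

Run over the rotated log files, this makes the regression above easy to spot: a ~24 s wall-clock sweep in the healthy cycle versus ~2146 s in the problematic one.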
Thanks, Liang From maoliang.ml at alibaba-inc.com Tue Feb 18 12:48:56 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Tue, 18 Feb 2020 20:48:56 +0800 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Message-ID: Hi Stefan, I don't think we need an earlier shrink if we are trying to do it just a bit later after mixed GCs. For the concurrent uncommit, I already had a patch http://cr.openjdk.java.net/~luchsh/8236073.webrev/ But need spend sometime to refine it according to Thomas' comments. Thanks, Liang ------------------------------------------------------------------ From:Stefan Johansson Send Time:2020 Feb. 18 (Tue.) 17:17 To:"MAO, Liang" Cc:hotspot-gc-dev Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Hi Liang, I've also recently been looking at shrinking the heap after the Mixed collections. I totally agree that we should try to uncommit at this point, since the usage should be the lowest. I'm however not convinced that we should only uncommit once. My findings the last time, and what I'm seeing with your patch is some very long pauses when doing the uncommit. To try to avoid those I started looking at doing the uncommit concurrently, but didn't find enough time to really dig into the details around that. An other thing to investigate would be the suggestion in: https://bugs.openjdk.java.net/browse/JDK-8210709 My main point is that we need to ensure that uncommitting memory don't come with a to high cost. Thanks, Stefan > 18 feb. 2020 kl. 08:27 skrev Liang Mao : > > > Hi Stefan, > > Thank you for your comments! > > Based on previous discussion, the reasons are as below: > 1) For the expansion after cm, I think we have the agreement that > original MinHeapFreeRatio might be too large and predicting the necessary > size from adaptive IHOP for expansion sounds reasonable and specjbb2015 > have the good result.
> 2) About when to shrink the heap, I think it's a better spot after > mixed collections. From my observation, the heap use is still at nearly > peak after remark for most of cases like Alibaba workloads and specjbb2015. > There could be some senario which contains a lot of humongous regions that > remark will cleanup considerable regions. But why don't we decide to shrink > the heap size when most of garbages have been cleaned after mixed GCs. We > don't need to shrink twice in an old gc cycle. The MaxHeapFreeRatio 70 to > keep heap capacity with 30% live objects make sence and is unified with full > gc logic. If we only shrink the heap at remark, the maximum desired capacity > could be 100/30 times of peak heap usage which is obviously not efficient. > > Thanks, > Liang > > > > > ------------------------------------------------------------------ > From:Stefan Johansson > Send Time:2020 Feb. 17 (Mon.) 18:10 > To:"MAO, Liang" > Cc:hotspot-gc-dev ; hotspot-gc-dev > Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics > > Hi Liang, > > I?ve started looking at this patch as well and I have a question regarding the change to not allow shrinking after concurrent mark? Before we could shrink the heap at Remark, but now we only check to expand the heap after the concurrent cycle, why is that? I get that we will be able to shrink even more after the mixed collections but if a lot of regions are freed by the concurrent cycle why not check if we can shrink here? > > Also good to hear you can run SPECjvm2008, we also avoid running any problematic benchmarks. > > Thanks, > Stefan > > > 17 feb. 2020 kl. 10:56 skrev Liang Mao : > > > > Hi Thomas, > > > > I am able to run specjvm2008 by excluding the compiler subtests > > and reproduce the issue that the change commits more memory. > > The main cause is addressed that the tests have a lot of > > humongous objects which affect the evaluation of adaptive > > IHOP. 
_last_unrestrained_young_size and _last_allocated_bytes > > used in G1AdaptiveIHOPControl::predict_unstrained_buffer_size are > > very large. So the expansion after concurrent mark is rather > > aggressive. I made an enhancement to restrict this uncommon > > expansion with MinHeapFreeRatio: > > http://cr.openjdk.java.net/~luchsh/8236073.webrev.4/ > > > > Thanks, > > Liang > > > > > > > > > > > > > > ------------------------------------------------------------------ > > From:Thomas Schatzl > > Send Time:2020 Feb. 15 (Sat.) 03:51 > > To:hotspot-gc-dev > > Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics > > > > Hi, > > > > On 12.02.20 12:16, Thomas Schatzl wrote: > >> Hi Liang, > >> > >> On 12.02.20 11:17, Liang Mao wrote: > >>> Hi Thomas, > >>> > >>> I made a new patch for the issues we listed in JDK-8238686 and > >>> JDK-8236073: > >>> http://cr.openjdk.java.net/~luchsh/8236073.webrev.3/ > >> > >> thanks. I only had time to quickly browse the change, and started > >> building and testing it internally. I will run it through our perf > >> benchmarks to look for regressions of out-of-box behavior. > >> > >> I will need a day or two until I can get back to looking at the change > >> in detail. There is currently something else I need to look at. Sorry. > > > > initial results from testing: > > > > - gc/g1/TestPeriodicCollection.java fails consistently because the heap > > does not shrink as expected (but probably this is a test bug as it may > > expect that uncommit occurs at remark). > > > > - memory usage tends to be significantly higher with the change without > > improving scores. > > > > E.g. I have been running specjvm2008 out-of-box with no settings on > > different machine(s) (32gb ram min), and the build with the changes > > almost consistently uses more heap (i.e. committed size) than without, > > in the range of 10% without any performance increase. 
> > > > Specjvm2008 benchmarks are pretty simple application in terms of > > behavior, i.e. does the same things all the time. This also means that > > very likely the current sizing is already way beyond the point of > > diminishing returns (actually, this is a known issue :)); I would prefer > > if we did not add to that. ;) > > > > Unfortunately I lost the graphs I had generated (manually), and I do not > > have more time available right now so can't show you right now. > > > > I started some dacapo 2009 runs (running them for 30 iterations each). > > > > Did not have time to look at the changes themselves any further or > > investigate the reasons for this memory usage increase than I already > > did earlier; will continue on Tuesday as I'm taking the day off Monday. > > > > Thanks, > > Thomas > > > From zgu at redhat.com Tue Feb 18 12:54:32 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 18 Feb 2020 07:54:32 -0500 Subject: RFR: JDK-8237780: Shenandoah: More reliable nmethod verification In-Reply-To: <5f4bc613-bb6c-7262-934f-5ddac38d3b24@redhat.com> References: <751cd87f-38e3-cd2f-43fc-f2ef95b41a50@redhat.com> <958d35c7-1dfd-39aa-6139-80c794af5791@redhat.com> <7850613d-3a9f-cece-9a1a-46b4e6823c7f@redhat.com> <5f4bc613-bb6c-7262-934f-5ddac38d3b24@redhat.com> Message-ID: <0da39bd5-8fde-b010-0c19-8973489b6abb@redhat.com> On 2/17/20 10:27 AM, Roman Kennke wrote: >>>>> http://cr.openjdk.java.net/~rkennke/JDK-8237780/webrev.01/ >>>> >>>> This is fine. >>>> >>>> Although I would probably be open for storing that diagnostics into >>>> stringStream (see how >>>> ShenandoahAsserts::print_failure does it), and putting it into the >>>> fatal message itself. Pros: >>>> customers would hand over hs_errs to us with the relevant >>>> diagnostics. Cons: we can overflow the >>>> stringStream and truncate parts of the data. Your call. >>> >>> >>> Ok, let's do that then: >>> >>> http://cr.openjdk.java.net/~rkennke/JDK-8237780/webrev.02/ >>> >>> Good? 
>> >> assert_same_oops() itself is assert only (has NOT_DEBUG_RETURN in >> definition), does not need nested ifdef ASSERT ... > > > Right! Very good catch! > > http://cr.openjdk.java.net/~rkennke/JDK-8237780/webrev.03/ > > Good now? > Yes, good to me. Thanks, -Zhengyu > Thanks for reviewing! > > Roman > From thomas.schatzl at oracle.com Tue Feb 18 14:17:21 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 18 Feb 2020 15:17:21 +0100 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: References: Message-ID: <7e7c946e-b35b-54e1-8fc2-53fcf0df5b48@oracle.com> Hi Liang, Stefan, let me summarize the current point of discussion a bit, because I believe there are some subtle misunderstandings. On 18.02.20 13:48, Liang Mao wrote: > Hi Stefan, > > I don't think we need an earlier shrink if we are trying to do it just a > bit later after mixed GCs. For the concurrent uncommit, I already had > a patch http://cr.openjdk.java.net/~luchsh/8236073.webrev/ That's fine, and Stefan agrees too. Let's keep these two separate. These changes can even be pushed in a single push if necessary, but I do not think so. Thanks a lot for your really quick responses, we really appreciate your effort. > But need spend some time to refine it according to Thomas' comments. Please give us a day to look at the current change (.4) in more detail and allow us to respond in a more coherent fashion too :) We also would like to do some short tests which take some time to suggest (hopefully) the best opportunities where/what to improve. > ------------------------------------------------------------------ > From:Stefan Johansson > Send Time:2020 Feb. 18 (Tue.) 
17:17 > To:"MAO, Liang" > Cc:hotspot-gc-dev > Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics > > Hi Liang, > > I've also recently been looking at shrinking the heap after the Mixed collections. I totally agree that we should try to uncommit at this point, since the usage should be the lowest. I'm however not convinced that we should only uncommit once. My findings the last time, and what I'm seeing with your patch is some very long pauses when doing the uncommit. To try to avoid those I started looking at doing the uncommit concurrently, but didn't find enough time to really dig into the details around that. An other thing to investigate would be the suggestion in: > https://bugs.openjdk.java.net/browse/JDK-8210709 > > My main point is that we need to ensure that uncommitting memory don't come with a to high cost. > > Thanks, > Stefan > > > > 18 feb. 2020 kl. 08:27 skrev Liang Mao : > > > > > > Hi Stefan, > > > > Thank you for your comments! > > > > Based on previous discussion, the reasons are as below: > > 1) For the expansion after cm, I think we have the agreement that > > original MinHeapFreeRatio might be too large and predicting the necessary > > size from adaptive IHOP for expansion sounds reasonable and specjbb2015 > > have the good result. (Being aware that I am ignoring above comment about premature comments two seconds later, but no new comments here, only an attempt on clarification :( ) I think Stefan wanted to ask why the heuristic _expands_ at Cleanup at all. There does not seem to be need to do that at that time given that at the end of mixed gc we resize the heap "optimally" anyway. Expansion at Cleanup (or Remark) seems to be not desired, so not doing anything might be the best option here. At worst G1 will intermittently expand automatically at one of the GCs between Cleanup pause and last mixed gc. There may be issues with that idea.
> > 2) About when to shrink the heap, I think it's a better spot after > > mixed collections. From my observation, the heap use is still at nearly > > peak after remark for most of cases like Alibaba workloads and specjbb2015. > > There could be some senario which contains a lot of humongous regions that > > remark will cleanup considerable regions. But why don't we decide to shrink > > the heap size when most of garbages have been cleaned after mixed GCs. We > > don't need to shrink twice in an old gc cycle. The MaxHeapFreeRatio 70 to > > keep heap capacity with 30% live objects make sence and is unified with full > > gc logic. If we only shrink the heap at remark, the maximum desired capacity > > could be 100/30 times of peak heap usage which is obviously not efficient. > > I think Stefan suggests to shrink both at Remark (but do not expand then), particularly if we happen to really have a lot of free data now. Then refine that result at the last mixed gc. I.e. not let the user wait that long. While that has disadvantages like you mentioned about maybe doing the uncommit twice in a single cycle, the increased responsiveness of the application due to memory demands elsewhere might be more important. How much to shrink? My opinion would be to only shrink if there is a huge amount of free memory at this point, i.e. keep that rule simple and do the more exact heuristics later (like only considering MaxHeapFreeRatio) My opinion is to free unused memory asap, so I have a slight preference towards also uncommitting during Remark. However that can be added later too.
Thanks, Thomas From thomas.schatzl at oracle.com Tue Feb 18 16:03:41 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 18 Feb 2020 17:03:41 +0100 Subject: RFR (S): 8239070: Memory leak when unsuccessfully mapping in archive regions In-Reply-To: References: <8f935fd7-3cb3-d5ae-5352-804e51e410b5@oracle.com> <605ad3e6-2d09-70ac-1aad-705b2b26ee6a@oracle.com> Message-ID: <6bf87a68-bc48-42f5-5325-e11d6e1a9195@oracle.com> Hi, On 15.02.20 09:20, Kim Barrett wrote: >> On Feb 14, 2020, at 11:08 AM, Ioi Lam wrote: >> >> Hi Thomas, >> >> Thanks for fixing this issue. Freeing the array at each exit point seems error prone. How about: refactoring the function to a FileMapInfo::map_heap_data_impl function, allocate inside FileMapInfo::map_heap_data(), call map_heap_data() and if it returns false, free the array in a single place. > > Rather than splitting up the function, one could add a local cleanup handler: > > ... create and initialize regions object ... > struct Cleanup { > MemRegion* _regions; > bool _aborted; > Cleanup(MemRegion* regions) : _regions(regions), _aborted(true) {} > ~Cleanup() { if (_aborted) FREE_C_HEAP_ARRAY(MemRegion, _regions); } > } cleanup(regions); > ... > cleanup._aborted = false; > return true; > } > I implemented that as it is least intrusive in the end. http://cr.openjdk.java.net/~tschatzl/8239070/webrev.1/ (full, no point in providing diff) Thanks, Thomas From zgu at redhat.com Tue Feb 18 16:52:48 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 18 Feb 2020 11:52:48 -0500 Subject: [15] RFR 8239354: Shenandoah: minor enhancements to traversal GC Message-ID: 1) Added assertion to catch evacuation after completion of heap traversal. This should help catch the bug demonstrated in sh-jdk11 w/o JDK-8237396. 2) Retire TLAB/GCLAB after completion of heap traversal. Current code retires TLAB/GCLAB at the beginning final traversal, but STW traversal still uses GCLAB to evacuate remaining objects. 
3) Added comments regarding why need to retire TLAB/GCLAB, even we don't need heap to be parsable. Bug: https://bugs.openjdk.java.net/browse/JDK-8239354 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8239354/webrev.00/index.html Test: hotspot_gc_shenandoah Thanks, -Zhengyu From ioi.lam at oracle.com Tue Feb 18 18:00:28 2020 From: ioi.lam at oracle.com (Ioi Lam) Date: Tue, 18 Feb 2020 10:00:28 -0800 Subject: RFR (S): 8239070: Memory leak when unsuccessfully mapping in archive regions In-Reply-To: <6bf87a68-bc48-42f5-5325-e11d6e1a9195@oracle.com> References: <8f935fd7-3cb3-d5ae-5352-804e51e410b5@oracle.com> <605ad3e6-2d09-70ac-1aad-705b2b26ee6a@oracle.com> <6bf87a68-bc48-42f5-5325-e11d6e1a9195@oracle.com> Message-ID: <6ea4d12b-9f45-6d43-3635-86b6181a70be@oracle.com> The changes look OK to me. I think this line doesn't need to be changed. 1820?? *heap_mem = cleanup._regions; Thanks - Ioi On 2/18/20 8:03 AM, Thomas Schatzl wrote: > Hi, > > On 15.02.20 09:20, Kim Barrett wrote: >>> On Feb 14, 2020, at 11:08 AM, Ioi Lam wrote: >>> >>> Hi Thomas, >>> >>> Thanks for fixing this issue. Freeing the array at each exit point >>> seems error prone. How about: refactoring the function to a >>> FileMapInfo::map_heap_data_impl function, allocate inside >>> FileMapInfo::map_heap_data(), call map_heap_data() and if it returns >>> false, free the array in a single place. >> >> Rather than splitting up the function, one could add a local cleanup >> handler: >> >> ?? ... create and initialize regions object ... >> ?? struct Cleanup { >> ???? MemRegion* _regions; >> ???? bool _aborted; >> ???? Cleanup(MemRegion* regions) : _regions(regions), _aborted(true) {} >> ???? ~Cleanup() { if (_aborted) FREE_C_HEAP_ARRAY(MemRegion, >> _regions); } >> ?? } cleanup(regions); >> ?? ... >> ?? cleanup._aborted = false; >> ?? return true; >> } >> > > ? I implemented that as it is least intrusive in the end. 
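For reference, the local cleanup handler suggested by Kim and quoted above can be fleshed out into a self-contained sketch. MemRegion, the allocation calls, and map_heap_data below are simplified stand-ins for the HotSpot types (plain calloc/free instead of NEW/FREE_C_HEAP_ARRAY), so this only illustrates the RAII idiom, not the actual webrev code:

```cpp
#include <cstddef>
#include <cstdlib>

// Stand-in for the real MemRegion, just enough for the guard pattern.
struct MemRegion { void* _start; size_t _word_size; };

// Frees the regions array in its destructor unless the success path
// cleared _aborted first, so every early "return false" is covered.
struct Cleanup {
  MemRegion* _regions;
  bool _aborted;
  explicit Cleanup(MemRegion* regions) : _regions(regions), _aborted(true) {}
  ~Cleanup() { if (_aborted) free(_regions); }  // stand-in for FREE_C_HEAP_ARRAY
};

bool map_heap_data(size_t num_regions, bool simulate_map_failure,
                   MemRegion** heap_mem) {
  MemRegion* regions =
      static_cast<MemRegion*>(calloc(num_regions, sizeof(MemRegion)));
  Cleanup cleanup(regions);
  if (simulate_map_failure) {
    return false;            // destructor frees the array on this early exit
  }
  cleanup._aborted = false;  // success: ownership passes to the caller
  *heap_mem = regions;
  return true;
}
```

The appeal, as noted in the thread, is that the guard is the least intrusive option: none of the existing early exits need to be touched.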
> > http://cr.openjdk.java.net/~tschatzl/8239070/webrev.1/ (full, no point > in providing diff) > > Thanks, > ? Thomas From kim.barrett at oracle.com Tue Feb 18 20:30:28 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 18 Feb 2020 15:30:28 -0500 Subject: RFR (S): 8239070: Memory leak when unsuccessfully mapping in archive regions In-Reply-To: <6ea4d12b-9f45-6d43-3635-86b6181a70be@oracle.com> References: <8f935fd7-3cb3-d5ae-5352-804e51e410b5@oracle.com> <605ad3e6-2d09-70ac-1aad-705b2b26ee6a@oracle.com> <6bf87a68-bc48-42f5-5325-e11d6e1a9195@oracle.com> <6ea4d12b-9f45-6d43-3635-86b6181a70be@oracle.com> Message-ID: > On Feb 18, 2020, at 1:00 PM, Ioi Lam wrote: > > The changes look OK to me. > > I think this line doesn't need to be changed. > > 1820 *heap_mem = cleanup._regions; +1 I don?t need a new webrev for revert of line 1820. > > Thanks > - Ioi > > On 2/18/20 8:03 AM, Thomas Schatzl wrote: >> Hi, >> >> On 15.02.20 09:20, Kim Barrett wrote: >>>> On Feb 14, 2020, at 11:08 AM, Ioi Lam wrote: >>>> >>>> Hi Thomas, >>>> >>>> Thanks for fixing this issue. Freeing the array at each exit point seems error prone. How about: refactoring the function to a FileMapInfo::map_heap_data_impl function, allocate inside FileMapInfo::map_heap_data(), call map_heap_data() and if it returns false, free the array in a single place. >>> >>> Rather than splitting up the function, one could add a local cleanup handler: >>> >>> ... create and initialize regions object ... >>> struct Cleanup { >>> MemRegion* _regions; >>> bool _aborted; >>> Cleanup(MemRegion* regions) : _regions(regions), _aborted(true) {} >>> ~Cleanup() { if (_aborted) FREE_C_HEAP_ARRAY(MemRegion, _regions); } >>> } cleanup(regions); >>> ... >>> cleanup._aborted = false; >>> return true; >>> } >>> >> >> I implemented that as it is least intrusive in the end. 
>> >> http://cr.openjdk.java.net/~tschatzl/8239070/webrev.1/ (full, no point in providing diff) >> >> Thanks, >> Thomas From thomas.schatzl at oracle.com Tue Feb 18 20:52:08 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 18 Feb 2020 21:52:08 +0100 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: References: Message-ID: <8ea20904-ca6d-f0cb-9cf7-a02dc717ebc7@oracle.com> Hi Liang, dug through the changes a bit, took longer and only managed to do cursory testing as there were a few issues. That (very) cursory testing showed that memory consumption on one specjvm2008 out-of-box application is as baselined, but currently running the full set. The change I used is available at http://cr.openjdk.java.net/~tschatzl/8236073/webrev.2/, I will step through what changed below. - not really a bug and pre-existing, but I changed the various resize_heap_* to always include the exact GC pause because particularly for the "after_concurrent_mark" suffix it is not clear what this means. I.e. in the Remark or Cleanup pauses, or at the real end of concurrent cycle (still concurrent)? This has not been done consistently yet. - I think there has been a copy&paste error in G1CollectedHeap::resize_heap_if_necessary, the two calculations to determine the min and max desired capacity were equal. I.e. 1178 size_t minimum_desired_capacity = _heap_sizing_policy->target_heap_capacity(used_after_gc, MinHeapFreeRatio); 1179 size_t maximum_desired_capacity = _heap_sizing_policy->target_heap_capacity(used_after_gc, MinHeapFreeRatio); Note the duplicate use of MinHeapFreeRatio. Fixed in above webrev. - CollectorState contains flags that basically indicate the type of GC, which should be set at the start of gc and updated at the end of gc. The new finish_of_mixed_gc does not fit here as it is basically a flag indicating that we need to do the resizing.
The previous implementation also lets the first young-only gc after the last mixed gc do the resizing which is probably not as intended. By adding an additional policy()->next_gc_should_be_mixed() call instead of the state check (and removing this pause state/type completely) fixes this (I think ;)). - the suggested change removes the expansion during Cleanup for the reasons stated earlier. This removes the need for some code in the G1HeapSizingPolicy where originally _minimum_desired_bytes_after_last_cm had been stored. It's better to move this to G1Policy (and pre-existing, G1Policy should be the owner of G1HeapSizingPolicy which I did not fix in this change) - (the suggested change does not add the shrinking at remark discussed earlier; I still think it would be nice and maybe fix that failing regression test) - there should be more gc+heap+ergo logging of calculated targets/desired sizes in the new methods in G1HeapSizingPolicy, otherwise the decisions are very hard to follow after the fact. - I believe there is an underestimation of the desired bytes after concurrent mark with adaptive IHOP enabled in the current code. If you look at the method G1Policy::desired_bytes_after_concurrent_mark(), the two terms returned by that method do not seem equal. I.e. G1AdaptiveIHOP::predict_unrestrained_buffer_size() does not contain the used bytes, the reserve and other parts used for the static IHOP (i.e. minimum_desired_buffer_size == 0). At most, G1AdaptiveIHOP::predict_unrestrained_buffer_size() covers the young gen part of the latter. Some better name for this should be found too =) As mentioned, currently running more tests until tomorrow (even with above known issues) to get some experience/data to look at with the sizing at mixed gc heuristic. 
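To illustrate the copy&paste error called out above: with the duplicate MinHeapFreeRatio fixed, the lower bound uses MinHeapFreeRatio and the upper bound MaxHeapFreeRatio. The sketch below uses a simplified stand-in for target_heap_capacity() (the formula and names here are my assumptions for illustration, not the actual webrev code):

```cpp
#include <cstddef>

// Stand-in for G1HeapSizingPolicy::target_heap_capacity(): the capacity
// at which free_percent of the heap is free given the current live set.
static size_t target_capacity(size_t used_bytes, unsigned free_percent) {
  return used_bytes * 100 / (100 - free_percent);
}

struct DesiredCapacity { size_t min_bytes; size_t max_bytes; };

// With the bug fixed, the two bounds use the two different ratios; the
// broken version passed min_heap_free_ratio for both.
DesiredCapacity desired_capacity(size_t used_after_gc,
                                 unsigned min_heap_free_ratio,
                                 unsigned max_heap_free_ratio) {
  return DesiredCapacity{
    target_capacity(used_after_gc, min_heap_free_ratio),
    target_capacity(used_after_gc, max_heap_free_ratio)
  };
}
```

This matches the rule of thumb mentioned earlier in the thread: MaxHeapFreeRatio=70 keeps a capacity in which 30% are live objects, i.e. 30 units of live data translate into an upper bound of 100 units of capacity.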
Thanks, Thomas From thomas.schatzl at oracle.com Tue Feb 18 20:54:12 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 18 Feb 2020 21:54:12 +0100 Subject: RFR (S): 8239070: Memory leak when unsuccessfully mapping in archive regions In-Reply-To: <6ea4d12b-9f45-6d43-3635-86b6181a70be@oracle.com> References: <8f935fd7-3cb3-d5ae-5352-804e51e410b5@oracle.com> <605ad3e6-2d09-70ac-1aad-705b2b26ee6a@oracle.com> <6bf87a68-bc48-42f5-5325-e11d6e1a9195@oracle.com> <6ea4d12b-9f45-6d43-3635-86b6181a70be@oracle.com> Message-ID: <62acfed8-5975-078f-7f26-f669ceabec50@oracle.com> Hi Ioi, On 18.02.20 19:00, Ioi Lam wrote: > The changes look OK to me. > > I think this line doesn't need to be changed. > > 1820?? *heap_mem = cleanup._regions; fixed and regenerated latest webrev (.1) http://cr.openjdk.java.net/~tschatzl/8239070/webrev.1/ Thanks, Thomas From serguei.spitsyn at oracle.com Tue Feb 18 20:59:10 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 18 Feb 2020 12:59:10 -0800 Subject: RFR: add parallel heap inspection support for jmap histo(G1)(Internet mail) In-Reply-To: References: <11bca96c0e7745f5b2558cc49b42b996@tencent.com> Message-ID: Hi Lin, Could you, please, re-post your RFR with the right enhancement number in the message subject? It will be more trackable this way. Thanks, Serguei On 2/17/20 10:29 PM, linzang(??) wrote: > Dear David, > ? ? ? Thanks a lot! > ? ? ? I have updated the refined code to?http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_01/. > ? ? ? IMHO the parallel heap inspection can be extended to all kinds of heap as long as the heap layout can support parallel iteration. > ? ? ? Maybe we can firstly use this webrev to discuss how to implement it, because I am not sure my current implementation is an appropriate way to communicate with collectedHeap,?then we can extend the solution to other kinds of heap. 
> > Thanks, > -------------- > Lin >> Hi Lin, >> >> Adding in hotspot-gc-dev as they need to see how this interacts with GC >> worker threads, and whether it needs to be extended beyond G1. >> >> I happened to spot one nit when browsing: >> >> src/hotspot/share/gc/shared/collectedHeap.hpp >> >> +?? virtual bool run_par_heap_inspect_task(KlassInfoTable* cit, >> +????????????????????????????????????????? BoolObjectClosure* filter, >> +????????????????????????????????????????? size_t* missed_count, >> +????????????????????????????????????????? size_t thread_num) { >> +???? return NULL; >> >> s/NULL/false/ >> >> Cheers, >> David >> >> On 18/02/2020 2:15 pm, linzang(??) wrote: >>> Dear All, >>> ? ? ? ?May I ask your help to review the follow changes: >>> ? ? ? ?webrev: >>> http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_00/ >>> ? ? ?bug: https://bugs.openjdk.java.net/browse/JDK-8215624 >>> ? ? ?related CSR: https://bugs.openjdk.java.net/browse/JDK-8239290 >>> ? ? ? ?This patch enable parallel heap inspection of G1 for jmap histo. >>> ? ? ? ?my simple test shown it can speed up 2x of jmap -histo with >>> parallelThreadNum set to 2 for heap at ~500M on 4-core platform. >>> >>> ------------------------------------------------------------------------ >>> BRs, >>> Lin > > From linzang at tencent.com Wed Feb 19 01:34:40 2020 From: linzang at tencent.com (=?utf-8?B?bGluemFuZyjoh6fnkLMp?=) Date: Wed, 19 Feb 2020 01:34:40 +0000 Subject: RFR: JDK-8215264 add parallel heap inspection support for jmap histo(G1)(Internet mail) References: <11bca96c0e7745f5b2558cc49b42b996@tencent.com>, , , Message-ID: <7e215dc97a584554b3e854d8801dc256@tencent.com> Re-post this RFR with enhancement number to make it trackable. webrev: http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_01/ bug: https://bugs.openjdk.java.net/browse/JDK-8215624 CSR: https://bugs.openjdk.java.net/browse/JDK-8239290 ? Thanks! 
-------------- Lin >Hi Lin, > >Could you, please, re-post your RFR with the right enhancement number in >the message subject? >It will be more trackable this way. > >Thanks, >Serguei > > >On 2/17/20 10:29 PM, linzang(??) wrote: >> Dear David, >>? ? ? ? Thanks a lot! >> ? ? ? I have updated the refined code to?http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_01/. >>? ? ? ? IMHO the parallel heap inspection can be extended to all kinds of heap as long as the heap layout can support parallel iteration. >>? ? ? ? Maybe we can firstly use this webrev to discuss how to implement it, because I am not sure my current implementation is an appropriate way to communicate with collectedHeap,?then we can extend the solution to other kinds of heap. >>???? >> Thanks, >> -------------- >> Lin >>> Hi Lin, >>> >>> Adding in hotspot-gc-dev as they need to see how this interacts with GC >>> worker threads, and whether it needs to be extended beyond G1. >>> >>> I happened to spot one nit when browsing: >>> >>> src/hotspot/share/gc/shared/collectedHeap.hpp >>> >>> +?? virtual bool run_par_heap_inspect_task(KlassInfoTable* cit, >>> +????????????????????????????????????????? BoolObjectClosure* filter, >>> +????????????????????????????????????????? size_t* missed_count, >>> +????????????????????????????????????????? size_t thread_num) { >>> +???? return NULL; >>> >>> s/NULL/false/ >>> >>> Cheers, >>> David >>> >>> On 18/02/2020 2:15 pm, linzang(??) wrote: >>>> Dear All, >>>>? ? ? ? ?May I ask your help to review the follow changes: >>>>? ? ? ? ?webrev: >>>> http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_00/ >>>> ? ? ?bug: https://bugs.openjdk.java.net/browse/JDK-8215624 >>>> ? ? ?related CSR: https://bugs.openjdk.java.net/browse/JDK-8239290 >>>>? ? ? ? ?This patch enable parallel heap inspection of G1 for jmap histo. >>>>? ? ? ? ?my simple test shown it can speed up 2x of jmap -histo with >>>> parallelThreadNum set to 2 for heap at ~500M on 4-core platform. 
>>>> >>>> ------------------------------------------------------------------------ >>>> BRs, >>>> Lin >> > > From linzang at tencent.com Wed Feb 19 01:38:31 2020 From: linzang at tencent.com (=?utf-8?B?bGluemFuZyjoh6fnkLMp?=) Date: Wed, 19 Feb 2020 01:38:31 +0000 Subject: RFR: JDK-8215264 add parallel heap inspection support for jmap histo(G1)(Internet mail) References: <11bca96c0e7745f5b2558cc49b42b996@tencent.com>, , , , <7e215dc97a584554b3e854d8801dc256@tencent.com> Message-ID: So sorry the number in this title is wrong. please ignore it ! so sorry about making this mistake.? will re post with correct number.? -------------- Lin >Re-post this RFR with enhancement number to make it trackable. >webrev: http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_01/ >bug: https://bugs.openjdk.java.net/browse/JDK-8215624 >CSR: https://bugs.openjdk.java.net/browse/JDK-8239290 >? >Thanks! >-------------- >Lin >>Hi Lin, >> >>Could you, please, re-post your RFR with the right enhancement number in >>the message subject? >>It will be more trackable this way. >> >>Thanks, >>Serguei >> >> >>On 2/17/20 10:29 PM, linzang(??) wrote: >>> Dear David, >>>? ? ? ? Thanks a lot! >>> ? ? ? I have updated the refined code to?http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_01/. >>>? ? ? ? IMHO the parallel heap inspection can be extended to all kinds of heap as long as the heap layout can support parallel iteration. >>>? ? ? ? Maybe we can firstly use this webrev to discuss how to implement it, because I am not sure my current implementation is an appropriate way to communicate with collectedHeap,?then we can extend the solution to other kinds of heap. >>>???? >>> Thanks, >>> -------------- >>> Lin >>>> Hi Lin, >>>> >>>> Adding in hotspot-gc-dev as they need to see how this interacts with GC >>>> worker threads, and whether it needs to be extended beyond G1. 
>>>> >>>> I happened to spot one nit when browsing: >>>> >>>> src/hotspot/share/gc/shared/collectedHeap.hpp >>>> >>>> +?? virtual bool run_par_heap_inspect_task(KlassInfoTable* cit, >>>> +????????????????????????????????????????? BoolObjectClosure* filter, >>>> +????????????????????????????????????????? size_t* missed_count, >>>> +????????????????????????????????????????? size_t thread_num) { >>>> +???? return NULL; >>>> >>>> s/NULL/false/ >>>> >>>> Cheers, >>>> David >>>> >>>> On 18/02/2020 2:15 pm, linzang(??) wrote: >>>>> Dear All, >>>>>? ? ? ? ?May I ask your help to review the follow changes: >>>>>? ? ? ? ?webrev: >>>>> http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_00/ >>>>> ? ? ?bug: https://bugs.openjdk.java.net/browse/JDK-8215624 >>>>> ? ? ?related CSR: https://bugs.openjdk.java.net/browse/JDK-8239290 >>>>>? ? ? ? ?This patch enable parallel heap inspection of G1 for jmap histo. >>>>>? ? ? ? ?my simple test shown it can speed up 2x of jmap -histo with >>>>> parallelThreadNum set to 2 for heap at ~500M on 4-core platform. >>>>> >>>>> ------------------------------------------------------------------------ >>>>> BRs, >>>>> Lin >>> > >> From linzang at tencent.com Wed Feb 19 01:40:34 2020 From: linzang at tencent.com (=?utf-8?B?bGluemFuZyjoh6fnkLMp?=) Date: Wed, 19 Feb 2020 01:40:34 +0000 Subject: RFR: JDK-8215624 add parallel heap inspection support for jmap histo(G1)(Internet mail) References: <11bca96c0e7745f5b2558cc49b42b996@tencent.com>, , , Message-ID: Re-post this RFR with correct enhancement number to make it trackable. please ignore the previous wrong post. sorry for troubles.? webrev: http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_01/ bug: https://bugs.openjdk.java.net/browse/JDK-8215624 CSR: https://bugs.openjdk.java.net/browse/JDK-8239290 -------------- Lin >Hi Lin, > >Could you, please, re-post your RFR with the right enhancement number in >the message subject? >It will be more trackable this way. 
> >Thanks, >Serguei > > >On 2/17/20 10:29 PM, linzang(??) wrote: >> Dear David, >>? ? ? ? Thanks a lot! >> ? ? ? I have updated the refined code to?http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_01/. >>? ? ? ? IMHO the parallel heap inspection can be extended to all kinds of heap as long as the heap layout can support parallel iteration. >>? ? ? ? Maybe we can firstly use this webrev to discuss how to implement it, because I am not sure my current implementation is an appropriate way to communicate with collectedHeap,?then we can extend the solution to other kinds of heap. >>???? >> Thanks, >> -------------- >> Lin >>> Hi Lin, >>> >>> Adding in hotspot-gc-dev as they need to see how this interacts with GC >>> worker threads, and whether it needs to be extended beyond G1. >>> >>> I happened to spot one nit when browsing: >>> >>> src/hotspot/share/gc/shared/collectedHeap.hpp >>> >>> +?? virtual bool run_par_heap_inspect_task(KlassInfoTable* cit, >>> +????????????????????????????????????????? BoolObjectClosure* filter, >>> +????????????????????????????????????????? size_t* missed_count, >>> +????????????????????????????????????????? size_t thread_num) { >>> +???? return NULL; >>> >>> s/NULL/false/ >>> >>> Cheers, >>> David >>> >>> On 18/02/2020 2:15 pm, linzang(??) wrote: >>>> Dear All, >>>>? ? ? ? ?May I ask your help to review the follow changes: >>>>? ? ? ? ?webrev: >>>> http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_00/ >>>> ? ? ?bug: https://bugs.openjdk.java.net/browse/JDK-8215624 >>>> ? ? ?related CSR: https://bugs.openjdk.java.net/browse/JDK-8239290 >>>>? ? ? ? ?This patch enable parallel heap inspection of G1 for jmap histo. >>>>? ? ? ? ?my simple test shown it can speed up 2x of jmap -histo with >>>> parallelThreadNum set to 2 for heap at ~500M on 4-core platform. 
>>>> >>>> ------------------------------------------------------------------------ >>>> BRs, >>>> Lin >> > > From per.liden at oracle.com Wed Feb 19 08:07:43 2020 From: per.liden at oracle.com (Per Liden) Date: Wed, 19 Feb 2020 09:07:43 +0100 Subject: RFR: 8239129: Use DAX in ZGC In-Reply-To: References: <64207ef5-fabb-748a-15c9-e96e4bc612d8@oss.nttdata.com> <07354697-3758-02b9-0cc2-5fe887449e2a@oracle.com> <2a781b6a-0277-3bd1-3d0a-f3b2ac8a93c6@oracle.com> <0ae8d397-99c4-a2b6-93bb-5ab59861e25f@oss.nttdata.com> <64f25d5e-e352-2210-718f-667d2c547de7@oss.nttdata.com> <5af0f20e-3909-c656-e1c0-276d0e3c72c3@oracle.com> Message-ID: <15478dad-ccba-2bd8-006a-2c2cc5f2c5b9@oracle.com> On 2/17/20 1:28 PM, Yasumasa Suenaga wrote: [...] >>>>> ?? webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8239129/webrev.00/ >>>> >>>> Before this patch can go forward, you need to get to the bottom of >>>> how to get that ioctl command to work. If it's not possible, you >>>> need to explain why and propose alternatives that we can discuss. >>> >>> I guess it is caused by Linux kernel. >>> In case of ext4, `ext4_iflags_to_xflags()` would set filesystem flags >>> to `struct FS_IOC_FSGETXATTR`. >>> However `FS_XFLAG_DAX` is not handled in it. >> >> Did a bit of googleing and it seems the DAX flag is in a bit of flux >> at the moment. I guess this will be fixed down the road, when DAX in >> the kernel becomes a non-experimental feature. >> >> How about we just do like this for now: >> >> http://cr.openjdk.java.net/~pliden/8239129/webrev.0 > > I thought ZGC requires tmpfs or hugetlbfs due to performance reason. > So I introduced new -XX option to make users aware of it. The filesystem type check is there to help users avoid the mistake of placing the heap on an unintended/slow filesystem. However, most users will never use -XX:AllocateHeapAt, so I think that risk is fairly small to begin with. The bar for adding new options to ZGC is high, and I don't think it's high enough in this case. 
Also, other GCs happily allow you to place the heap on any filesystem and I don't mind having that flexibility in ZGC too. > > If not so, I agree with your change. > Ok, thanks. I updated the patch, added and adjusted some logging, and added a test. I also updated the bug title/description. http://cr.openjdk.java.net/~pliden/8239129/webrev.1 cheers, Per From maoliang.ml at alibaba-inc.com Wed Feb 19 08:09:46 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Wed, 19 Feb 2020 16:09:46 +0800 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Message-ID: <7b116be1-fba3-42f6-a9b1-0500dbabda1d.maoliang.ml@alibaba-inc.com> Hi Thomas and Stefan, Regarding the failed test case of JEP 346 and the potential idle scenario we discussed, I don't oppose to reserve the shrink in remark because introducing another periodic GC to make sure the mixed GCs may not be a good idea as well. Thank Thomas for fixing my mistakes. By looking into your patch, I didn't see the expansion after concurrent mark based on policy()->desired_bytes_after_concurrent_mark(). Is it missed? Thanks, Liang ------------------------------------------------------------------ From:Thomas Schatzl Send Time:2020 Feb. 19 (Wed.) 04:52 To:"MAO, Liang" ; Stefan Johansson ; hotspot-gc-dev Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Hi Liang, dug through the changes a bit, took longer and only managed to do cursory testing as there were a few issues. That (very) cursory testing showed that memory consumption on one specjvm2008 out-of-box application is as baselined, but currently running the full set. The change I used is available at http://cr.openjdk.java.net/~tschatzl/8236073/webrev.2/, I will step through what changed below.
- not really a bug and pre-existing, but I changed the various resize_heap_* to always include the exact GC pause because particularly for the "after_concurrent_mark" suffix it is not clear what this means. I.e. in the Remark or Cleanup pauses, or at the real end of concurrent cycle (still concurrent)? This has not been done consistently yet. - I think there has been a copy&paste error in G1CollectedHeap::resize_heap_if_necessary, the two calculations to determine the min and max desired capacity were equal. I.e. 1178 size_t minimum_desired_capacity = _heap_sizing_policy->target_heap_capacity(used_after_gc, MinHeapFreeRatio); 1179 size_t maximum_desired_capacity = _heap_sizing_policy->target_heap_capacity(used_after_gc, MinHeapFreeRatio); Note the duplicate use of MinHeapFreeRatio. Fixed in above webrev. - CollectorState contains flags that basically indicate they type of GC, which should be set at the start of gc and updated at the end of gc. The new finish_of_mixed_gc does not fit here as it is basically a flag indicating that we need to do the resizing. The previous implementation also lets the first young-only gc after the last mixed gc do the resizing which is probably not as intended. By adding an additional policy()->next_gc_should_be_mixed() call instead of the state check (and removing this pause state/type completely) fixes this (I think ;)). - the suggested change removes the expansion during Cleanup for the reasons stated earlier. This removes the need for some code in the G1HeapSizingPolicy where originally _minimum_desired_bytes_after_last_cm had been stored. 
It's better to move this to G1Policy (and pre-existing, G1Policy should be the owner of G1HeapSizingPolicy which I did not fix in this change) - (the suggested change does not add the shrinking at remark discussed earlier; I still think it would be nice and maybe fix that failing regression test) - there should be more gc+heap+ergo logging of calculated targets/desired sizes in the new methods in G1HeapSizingPolicy, otherwise the decisions are very hard to follow after the fact. - I believe there is an underestimation of the desired bytes after concurrent mark with adaptive IHOP enabled in the current code. If you look at the method G1Policy::desired_bytes_after_concurrent_mark(), the two terms returned by that method do not seem equal. I.e. G1AdaptiveIHOP::predict_unrestrained_buffer_size() does not contain the used bytes, the reserve and other parts used for the static IHOP (i.e. minimum_desired_buffer_size == 0). At most, G1AdaptiveIHOP::predict_unrestrained_buffer_size() covers the young gen part of the latter. Some better name for this should be found too =) As mentioned, currently running more tests until tomorrow (even with above known issues) to get some experience/data to look at with the sizing at mixed gc heuristic. Thanks, Thomas From ivan.walulya at oracle.com Wed Feb 19 08:35:51 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Wed, 19 Feb 2020 09:35:51 +0100 Subject: FRF: 8216975 Using ForceNUMA does not disable adaptive sizing with parallel gc Message-ID: <8D0A181B-4A38-4E4F-B4EA-E21A28E2F27C@oracle.com> Hi all, Please review a minor modification to disable adaptive sizing when ForceNUMA is used with ParallelGC and UseLargePages on Linux OS.
Bug: https://bugs.openjdk.java.net/browse/JDK-8216975 Webrev: http://cr.openjdk.java.net/~iwalulya/8216975/00/ Testing: Tier 1 - 3 //Ivan From suenaga at oss.nttdata.com Wed Feb 19 08:43:53 2020 From: suenaga at oss.nttdata.com (Yasumasa Suenaga) Date: Wed, 19 Feb 2020 17:43:53 +0900 Subject: RFR: 8239129: Use DAX in ZGC In-Reply-To: <15478dad-ccba-2bd8-006a-2c2cc5f2c5b9@oracle.com> References: <64207ef5-fabb-748a-15c9-e96e4bc612d8@oss.nttdata.com> <07354697-3758-02b9-0cc2-5fe887449e2a@oracle.com> <2a781b6a-0277-3bd1-3d0a-f3b2ac8a93c6@oracle.com> <0ae8d397-99c4-a2b6-93bb-5ab59861e25f@oss.nttdata.com> <64f25d5e-e352-2210-718f-667d2c547de7@oss.nttdata.com> <5af0f20e-3909-c656-e1c0-276d0e3c72c3@oracle.com> <15478dad-ccba-2bd8-006a-2c2cc5f2c5b9@oracle.com> Message-ID: <0ff872cd-c3ff-1f9e-c119-b555e948d380@oss.nttdata.com> Hi Per, Thanks for updating JBS and for creating patch! Your change looks good to me. Please list me as Reviewer. Thanks, Yasumasa On 2020/02/19 17:07, Per Liden wrote: > On 2/17/20 1:28 PM, Yasumasa Suenaga wrote: > [...] >>>>>> ?? webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8239129/webrev.00/ >>>>> >>>>> Before this patch can go forward, you need to get to the bottom of how to get that ioctl command to work. If it's not possible, you need to explain why and propose alternatives that we can discuss. >>>> >>>> I guess it is caused by Linux kernel. >>>> In case of ext4, `ext4_iflags_to_xflags()` would set filesystem flags to `struct FS_IOC_FSGETXATTR`. >>>> However `FS_XFLAG_DAX` is not handled in it. >>> >>> Did a bit of googleing and it seems the DAX flag is in a bit of flux at the moment. I guess this will be fixed down the road, when DAX in the kernel becomes a non-experimental feature. >>> >>> How about we just do like this for now: >>> >>> http://cr.openjdk.java.net/~pliden/8239129/webrev.0 >> >> I thought ZGC requires tmpfs or hugetlbfs due to performance reason. >> So I introduced new -XX option to make users aware of it. 
> > The filesystem type check is there to help users avoid the mistake of placing the heap on an unintended/slow filesystem. However, most users will never use -XX:AllocateHeapAt, so I think that risk is fairly small to begin with. > > The bar for adding new options to ZGC is high, and I don't think it's high enough in this case. Also, other GCs happily allow you to place the heap on any filesystem and I don't mind having that flexibility in ZGC too. > >> >> If not so, I agree with your change. >> > > Ok, thanks. > > I updated the patch, added and adjusted some logging, and added a test. I also updated the bug title/description. > > http://cr.openjdk.java.net/~pliden/8239129/webrev.1 > > cheers, > Per From thomas.schatzl at oracle.com Wed Feb 19 08:45:12 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 19 Feb 2020 09:45:12 +0100 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: <7b116be1-fba3-42f6-a9b1-0500dbabda1d.maoliang.ml@alibaba-inc.com> References: <7b116be1-fba3-42f6-a9b1-0500dbabda1d.maoliang.ml@alibaba-inc.com> Message-ID: <3a82dbc3-fe60-1ca4-c11f-e2c2b7f84527@oracle.com> On 19.02.20 09:09, Liang Mao wrote: > Hi Thomas and Stefan? > > Regarding the failed test case of JEP 346 and the potential idle > scenario we discussed, I don't oppose to reserve the shring in > remark because introducing another perodic GC to make sure the > mixed GCs may not be a good idea as well. > > Thank Thomas for fixing my mistakes. By looking into your patch, > I didn't see the expansion after concurrent mark based on > policy()->desired_bytes_after_concurrent_mark(). Is it missed? > in an earlier email Stefan asked why the heuristic expands during Cleanup. In our opinion this is unnecessary and an artifact of doing full gc sizing in the Remark pause. The reasoning goes as follows: at worst normal expansion between Cleanup and the first mixed gc will expand the heap anyway. 
There does not seem to be much difference in doing expansion during Cleanup or GC (or inbetween) except that it would arbitrarily move the cost into the Cleanup pause. (And in the stable state this shouldn't happen because we previously already sized the heap optimally ;) ) So the recent suggestion removes it. As mentioned, this is untested (and I am going to look at overnight results later today) but seems okay as the last mixed gc will size "optimally" later anyway. Cleanup pause still records the _minimum_desired_bytes_after_last_gc since it is still needed later (and when discussing this last time we thought that this is the "best" place, now if we do not expand during Cleanup we actually do not need to do that there any more). One more comment about one of the raised issues with the code further below. > > ------------------------------------------------------------------ > From:Thomas Schatzl > Send Time:2020 Feb. 19 (Wed.) 04:52 > To:"MAO, Liang" ; Stefan Johansson > ; hotspot-gc-dev > > Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics > [...] > > - I believe there is an underestimation of the desired bytes after > concurrent mark with adaptive IHOP enabled in the current code. If you > look at the method G1Policy::desired_bytes_after_concurrent_mark(), the > two terms returned by that method do not seem equal. I.e. > G1AdaptiveIHOP::predict_unrestrained_buffer_size() does not contain the > used bytes, the reserve and other parts used for the static IHOP (i.e. > minimum_desired_buffer_size == 0). > > At most, G1AdaptiveIHOP::predict_unrestrained_buffer_size() covers the > young gen part of the latter. I.e. size_t G1Policy::minimum_desired_bytes_after_concurrent_mark(size_t used_bytes) { size_t minimum_desired_buffer_size = _ihop_control->predict_unstrained_buffer_size(); return minimum_desired_buffer_size != 0 ?
minimum_desired_buffer_size : _young_list_max_length * HeapRegion::GrainBytes + _reserve_regions * HeapRegion::GrainBytes + used_bytes; is from what I understand the same as: if (minimum_desired_buffer_size != 0) { return minimum_desired_buffer_size; } else { return _young_list_max_length * ... + reserve_regions...; } I *think* the following has been intended: return (minimum_desired_buffer_size != 0 ? minimum_desired_buffer_size : _young_list_max_length * HeapRegion::GrainBytes) + _reserve_regions * HeapRegion::GrainBytes + used_bytes; It would be nicer to restructure the code a bit though. > As mentioned, currently running more tests until tomorrow (even with > above known issues) to get some experience/data to look at with the > sizing at mixed gc heuristic. > Thanks, Thomas From per.liden at oracle.com Wed Feb 19 08:48:17 2020 From: per.liden at oracle.com (Per Liden) Date: Wed, 19 Feb 2020 09:48:17 +0100 Subject: RFR: 8239129: Use DAX in ZGC In-Reply-To: <0ff872cd-c3ff-1f9e-c119-b555e948d380@oss.nttdata.com> References: <64207ef5-fabb-748a-15c9-e96e4bc612d8@oss.nttdata.com> <07354697-3758-02b9-0cc2-5fe887449e2a@oracle.com> <2a781b6a-0277-3bd1-3d0a-f3b2ac8a93c6@oracle.com> <0ae8d397-99c4-a2b6-93bb-5ab59861e25f@oss.nttdata.com> <64f25d5e-e352-2210-718f-667d2c547de7@oss.nttdata.com> <5af0f20e-3909-c656-e1c0-276d0e3c72c3@oracle.com> <15478dad-ccba-2bd8-006a-2c2cc5f2c5b9@oracle.com> <0ff872cd-c3ff-1f9e-c119-b555e948d380@oss.nttdata.com> Message-ID: <7c362a1e-c4c3-90b7-ff67-15a522072b4b@oracle.com> Hi Yasumasa, On 2/19/20 9:43 AM, Yasumasa Suenaga wrote: > Hi Per, > > Thanks for updating JBS and for creating patch! > Your change looks good to me. Great, thanks. > Please list me as Reviewer. I'll add you both as reviewer and contributor of the patch. cheers, Per > > > Thanks, > > Yasumasa > > > On 2020/02/19 17:07, Per Liden wrote: >> On 2/17/20 1:28 PM, Yasumasa Suenaga wrote: >> [...] >>>>>>>
webrev: >>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8239129/webrev.00/ >>>>>> >>>>>> Before this patch can go forward, you need to get to the bottom of >>>>>> how to get that ioctl command to work. If it's not possible, you >>>>>> need to explain why and propose alternatives that we can discuss. >>>>> >>>>> I guess it is caused by Linux kernel. >>>>> In case of ext4, `ext4_iflags_to_xflags()` would set filesystem >>>>> flags to `struct FS_IOC_FSGETXATTR`. >>>>> However `FS_XFLAG_DAX` is not handled in it. >>>> >>>> Did a bit of googleing and it seems the DAX flag is in a bit of flux >>>> at the moment. I guess this will be fixed down the road, when DAX in >>>> the kernel becomes a non-experimental feature. >>>> >>>> How about we just do like this for now: >>>> >>>> http://cr.openjdk.java.net/~pliden/8239129/webrev.0 >>> >>> I thought ZGC requires tmpfs or hugetlbfs due to performance reason. >>> So I introduced new -XX option to make users aware of it. >> >> The filesystem type check is there to help users avoid the mistake of >> placing the heap on an unintended/slow filesystem. However, most users >> will never use -XX:AllocateHeapAt, so I think that risk is fairly >> small to begin with. >> >> The bar for adding new options to ZGC is high, and I don't think it's >> high enough in this case. Also, other GCs happily allow you to place >> the heap on any filesystem and I don't mind having that flexibility in >> ZGC too. >> >>> >>> If not so, I agree with your change. >>> >> >> Ok, thanks. >> >> I updated the patch, added and adjusted some logging, and added a >> test. I also updated the bug title/description. 
>> >> http://cr.openjdk.java.net/~pliden/8239129/webrev.1 >> >> cheers, >> Per From thomas.schatzl at oracle.com Wed Feb 19 09:22:25 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 19 Feb 2020 10:22:25 +0100 Subject: FRF: 8216975 Using ForceNUMA does not disable adaptive sizing with parallel gc In-Reply-To: <8D0A181B-4A38-4E4F-B4EA-E21A28E2F27C@oracle.com> References: <8D0A181B-4A38-4E4F-B4EA-E21A28E2F27C@oracle.com> Message-ID: Hi, On 19.02.20 09:35, Ivan Walulya wrote: > Hi all, > > Please review a minor modification to disable adaptive sizing when ForceNuma is used with ParallelGC and UseLargePages on Linux OS. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8216975 > Webrev: http://cr.openjdk.java.net/~iwalulya/8216975/00/ > Testing: Tier 1 - 3 > > > //Ivan > lgtm :) Thomas From ivan.walulya at oracle.com Wed Feb 19 09:27:34 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Wed, 19 Feb 2020 10:27:34 +0100 Subject: FRF: 8216975 Using ForceNUMA does not disable adaptive sizing with parallel gc In-Reply-To: References: <8D0A181B-4A38-4E4F-B4EA-E21A28E2F27C@oracle.com> Message-ID: Thanks Thomas! > On 19 Feb 2020, at 10:22, Thomas Schatzl wrote: > > Hi, > > On 19.02.20 09:35, Ivan Walulya wrote: >> Hi all, >> Please review a minor modification to disable adaptive sizing when ForceNuma is used with ParallelGC and UseLargePages on Linux OS. 
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8216975 >> Webrev: http://cr.openjdk.java.net/~iwalulya/8216975/00/ >> Testing: Tier 1 - 3 >> //Ivan > > lgtm :) > > Thomas From maoliang.ml at alibaba-inc.com Wed Feb 19 10:44:19 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Wed, 19 Feb 2020 18:44:19 +0800 Subject: =?UTF-8?B?UmU6IFJGUjogODIzNjA3MzogRzE6IFVzZSBTb2Z0TWF4SGVhcFNpemUgdG8gZ3VpZGUgR0Mg?= =?UTF-8?B?aGV1cmlzdGljcw==?= In-Reply-To: <3a82dbc3-fe60-1ca4-c11f-e2c2b7f84527@oracle.com> References: <7b116be1-fba3-42f6-a9b1-0500dbabda1d.maoliang.ml@alibaba-inc.com>, <3a82dbc3-fe60-1ca4-c11f-e2c2b7f84527@oracle.com> Message-ID: <1724eb2b-5b47-4a5d-8153-9080de8c4391.maoliang.ml@alibaba-inc.com> Hi Thomas, When I was testing those benchmarks like specjbb2015 and specjvm2008, the expansions mostly happened at remark. So I guess the expansion after concurrent mark at peak usage based on a minimal capacity might prevent several expansions in normal young collections. It's only my thinking since I don't have much performance data. I don't have any problems with expanding after young collection:) BTW, do you and Stefan prefer to leave the shrink at remark for fixing the failure of JEP346 and handling the idle scenario? Thanks, Liang ------------------------------------------------------------------ From:Thomas Schatzl Send Time:2020 Feb. 19 (Wed.) 16:45 To:"MAO, Liang" ; Stefan Johansson ; hotspot-gc-dev Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics On 19.02.20 09:09, Liang Mao wrote: > Hi Thomas and Stefan? > > Regarding the failed test case of JEP 346 and the potential idle > scenario we discussed, I don't oppose to reserve the shring in > remark because introducing another perodic GC to make sure the > mixed GCs may not be a good idea as well. > > Thank Thomas for fixing my mistakes. By looking into your patch, > I didn't see the expansion after concurrent mark based on > policy()->desired_bytes_after_concurrent_mark(). 
Is it missed? > in an earlier email Stefan asked why the heuristic expands during Cleanup. In our opinion this is unnecessary and an artifact of doing full gc sizing in the Remark pause. The reasoning goes as follows: at worst normal expansion between Cleanup and the first mixed gc will expand the heap anyway. There does not seem to be much difference in doing expansion during Cleanup or GC (or inbetween) except that it would arbitrarily move the cost into the Cleanup pause. (And in the stable state this shouldn't happen because we previously already sized the heap optimally ;) ) So the recent suggestion removes it. As mentioned, this is untested (and I am going to look at overnight results later today) but seems okay as the last mixed gc will size "optimally" later anyway. Cleanup pause still records the _minimum_desired_bytes_after_last_gc since it is still needed later (and when discussing this last time we thought that this is the "best" place, now if we do not expand during Cleanup we actually do not need to do that there any more). One more comment about one of the raised issues with the code further below. > > ------------------------------------------------------------------ > From:Thomas Schatzl > Send Time:2020 Feb. 19 (Wed.) 04:52 > To:"MAO, Liang" ; Stefan Johansson > ; hotspot-gc-dev > > Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics > [...] > > - I believe there is an underestimation of the desired bytes after > concurrent mark with adaptive IHOP enabled in the current code. If you > look at the method G1Policy::desired_bytes_after_concurrent_mark(), the > two terms returned by that method do not seem equal. I.e. > G1AdaptiveIHOP::predict_unrestrained_buffer_size() does not contain the > used bytes, the reserve and other parts used for the static IHOP (i.e. > minimum_desired_buffer_size == 0). > > At most, G1AdaptiveIHOP::predict_unrestrained_buffer_size() covers the > young gen part of the latter. I.e. 
size_t G1Policy::minimum_desired_bytes_after_concurrent_mark(size_t used_bytes) { size_t minimum_desired_buffer_size = _ihop_control->predict_unstrained_buffer_size(); return minimum_desired_buffer_size != 0 ? minimum_desired_buffer_size : _young_list_max_length * HeapRegion::GrainBytes + _reserve_regions * HeapRegion::GrainBytes + used_bytes; is from what I understand the same as: if (minimum_desired_buffer_size != 0) { return minimum_desired_buffer_size; } else { return _young_list_max_length * ... + reserve_regions...; } I *think* the following has been intended: return (minimum_desired_buffer_size != 0 ? minimum_desired_buffer_size : _young_list_max_length * HeapRegion::GrainBytes) + _reserve_regions * HeapRegion::GrainBytes + used_bytes; It would be nicer to restructure the code a bit though. > As mentioned, currently running more tests until tomorrow (even with > above known issues) to get some experience/data to look at with the > sizing at mixed gc heuristic. > Thanks, Thomas From thomas.schatzl at oracle.com Wed Feb 19 10:54:29 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 19 Feb 2020 11:54:29 +0100 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: <1724eb2b-5b47-4a5d-8153-9080de8c4391.maoliang.ml@alibaba-inc.com> References: <7b116be1-fba3-42f6-a9b1-0500dbabda1d.maoliang.ml@alibaba-inc.com> <3a82dbc3-fe60-1ca4-c11f-e2c2b7f84527@oracle.com> <1724eb2b-5b47-4a5d-8153-9080de8c4391.maoliang.ml@alibaba-inc.com> Message-ID: <6c8281c1-d2b3-073b-8984-8f030f105b14@oracle.com> Hi, On 19.02.20 11:44, Liang Mao wrote: > Hi Thomas, > > When I was testing those benchmarks like specjbb2015 and specjvm2008, > the expansions mostly happened at remark. So I guess the expansion after > concurrent mark at peak usage based on a minimal capacity might > prevent several expansions in normal young collections. It's only my > thinking since I don't have much performance data. 
I don't have any > problems with expanding after young collection:) We'll collect perf data about this. > > BTW, do you and Stefan prefer to leave the shrink at remark for fixing > the failure of JEP346 and handling the idle scenario? Yes, and since Stefan suggested that we should shrink during Remark already I think he agrees. Thomas From kim.barrett at oracle.com Wed Feb 19 15:23:09 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 19 Feb 2020 10:23:09 -0500 Subject: FRF: 8216975 Using ForceNUMA does not disable adaptive sizing with parallel gc In-Reply-To: <8D0A181B-4A38-4E4F-B4EA-E21A28E2F27C@oracle.com> References: <8D0A181B-4A38-4E4F-B4EA-E21A28E2F27C@oracle.com> Message-ID: <083E9D0A-CB76-4630-B79C-F496A5296F65@oracle.com> > On Feb 19, 2020, at 3:35 AM, Ivan Walulya wrote: > > Hi all, > > Please review a minor modification to disable adaptive sizing when ForceNuma is used with ParallelGC and UseLargePages on Linux OS. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8216975 > Webrev: http://cr.openjdk.java.net/~iwalulya/8216975/00/ > Testing: Tier 1 - 3 > > > //Ivan Setting UseNUMA true when Linux::libnuma_init returns false seems unlikely to work. The description of ForceNUMA is Force NUMA optimizations on single-node/UMA systems which suggests how it's presently being used in numa_init is wrong. I think the current use should be removed and this conditional clause 5129 // If there's only one node (they start from 0) or if the process 5130 // is bound explicitly to a single node using membind, disable NUMA. 
5131 UseNUMA = false; should instead use UseNUMA = ForceNUMA From kim.barrett at oracle.com Wed Feb 19 15:30:52 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 19 Feb 2020 10:30:52 -0500 Subject: FRF: 8216975 Using ForceNUMA does not disable adaptive sizing with parallel gc In-Reply-To: <083E9D0A-CB76-4630-B79C-F496A5296F65@oracle.com> References: <8D0A181B-4A38-4E4F-B4EA-E21A28E2F27C@oracle.com> <083E9D0A-CB76-4630-B79C-F496A5296F65@oracle.com> Message-ID: > On Feb 19, 2020, at 10:23 AM, Kim Barrett wrote: > >> On Feb 19, 2020, at 3:35 AM, Ivan Walulya wrote: >> >> Hi all, >> >> Please review a minor modification to disable adaptive sizing when ForceNuma is used with ParallelGC and UseLargePages on Linux OS. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8216975 >> Webrev: http://cr.openjdk.java.net/~iwalulya/8216975/00/ >> Testing: Tier 1 - 3 >> >> >> //Ivan > > Setting UseNUMA true when Linux::libnuma_init returns false seems > unlikely to work. The description of ForceNUMA is > > Force NUMA optimizations on single-node/UMA systems > > which suggests how it's presently being used in numa_init is wrong. I > think the current use should be removed and this conditional clause > > 5129 // If there's only one node (they start from 0) or if the process > 5130 // is bound explicitly to a single node using membind, disable NUMA. > 5131 UseNUMA = false; > > should instead use > > UseNUMA = ForceNUMA The Solaris use of ForceNUMA looks like it has a similar problem. On Windows, UseNUMA seems to get forced off unless ForceNUMA, because NUMA support isn't complete there. Which is an entirely different meaning for ForceNUMA from its description. That covers all the uses of ForceNUMA.
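Kim's suggestion can be modeled as a small standalone function. This is a sketch only: the flag and parameter names echo HotSpot's UseNUMA/ForceNUMA handling but are illustrative, not the real os_linux.cpp interface. On a single-node/membind system the flag falls back to ForceNUMA instead of being unconditionally cleared, while a failed libnuma_init still disables NUMA because the NUMA APIs are unusable either way:

```cpp
#include <cassert>

// Illustrative model of the proposed numa_init decision logic; the names
// mirror HotSpot's UseNUMA/ForceNUMA flags, but this is not HotSpot code.
bool decide_use_numa(bool use_numa, bool force_numa, bool libnuma_ok,
                     int highest_node_number, bool bound_to_single_node) {
  if (!use_numa) {
    return false;  // user did not request NUMA at all
  }
  if (!libnuma_ok) {
    return false;  // libnuma failed to initialize: forcing NUMA cannot work
  }
  if (highest_node_number < 1 || bound_to_single_node) {
    // Single-node/UMA or membind case: previously hard-wired to false;
    // Kim's suggestion is effectively `UseNUMA = ForceNUMA` here.
    return force_numa;
  }
  return true;
}
```

This keeps ForceNUMA's documented meaning ("force NUMA optimizations on single-node/UMA systems") while still refusing to enable NUMA when the underlying library support is absent.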
From ivan.walulya at oracle.com Wed Feb 19 15:44:31 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Wed, 19 Feb 2020 16:44:31 +0100 Subject: FRF: 8216975 Using ForceNUMA does not disable adaptive sizing with parallel gc In-Reply-To: References: <8D0A181B-4A38-4E4F-B4EA-E21A28E2F27C@oracle.com> <083E9D0A-CB76-4630-B79C-F496A5296F65@oracle.com> Message-ID: Thanks Kim, I agree it is might be redundant to ForceNUMA when Linux::libnuma_init fails. I will make the changes and make a new RFR. > On 19 Feb 2020, at 16:30, Kim Barrett wrote: > >> On Feb 19, 2020, at 10:23 AM, Kim Barrett wrote: >> >>> On Feb 19, 2020, at 3:35 AM, Ivan Walulya wrote: >>> >>> Hi all, >>> >>> Please review a minor modification to disable adaptive sizing when ForceNuma is used with ParallelGC and UseLargePages on Linux OS. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8216975 >>> Webrev: http://cr.openjdk.java.net/~iwalulya/8216975/00/ >>> Testing: Tier 1 - 3 >>> >>> >>> //Ivan >> >> Setting UseNUMA true when Linux::libnuma_init returns false seems >> unlikely to work. The description of ForceNUMA is >> >> Force NUMA optimizations on single-node/UMA systems >> >> which suggests how it's presently being used in numa_init is wrong. I >> think the current use should be removed and this conditional clause >> >> 5129 // If there's only one node (they start from 0) or if the process >> 5130 // is bound explicitly to a single node using membind, disable NUMA. >> 5131 UseNUMA = false; >> >> should instead use >> >> UseNUMA = ForceNUMA > > The Solaris use of ForceNUMA looks like it has a similar problem. > > On Windows, UseNUMA seems to get forced off unless ForceNUMA, because > NUMA support isn?t complete there. Which is an entirely different meaning for > ForceNUMA from its description. > > That covers all the uses of ForceNUMA. 
> From ivan.walulya at oracle.com Thu Feb 20 08:04:45 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Thu, 20 Feb 2020 09:04:45 +0100 Subject: FRF: 8216975 Using ForceNUMA does not disable adaptive sizing with parallel gc In-Reply-To: References: <8D0A181B-4A38-4E4F-B4EA-E21A28E2F27C@oracle.com> <083E9D0A-CB76-4630-B79C-F496A5296F65@oracle.com> Message-ID: Hi all, Here is the revised webrev: http://cr.openjdk.java.net/~iwalulya/8216975/01/ //Ivan > On 19 Feb 2020, at 16:30, Kim Barrett wrote: > >> On Feb 19, 2020, at 10:23 AM, Kim Barrett wrote: >> >>> On Feb 19, 2020, at 3:35 AM, Ivan Walulya wrote: >>> >>> Hi all, >>> >>> Please review a minor modification to disable adaptive sizing when ForceNuma is used with ParallelGC and UseLargePages on Linux OS. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8216975 >>> Webrev: http://cr.openjdk.java.net/~iwalulya/8216975/00/ >>> Testing: Tier 1 - 3 >>> >>> >>> //Ivan >> >> Setting UseNUMA true when Linux::libnuma_init returns false seems >> unlikely to work. The description of ForceNUMA is >> >> Force NUMA optimizations on single-node/UMA systems >> >> which suggests how it's presently being used in numa_init is wrong. I >> think the current use should be removed and this conditional clause >> >> 5129 // If there's only one node (they start from 0) or if the process >> 5130 // is bound explicitly to a single node using membind, disable NUMA. >> 5131 UseNUMA = false; >> >> should instead use >> >> UseNUMA = ForceNUMA > > The Solaris use of ForceNUMA looks like it has a similar problem. > > On Windows, UseNUMA seems to get forced off unless ForceNUMA, because > NUMA support isn?t complete there. Which is an entirely different meaning for > ForceNUMA from its description. > > That covers all the uses of ForceNUMA. 
> From per.liden at oracle.com Thu Feb 20 08:26:07 2020 From: per.liden at oracle.com (Per Liden) Date: Thu, 20 Feb 2020 09:26:07 +0100 Subject: RFR: 8239533: ZGC: Make the ZProactive flag non-diagnostic Message-ID: <566a6991-e02d-cc57-296b-c2a14dfed329@oracle.com> I propose that the ZProactive flag shouldn't be a diagnostic flag, since it's a feature you might want to permanently enable/disable (similar to ZUncommit), rather than something you enable/disable to diagnose an issue. Bug: https://bugs.openjdk.java.net/browse/JDK-8239533 Webrev: http://cr.openjdk.java.net/~pliden/8239533/webrev.0 /Per From erik.osterlund at oracle.com Thu Feb 20 10:03:39 2020 From: erik.osterlund at oracle.com (erik.osterlund at oracle.com) Date: Thu, 20 Feb 2020 11:03:39 +0100 Subject: RFR: 8239533: ZGC: Make the ZProactive flag non-diagnostic In-Reply-To: <566a6991-e02d-cc57-296b-c2a14dfed329@oracle.com> References: <566a6991-e02d-cc57-296b-c2a14dfed329@oracle.com> Message-ID: <50696985-7657-021e-bdcd-a463f5b79456@oracle.com> Hi Per, Looks good. Thanks, /Erik On 2/20/20 9:26 AM, Per Liden wrote: > I propose that the ZProactive flag shouldn't be a diagnostic flag, > since it's a feature you might want to permanently enable/disable > (similar to ZUncommit), rather than something you enable/disable to > diagnose an issue. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8239533 > Webrev: http://cr.openjdk.java.net/~pliden/8239533/webrev.0 > > /Per From per.liden at oracle.com Thu Feb 20 10:52:17 2020 From: per.liden at oracle.com (Per Liden) Date: Thu, 20 Feb 2020 11:52:17 +0100 Subject: RFC: JEP: ZGC: Production Ready Message-ID: Hi all, I've created a JEP draft to make ZGC a product (non-experimental) feature. https://bugs.openjdk.java.net/browse/JDK-8209683 Comments and feedback welcome. 
cheers, Per From per.liden at oracle.com Thu Feb 20 10:52:37 2020 From: per.liden at oracle.com (Per Liden) Date: Thu, 20 Feb 2020 11:52:37 +0100 Subject: RFR: 8239533: ZGC: Make the ZProactive flag non-diagnostic In-Reply-To: <50696985-7657-021e-bdcd-a463f5b79456@oracle.com> References: <566a6991-e02d-cc57-296b-c2a14dfed329@oracle.com> <50696985-7657-021e-bdcd-a463f5b79456@oracle.com> Message-ID: Thanks Erik! /Per On 2/20/20 11:03 AM, erik.osterlund at oracle.com wrote: > Hi Per, > > Looks good. > > Thanks, > /Erik > > On 2/20/20 9:26 AM, Per Liden wrote: >> I propose that the ZProactive flag shouldn't be a diagnostic flag, >> since it's a feature you might want to permanently enable/disable >> (similar to ZUncommit), rather than something you enable/disable to >> diagnose an issue. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8239533 >> Webrev: http://cr.openjdk.java.net/~pliden/8239533/webrev.0 >> >> /Per > From shade at redhat.com Thu Feb 20 12:24:15 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 20 Feb 2020 13:24:15 +0100 Subject: RFR (S) 8232100: GC timings should use proper units for heap sizes In-Reply-To: References: Message-ID: <4ad7db72-1a52-037f-37b1-558ec176a172@redhat.com> On 10/10/19 2:03 PM, Aleksey Shipilev wrote: > RFE: > https://bugs.openjdk.java.net/browse/JDK-8232100 > > Webrev: > https://cr.openjdk.java.net/~shade/8232100/webrev.01/ > > GC log prints heap sizes in selected GC events. Currently, it unconditionally uses "M" as the suffix > for heap sizes, which makes GC logs too coarse on smaller heaps. This loses performance data > accuracy, which is sometimes a dealbreaker in logs analysis. Let's make it into proper units. > > I ran many tests of my own, but would appreciate if somebody runs it through more comprehensive > suite of tests, looking for tests that parse the GC logs for whatever reason. > > Testing: eyeballing GC logs, jdk-submit, hotspot_gc {g1, shenandoah, parallel} No takers? 
:) -- Thanks, -Aleksey From kim.barrett at oracle.com Thu Feb 20 21:08:52 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 20 Feb 2020 16:08:52 -0500 Subject: FRF: 8216975 Using ForceNUMA does not disable adaptive sizing with parallel gc In-Reply-To: References: <8D0A181B-4A38-4E4F-B4EA-E21A28E2F27C@oracle.com> <083E9D0A-CB76-4630-B79C-F496A5296F65@oracle.com> Message-ID: > On Feb 20, 2020, at 3:04 AM, Ivan Walulya wrote: > > Hi all, > > Here is the revised webrev: http://cr.openjdk.java.net/~iwalulya/8216975/01/ Looks good. From ivan.walulya at oracle.com Fri Feb 21 09:08:26 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Fri, 21 Feb 2020 10:08:26 +0100 Subject: FRF: 8216975 Using ForceNUMA does not disable adaptive sizing with parallel gc In-Reply-To: References: <8D0A181B-4A38-4E4F-B4EA-E21A28E2F27C@oracle.com> <083E9D0A-CB76-4630-B79C-F496A5296F65@oracle.com> Message-ID: <974919FC-07EA-43DD-B71C-83DB1834BFF1@oracle.com> Thanks kim! > On 20 Feb 2020, at 22:08, Kim Barrett wrote: > >> On Feb 20, 2020, at 3:04 AM, Ivan Walulya wrote: >> >> Hi all, >> >> Here is the revised webrev: http://cr.openjdk.java.net/~iwalulya/8216975/01/ > > Looks good. > From stefan.karlsson at oracle.com Fri Feb 21 09:21:47 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Fri, 21 Feb 2020 10:21:47 +0100 Subject: RFR: 8239533: ZGC: Make the ZProactive flag non-diagnostic In-Reply-To: <566a6991-e02d-cc57-296b-c2a14dfed329@oracle.com> References: <566a6991-e02d-cc57-296b-c2a14dfed329@oracle.com> Message-ID: Looks good. StefanK On 2020-02-20 09:26, Per Liden wrote: > I propose that the ZProactive flag shouldn't be a diagnostic flag, > since it's a feature you might want to permanently enable/disable > (similar to ZUncommit), rather than something you enable/disable to > diagnose an issue. 
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8239533 > Webrev: http://cr.openjdk.java.net/~pliden/8239533/webrev.0 > > /Per From thomas.schatzl at oracle.com Fri Feb 21 09:30:03 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 21 Feb 2020 10:30:03 +0100 Subject: FRF: 8216975 Using ForceNUMA does not disable adaptive sizing with parallel gc In-Reply-To: References: <8D0A181B-4A38-4E4F-B4EA-E21A28E2F27C@oracle.com> <083E9D0A-CB76-4630-B79C-F496A5296F65@oracle.com> Message-ID: Hi, On 20.02.20 09:04, Ivan Walulya wrote: > Hi all, > > Here is the revised webrev: http://cr.openjdk.java.net/~iwalulya/8216975/01/ I think this is even better :) Thomas From leo.korinth at oracle.com Fri Feb 21 09:45:27 2020 From: leo.korinth at oracle.com (Leo Korinth) Date: Fri, 21 Feb 2020 10:45:27 +0100 Subject: FRF: 8216975 Using ForceNUMA does not disable adaptive sizing with parallel gc In-Reply-To: References: <8D0A181B-4A38-4E4F-B4EA-E21A28E2F27C@oracle.com> <083E9D0A-CB76-4630-B79C-F496A5296F65@oracle.com> Message-ID: Looks good, I will push for you. Thanks, Leo On 20/02/2020 09:04, Ivan Walulya wrote: > Hi all, > > Here is the revised webrev: http://cr.openjdk.java.net/~iwalulya/8216975/01/ > > //Ivan > >> On 19 Feb 2020, at 16:30, Kim Barrett wrote: >> >>> On Feb 19, 2020, at 10:23 AM, Kim Barrett wrote: >>> >>>> On Feb 19, 2020, at 3:35 AM, Ivan Walulya wrote: >>>> >>>> Hi all, >>>> >>>> Please review a minor modification to disable adaptive sizing when ForceNuma is used with ParallelGC and UseLargePages on Linux OS. >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8216975 >>>> Webrev: http://cr.openjdk.java.net/~iwalulya/8216975/00/ >>>> Testing: Tier 1 - 3 >>>> >>>> >>>> //Ivan >>> >>> Setting UseNUMA true when Linux::libnuma_init returns false seems >>> unlikely to work. The description of ForceNUMA is >>> >>> Force NUMA optimizations on single-node/UMA systems >>> >>> which suggests how it's presently being used in numa_init is wrong. 
I >>> think the current use should be removed and this conditional clause >>> >>> 5129 // If there's only one node (they start from 0) or if the process >>> 5130 // is bound explicitly to a single node using membind, disable NUMA. >>> 5131 UseNUMA = false; >>> >>> should instead use >>> >>> UseNUMA = ForceNUMA >> >> The Solaris use of ForceNUMA looks like it has a similar problem. >> >> On Windows, UseNUMA seems to get forced off unless ForceNUMA, because >> NUMA support isn?t complete there. Which is an entirely different meaning for >> ForceNUMA from its description. >> >> That covers all the uses of ForceNUMA. >> > From ivan.walulya at oracle.com Fri Feb 21 10:02:41 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Fri, 21 Feb 2020 11:02:41 +0100 Subject: FRF: 8216975 Using ForceNUMA does not disable adaptive sizing with parallel gc In-Reply-To: References: <8D0A181B-4A38-4E4F-B4EA-E21A28E2F27C@oracle.com> <083E9D0A-CB76-4630-B79C-F496A5296F65@oracle.com> Message-ID: Thanks Leo! > On 21 Feb 2020, at 10:45, Leo Korinth wrote: > > Looks good, I will push for you. > > Thanks, > Leo > > On 20/02/2020 09:04, Ivan Walulya wrote: >> Hi all, >> Here is the revised webrev: http://cr.openjdk.java.net/~iwalulya/8216975/01/ >> //Ivan >>> On 19 Feb 2020, at 16:30, Kim Barrett wrote: >>> >>>> On Feb 19, 2020, at 10:23 AM, Kim Barrett wrote: >>>> >>>>> On Feb 19, 2020, at 3:35 AM, Ivan Walulya wrote: >>>>> >>>>> Hi all, >>>>> >>>>> Please review a minor modification to disable adaptive sizing when ForceNuma is used with ParallelGC and UseLargePages on Linux OS. >>>>> >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8216975 >>>>> Webrev: http://cr.openjdk.java.net/~iwalulya/8216975/00/ >>>>> Testing: Tier 1 - 3 >>>>> >>>>> >>>>> //Ivan >>>> >>>> Setting UseNUMA true when Linux::libnuma_init returns false seems >>>> unlikely to work. 
The description of ForceNUMA is >>>> >>>> Force NUMA optimizations on single-node/UMA systems >>>> >>>> which suggests how it's presently being used in numa_init is wrong. I >>>> think the current use should be removed and this conditional clause >>>> >>>> 5129 // If there's only one node (they start from 0) or if the process >>>> 5130 // is bound explicitly to a single node using membind, disable NUMA. >>>> 5131 UseNUMA = false; >>>> >>>> should instead use >>>> >>>> UseNUMA = ForceNUMA >>> >>> The Solaris use of ForceNUMA looks like it has a similar problem. >>> >>> On Windows, UseNUMA seems to get forced off unless ForceNUMA, because >>> NUMA support isn?t complete there. Which is an entirely different meaning for >>> ForceNUMA from its description. >>> >>> That covers all the uses of ForceNUMA. >>> From per.liden at oracle.com Fri Feb 21 10:10:18 2020 From: per.liden at oracle.com (Per Liden) Date: Fri, 21 Feb 2020 11:10:18 +0100 Subject: RFR: 8239533: ZGC: Make the ZProactive flag non-diagnostic In-Reply-To: References: <566a6991-e02d-cc57-296b-c2a14dfed329@oracle.com> Message-ID: Thanks Stefan! /Per On 2/21/20 10:21 AM, Stefan Karlsson wrote: > Looks good. > > StefanK > > On 2020-02-20 09:26, Per Liden wrote: >> I propose that the ZProactive flag shouldn't be a diagnostic flag, >> since it's a feature you might want to permanently enable/disable >> (similar to ZUncommit), rather than something you enable/disable to >> diagnose an issue. 
>> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8239533 >> Webrev: http://cr.openjdk.java.net/~pliden/8239533/webrev.0 >> >> /Per > From stefan.karlsson at oracle.com Fri Feb 21 11:30:18 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Fri, 21 Feb 2020 12:30:18 +0100 Subject: RFR: 8239129: Use DAX in ZGC In-Reply-To: <15478dad-ccba-2bd8-006a-2c2cc5f2c5b9@oracle.com> References: <64207ef5-fabb-748a-15c9-e96e4bc612d8@oss.nttdata.com> <07354697-3758-02b9-0cc2-5fe887449e2a@oracle.com> <2a781b6a-0277-3bd1-3d0a-f3b2ac8a93c6@oracle.com> <0ae8d397-99c4-a2b6-93bb-5ab59861e25f@oss.nttdata.com> <64f25d5e-e352-2210-718f-667d2c547de7@oss.nttdata.com> <5af0f20e-3909-c656-e1c0-276d0e3c72c3@oracle.com> <15478dad-ccba-2bd8-006a-2c2cc5f2c5b9@oracle.com> Message-ID: <65a46af0-bc01-6354-6680-3459b2a06f23@oracle.com> Looks good. StefanK On 2020-02-19 09:07, Per Liden wrote: > On 2/17/20 1:28 PM, Yasumasa Suenaga wrote: > [...] >>>>>> ?? webrev: >>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8239129/webrev.00/ >>>>> >>>>> Before this patch can go forward, you need to get to the bottom of >>>>> how to get that ioctl command to work. If it's not possible, you >>>>> need to explain why and propose alternatives that we can discuss. >>>> >>>> I guess it is caused by Linux kernel. >>>> In case of ext4, `ext4_iflags_to_xflags()` would set filesystem >>>> flags to `struct FS_IOC_FSGETXATTR`. >>>> However `FS_XFLAG_DAX` is not handled in it. >>> >>> Did a bit of googleing and it seems the DAX flag is in a bit of flux >>> at the moment. I guess this will be fixed down the road, when DAX in >>> the kernel becomes a non-experimental feature. >>> >>> How about we just do like this for now: >>> >>> http://cr.openjdk.java.net/~pliden/8239129/webrev.0 >> >> I thought ZGC requires tmpfs or hugetlbfs due to performance reason. >> So I introduced new -XX option to make users aware of it. 
> > The filesystem type check is there to help users avoid the mistake of > placing the heap on an unintended/slow filesystem. However, most users > will never use -XX:AllocateHeapAt, so I think that risk is fairly > small to begin with. > > The bar for adding new options to ZGC is high, and I don't think it's > high enough in this case. Also, other GCs happily allow you to place > the heap on any filesystem and I don't mind having that flexibility in > ZGC too. > >> >> If not so, I agree with your change. >> > > Ok, thanks. > > I updated the patch, added and adjusted some logging, and added a > test. I also updated the bug title/description. > > http://cr.openjdk.java.net/~pliden/8239129/webrev.1 > > cheers, > Per From leonid.mesnik at oracle.com Fri Feb 21 19:48:06 2020 From: leonid.mesnik at oracle.com (Leonid Mesnik) Date: Fri, 21 Feb 2020 11:48:06 -0800 Subject: RFR: 8203239: [TESTBUG] remove vmTestbase/vm/gc/kind/parOld test Message-ID: Hi Could you please review following fix which removes parOld test. Test checks that ParOldGC is used if no GC is selected and new gen GC is PSYoungGen. Test is obsolete now and should be removed. 
webrev: http://cr.openjdk.java.net/~lmesnik/8203239/webrev.00/ bug: https://bugs.openjdk.java.net/browse/JDK-8203239 From per.liden at oracle.com Mon Feb 24 10:53:03 2020 From: per.liden at oracle.com (Per Liden) Date: Mon, 24 Feb 2020 11:53:03 +0100 Subject: RFR: 8239129: Use DAX in ZGC In-Reply-To: <65a46af0-bc01-6354-6680-3459b2a06f23@oracle.com> References: <64207ef5-fabb-748a-15c9-e96e4bc612d8@oss.nttdata.com> <07354697-3758-02b9-0cc2-5fe887449e2a@oracle.com> <2a781b6a-0277-3bd1-3d0a-f3b2ac8a93c6@oracle.com> <0ae8d397-99c4-a2b6-93bb-5ab59861e25f@oss.nttdata.com> <64f25d5e-e352-2210-718f-667d2c547de7@oss.nttdata.com> <5af0f20e-3909-c656-e1c0-276d0e3c72c3@oracle.com> <15478dad-ccba-2bd8-006a-2c2cc5f2c5b9@oracle.com> <65a46af0-bc01-6354-6680-3459b2a06f23@oracle.com> Message-ID: <2d1db6fb-cce1-65dc-0539-819419314fcc@oracle.com> Thanks Stefan! /Per On 2/21/20 12:30 PM, Stefan Karlsson wrote: > Looks good. > > StefanK > > On 2020-02-19 09:07, Per Liden wrote: >> On 2/17/20 1:28 PM, Yasumasa Suenaga wrote: >> [...] >>>>>>> ?? webrev: >>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8239129/webrev.00/ >>>>>> >>>>>> Before this patch can go forward, you need to get to the bottom of >>>>>> how to get that ioctl command to work. If it's not possible, you >>>>>> need to explain why and propose alternatives that we can discuss. >>>>> >>>>> I guess it is caused by Linux kernel. >>>>> In case of ext4, `ext4_iflags_to_xflags()` would set filesystem >>>>> flags to `struct FS_IOC_FSGETXATTR`. >>>>> However `FS_XFLAG_DAX` is not handled in it. >>>> >>>> Did a bit of googleing and it seems the DAX flag is in a bit of flux >>>> at the moment. I guess this will be fixed down the road, when DAX in >>>> the kernel becomes a non-experimental feature. >>>> >>>> How about we just do like this for now: >>>> >>>> http://cr.openjdk.java.net/~pliden/8239129/webrev.0 >>> >>> I thought ZGC requires tmpfs or hugetlbfs due to performance reason. 
>>> So I introduced new -XX option to make users aware of it. >> >> The filesystem type check is there to help users avoid the mistake of >> placing the heap on an unintended/slow filesystem. However, most users >> will never use -XX:AllocateHeapAt, so I think that risk is fairly >> small to begin with. >> >> The bar for adding new options to ZGC is high, and I don't think it's >> high enough in this case. Also, other GCs happily allow you to place >> the heap on any filesystem and I don't mind having that flexibility in >> ZGC too. >> >>> >>> If not so, I agree with your change. >>> >> >> Ok, thanks. >> >> I updated the patch, added and adjusted some logging, and added a >> test. I also updated the bug title/description. >> >> http://cr.openjdk.java.net/~pliden/8239129/webrev.1 >> >> cheers, >> Per > From shade at redhat.com Mon Feb 24 16:12:00 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 24 Feb 2020 17:12:00 +0100 Subject: RFR (XS) 8239868: Shenandoah: ditch C2 node limit adjustments Message-ID: <8eeac17f-a6ed-18c1-ef90-667e692e309a@redhat.com> RFE: https://bugs.openjdk.java.net/browse/JDK-8239868 We have the block added to Shenandoah arguments code that adjusts MaxNodeLimit and friends (predates inclusion of Shenandoah into mainline): https://mail.openjdk.java.net/pipermail/shenandoah-dev/2018-August/006983.html At the time, it was prompted by observing that lots of barriers everywhere really needed to have this limit bumped. Today, with simplified LRB scheme, more simple LRB due to SFX, etc, we do not need this. The change above used ShenandoahCompileCheck, which made it into upstream code under generic AbortVMOnCompilationFailure. With that, I was able to verify that dropping the block does not yield compilation failures due to exceeded node budget on hotspot_gc_shenandoah, specjvm2008, specjbb2015. Performance numbers are also not affected (as expected). 
Therefore, the adjustment can be removed: diff -r 5c5dcd036a76 src/hotspot/share/gc/shenandoah/shenandoahArguments.cpp --- a/src/hotspot/share/gc/shenandoah/shenandoahArguments.cpp Mon Feb 24 11:01:51 2020 +0100 +++ b/src/hotspot/share/gc/shenandoah/shenandoahArguments.cpp Mon Feb 24 17:09:58 2020 +0100 @@ -193,13 +193,4 @@ } - // Shenandoah needs more C2 nodes to compile some methods with lots of barriers. - // NodeLimitFudgeFactor needs to stay the same relative to MaxNodeLimit. -#ifdef COMPILER2 - if (FLAG_IS_DEFAULT(MaxNodeLimit)) { - FLAG_SET_DEFAULT(MaxNodeLimit, MaxNodeLimit * 3); - FLAG_SET_DEFAULT(NodeLimitFudgeFactor, NodeLimitFudgeFactor * 3); - } -#endif - // Make sure safepoint deadlocks are failing predictably. This sets up VM to report // fatal error after 10 seconds of wait for safepoint syncronization (not the VM Testing: hotspot_gc_shenandoah; benchmarks, +AbortVMOnCompilationFailure testing -- Thanks, -Aleksey From rkennke at redhat.com Mon Feb 24 16:22:50 2020 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 24 Feb 2020 17:22:50 +0100 Subject: RFR (XS) 8239868: Shenandoah: ditch C2 node limit adjustments In-Reply-To: <8eeac17f-a6ed-18c1-ef90-667e692e309a@redhat.com> References: <8eeac17f-a6ed-18c1-ef90-667e692e309a@redhat.com> Message-ID: > RFE: > https://bugs.openjdk.java.net/browse/JDK-8239868 > > We have the block added to Shenandoah arguments code that adjusts MaxNodeLimit and friends (predates > inclusion of Shenandoah into mainline): > https://mail.openjdk.java.net/pipermail/shenandoah-dev/2018-August/006983.html > > At the time, it was prompted by observing that lots of barriers everywhere really needed to have > this limit bumped. Today, with simplified LRB scheme, more simple LRB due to SFX, etc, we do not > need this. > > The change above used ShenandoahCompileCheck, which made it into upstream code under generic > AbortVMOnCompilationFailure. 
With that, I was able to verify that dropping the block does not yield > compilation failures due to exceeded node budget on hotspot_gc_shenandoah, specjvm2008, specjbb2015. > Performance numbers are also not affected (as expected). > > Therefore, the adjustment can be removed: > > diff -r 5c5dcd036a76 src/hotspot/share/gc/shenandoah/shenandoahArguments.cpp > --- a/src/hotspot/share/gc/shenandoah/shenandoahArguments.cpp Mon Feb 24 11:01:51 2020 +0100 > +++ b/src/hotspot/share/gc/shenandoah/shenandoahArguments.cpp Mon Feb 24 17:09:58 2020 +0100 > @@ -193,13 +193,4 @@ > } > > - // Shenandoah needs more C2 nodes to compile some methods with lots of barriers. > - // NodeLimitFudgeFactor needs to stay the same relative to MaxNodeLimit. > -#ifdef COMPILER2 > - if (FLAG_IS_DEFAULT(MaxNodeLimit)) { > - FLAG_SET_DEFAULT(MaxNodeLimit, MaxNodeLimit * 3); > - FLAG_SET_DEFAULT(NodeLimitFudgeFactor, NodeLimitFudgeFactor * 3); > - } > -#endif > - > // Make sure safepoint deadlocks are failing predictably. This sets up VM to report > // fatal error after 10 seconds of wait for safepoint syncronization (not the VM > > Testing: hotspot_gc_shenandoah; benchmarks, +AbortVMOnCompilationFailure testing Ok. Thank you! Roman From sangheon.kim at oracle.com Mon Feb 24 22:02:20 2020 From: sangheon.kim at oracle.com (sangheon.kim at oracle.com) Date: Mon, 24 Feb 2020 14:02:20 -0800 Subject: RFR: 8238979: Improve G1DirtyCardQueueSet handling of previously paused buffers In-Reply-To: References: Message-ID: <2360042b-b9fa-9766-f235-a5ed62801191@oracle.com> Hi Kim, On 2/13/20 5:46 PM, Kim Barrett wrote: > Please review this simplification of the handling of previously paused > buffers by G1DirtyCardQueueSet. This change moves the call to > enqueue_previous_paused_buffers() into record_paused_buffer(). This > ensures any paused buffers from a previous safepoint have been flushed > out before recording a buffer for the next safepoint. 
> > This move eliminates the former precondition that the enqueue had to > have been performed before recording. > > This move also permits the enqueue_previous_paused_buffers in > get_completed_buffer() to be moved to a point where it will be called > much more rarely, slightly improving the normal performance of > get_dirtied_buffer. The old location of the call was in support of > the call order invariant needed by record_paused_buffer(). > > As a consequence of the changed enqueue locations, the fast path check > in enqueue_previous_paused_buffers() will now only rarely succeed, and > is no longer worth the (very small) performance cost and (much more > importantly) the largish block comment arguing its correctness. So > that fast path is removed. And since the raison d'etre for > PausedBuffers::is_empty() was to support that fast path, that function > is also removed. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8238979 > > Webrev: > https://cr.openjdk.java.net/~kbarrett/8238979/open.00/ > > Testing: > mach5 tier1-5 in conjunction with other in-development changes. > Local (linux-x64) hotspot:tier1 for this change in isolation. Looks good to me. Thanks, Sangheon > From kim.barrett at oracle.com Tue Feb 25 03:33:33 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 24 Feb 2020 22:33:33 -0500 Subject: RFR: 8238979: Improve G1DirtyCardQueueSet handling of previously paused buffers In-Reply-To: <2360042b-b9fa-9766-f235-a5ed62801191@oracle.com> References: <2360042b-b9fa-9766-f235-a5ed62801191@oracle.com> Message-ID: <2398C423-F6EA-497E-B3F1-929A48042C59@oracle.com> > On Feb 24, 2020, at 5:02 PM, sangheon.kim at oracle.com wrote: > On 2/13/20 5:46 PM, Kim Barrett wrote: >> [?] >> >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8238979 >> >> Webrev: >> https://cr.openjdk.java.net/~kbarrett/8238979/open.00/ >> >> Testing: >> mach5 tier1-5 in conjunction with other in-development changes. 
>> Local (linux-x64) hotspot:tier1 for this change in isolation. > Looks good to me. > > Thanks, > Sangheon Thanks. From shade at redhat.com Tue Feb 25 08:05:03 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 25 Feb 2020 09:05:03 +0100 Subject: RFR (S) 8239904: Shenandoah: accumulated penalties should not be over 100% of capacity Message-ID: <2b73bc42-1d5b-1277-a6b2-382acf660ea2@redhat.com> Bug: https://bugs.openjdk.java.net/browse/JDK-8239904 See details in the bug. Fix: https://cr.openjdk.java.net/~shade/8239904/webrev.01/ Testing: hotspot_gc_shenandoah -- Thanks, -Aleksey From rkennke at redhat.com Tue Feb 25 11:29:41 2020 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 25 Feb 2020 12:29:41 +0100 Subject: RFR (S) 8239904: Shenandoah: accumulated penalties should not be over 100% of capacity In-Reply-To: <2b73bc42-1d5b-1277-a6b2-382acf660ea2@redhat.com> References: <2b73bc42-1d5b-1277-a6b2-382acf660ea2@redhat.com> Message-ID: Yes, looks good! Roman > Bug: > https://bugs.openjdk.java.net/browse/JDK-8239904 > > See details in the bug. > > Fix: > https://cr.openjdk.java.net/~shade/8239904/webrev.01/ > > Testing: hotspot_gc_shenandoah > From maoliang.ml at alibaba-inc.com Tue Feb 25 11:28:41 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Tue, 25 Feb 2020 19:28:41 +0800 Subject: =?UTF-8?B?UkZSOiA4MjM2MDczOiBHMTogVXNlIFNvZnRNYXhIZWFwU2l6ZSB0byBndWlkZSBHQyBoZXVy?= =?UTF-8?B?aXN0aWNz?= Message-ID: <6cdfc61a-1e91-42ac-b1d8-725e3c45ff97.maoliang.ml@alibaba-inc.com> Hi Thomas, Do you have any testing result of the patch? I made a little change based on your webrev: http://cr.openjdk.java.net/~tschatzl/8236073/webrev.2/ to retain the shrink in remark which fixed the failure of JEP 346 and should handle the "idle" scenario. http://cr.openjdk.java.net/~luchsh/8236073.webrev.5/ Thanks, Liang ------------------------------------------------------------------ From:Thomas Schatzl Send Time:2020 Feb. 19 (Wed.) 
18:56 To:"MAO, Liang" ; Stefan Johansson ; hotspot-gc-dev Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Hi, On 19.02.20 11:44, Liang Mao wrote: > Hi Thomas, > > When I was testing those benchmarks like specjbb2015 and specjvm2008, > the expansions mostly happened at remark. So I guess the expansion after > concurrent mark at peak usage based on a minimal capacity might > prevent several expansions in normal young collections. It's only my > thinking since I don't have much performance data. I don't have any > problems with expanding after young collection:) We'll collect perf data about this. > > BTW, do you and Stefan prefer to leave the shrink at remark for fixing > the failure of JEP346 and handling the idle scenario? Yes, and since Stefan suggested that we should shrink during Remark already I think he agrees. Thomas From erik.osterlund at oracle.com Tue Feb 25 13:10:40 2020 From: erik.osterlund at oracle.com (Erik Österlund) Date: Tue, 25 Feb 2020 14:10:40 +0100 Subject: RFC: JEP: ZGC: Concurrent Execution Stack Processing Message-ID: Hi, I have created a JEP draft to add concurrent execution stack scanning to ZGC. https://bugs.openjdk.java.net/browse/JDK-8239600 Comments and feedback welcome. Thanks, /Erik From zgu at redhat.com Tue Feb 25 17:13:03 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 25 Feb 2020 12:13:03 -0500 Subject: [15] RFR 8239926: Shenandoah: Shenandoah needs to mark nmethod's metadata Message-ID: <75c20855-5234-ba00-f07b-f9da0f7b8047@redhat.com> Shenandoah encounters a few test failures with tools/javac. Verifier catches unmarked oops in nmethod's metadata during root evacuation in the final mark phase. The problem is that Shenandoah marks on-stack nmethods in the init mark pause, but it does not mark nmethod's metadata during the concurrent mark phase, when a new nmethod is about to be executed.
The solution: 1) Use nmethod_entry_barrier to keep nmethod's metadata alive when the nmethod is about to be executed, where nmethod entry barriers are supported. 2) Re-mark on-stack nmethods' metadata at the final mark pause. Bug: https://bugs.openjdk.java.net/browse/JDK-8239926 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.00/ Test: hotspot_gc_shenandoah (fastdebug and release) tools/javac with ShenandoahCodeRootsStyle = 1 and 2 (fastdebug and release) Thanks, -Zhengyu From hohensee at amazon.com Tue Feb 25 21:13:38 2020 From: hohensee at amazon.com (Hohensee, Paul) Date: Tue, 25 Feb 2020 21:13:38 +0000 Subject: RFR(XS): 8239916 - SA: delete dead code in jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/ObjectHeap.java In-Reply-To: References: <6ccd3ea6fc974cecb202865c7528912e@tencent.com> Message-ID: That's indeed dead code, so lgtm. Thanks, Paul From: serviceability-dev on behalf of Chris Plummer Date: Tuesday, February 25, 2020 at 10:04 AM To: "linzang(臧琳)" , serviceability-dev , "hotspot-gc-dev at openjdk.java.net" Subject: Re: RFR(XS): 8239916 - SA: delete dead code in jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/ObjectHeap.java Adding hotspot-gc-dev. Chris On 2/25/20 2:21 AM, linzang(臧琳) wrote: Hi, Please review the following change: Bugs: https://bugs.openjdk.java.net/browse/JDK-8239916 webrev: http://cr.openjdk.java.net/~lzang/8239916/webrev/ Thanks, Lin From stefan.karlsson at oracle.com Tue Feb 25 21:46:59 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Tue, 25 Feb 2020 22:46:59 +0100 Subject: RFR(XS): 8239916 - SA: delete dead code in jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/ObjectHeap.java In-Reply-To: References: <6ccd3ea6fc974cecb202865c7528912e@tencent.com> Message-ID: <9a67c326-f693-99ed-0c51-4f6bf96dd9b3@oracle.com> Looks good. This is left-overs from the CMS removal. StefanK On 2020-02-25 19:02, Chris Plummer wrote: > Adding hotspot-gc-dev. > > Chris > > On 2/25/20 2:21 AM, linzang(臧琳)
wrote: >> Hi, >> Please review the following change: >> Bugs: https://bugs.openjdk.java.net/browse/JDK-8239916 >> webrev: http://cr.openjdk.java.net/~lzang/8239916/webrev/ >> >> Thanks, >> Lin > From linzang at tencent.com Wed Feb 26 02:47:35 2020 From: linzang at tencent.com (linzang(臧琳)) Date: Wed, 26 Feb 2020 02:47:35 +0000 Subject: RFR(XS): 8239916 - SA: delete dead code in jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/ObjectHeap.java(Internet mail) In-Reply-To: <9a67c326-f693-99ed-0c51-4f6bf96dd9b3@oracle.com> References: <6ccd3ea6fc974cecb202865c7528912e@tencent.com> <9a67c326-f693-99ed-0c51-4f6bf96dd9b3@oracle.com> Message-ID: <8C8E0733-3076-49F1-9527-F11A8860661C@tencent.com> Thanks for reviewing, so can this change be merged now? BRs, Lin > On Feb 26, 2020, at 5:46 AM, Stefan Karlsson wrote: > > Looks good. This is left-overs from the CMS removal. > > StefanK > > On 2020-02-25 19:02, Chris Plummer wrote: >> Adding hotspot-gc-dev. >> >> Chris >> >> On 2/25/20 2:21 AM, linzang(臧琳) wrote: >>> Hi, >>> Please review the following change: >>> Bugs: https://bugs.openjdk.java.net/browse/JDK-8239916 >>> webrev: http://cr.openjdk.java.net/~lzang/8239916/webrev/ >>> >>> Thanks, >>> Lin >> > > From stefan.johansson at oracle.com Wed Feb 26 09:07:08 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Wed, 26 Feb 2020 10:07:08 +0100 Subject: RFR: 8238979: Improve G1DirtyCardQueueSet handling of previously paused buffers In-Reply-To: References: Message-ID: Hi Kim, > 14 feb. 2020 kl. 02:46 skrev Kim Barrett : > > Please review this simplification of the handling of previously paused > buffers by G1DirtyCardQueueSet. This change moves the call to > enqueue_previous_paused_buffers() into record_paused_buffer(). This > ensures any paused buffers from a previous safepoint have been flushed > out before recording a buffer for the next safepoint.
> > This move eliminates the former precondition that the enqueue had to > have been performed before recording. > > This move also permits the enqueue_previous_paused_buffers in > get_completed_buffer() to be moved to a point where it will be called > much more rarely, slightly improving the normal performance of > get_dirtied_buffer. The old location of the call was in support of > the call order invariant needed by record_paused_buffer(). > > As a consequence of the changed enqueue locations, the fast path check > in enqueue_previous_paused_buffers() will now only rarely succeed, and > is no longer worth the (very small) performance cost and (much more > importantly) the largish block comment arguing its correctness. So > that fast path is removed. And since the raison d'etre for > PausedBuffers::is_empty() was to support that fast path, that function > is also removed. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8238979 > > Webrev: > https://cr.openjdk.java.net/~kbarrett/8238979/open.00/ Looks good, StefanJ > > Testing: > mach5 tier1-5 in conjunction with other in-development changes. > Local (linux-x64) hotspot:tier1 for this change in isolation. > From shade at redhat.com Wed Feb 26 09:19:26 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 26 Feb 2020 10:19:26 +0100 Subject: RFR (XS) 8240069: Shenandoah: turn more flags diagnostic Message-ID: <80e3c299-575d-1603-0341-6176738b1280@redhat.com> RFE: https://bugs.openjdk.java.net/browse/JDK-8240069 Webrev: http://cr.openjdk.java.net/~shade/8240069/webrev.01/ Regular sweep of flags that are experimental, but have been used as diagnostic. Diagnostic flags are usually for features that are enabled by default, and are not expected to be disabled, unless someone is chasing the bug. 
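In code terms, each such re-categorization is a one-word change of the declaration macro in shenandoah_globals.hpp, roughly like the sketch below (the flag name here is made up, not one from the actual webrev; the macro table's continuation backslashes are omitted):

```cpp
// Before: declared experimental, behind -XX:+UnlockExperimentalVMOptions.
experimental(bool, ShenandoahSomeFeature, true,
        "Feature description")

// After: the same flag declared diagnostic, behind
// -XX:+UnlockDiagnosticVMOptions, so it can still be toggled
// when chasing a bug.
diagnostic(bool, ShenandoahSomeFeature, true,
        "Feature description")
```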
Testing: hotspot_gc_shenandoah {fastdebug,release} -- Thanks, -Aleksey From shade at redhat.com Wed Feb 26 09:38:21 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 26 Feb 2020 10:38:21 +0100 Subject: RFR (S) 8240070: Shenandoah: remove obsolete ShenandoahCommonGCStateLoads Message-ID: <1705d27b-bdf7-ef28-edb7-84e804786798@redhat.com> RFE: https://bugs.openjdk.java.net/browse/JDK-8240070 This is the leftover of the older experiment that optimized the frequently emitted barriers. With the switch to LRB and questionable performance improvements (sometimes hijacked by elevated register pressure), it makes less sense to keep the option exposed and C2 code more complicated. Removal webrev: https://cr.openjdk.java.net/~shade/8240070/webrev.01/ Testing: hotspot_gc_shenandoah {fastdebug,release} -- Thanks, -Aleksey From rkennke at redhat.com Wed Feb 26 10:00:18 2020 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 26 Feb 2020 11:00:18 +0100 Subject: RFR (XS) 8240069: Shenandoah: turn more flags diagnostic In-Reply-To: <80e3c299-575d-1603-0341-6176738b1280@redhat.com> References: <80e3c299-575d-1603-0341-6176738b1280@redhat.com> Message-ID: <6009896c-853d-16fb-6769-fcf1b97387ac@redhat.com> Yes, that makes sense! Thank you! Roman > RFE: > https://bugs.openjdk.java.net/browse/JDK-8240069 > > Webrev: > http://cr.openjdk.java.net/~shade/8240069/webrev.01/ > > Regular sweep of flags that are experimental, but have been used as diagnostic. Diagnostic flags are > usually for features that are enabled by default, and are not expected to be disabled, unless > someone is chasing the bug. 
> > Testing: hotspot_gc_shenandoah {fastdebug,release} > From rkennke at redhat.com Wed Feb 26 10:03:47 2020 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 26 Feb 2020 11:03:47 +0100 Subject: RFR (S) 8240070: Shenandoah: remove obsolete ShenandoahCommonGCStateLoads In-Reply-To: <1705d27b-bdf7-ef28-edb7-84e804786798@redhat.com> References: <1705d27b-bdf7-ef28-edb7-84e804786798@redhat.com> Message-ID: <58632b60-f466-d443-d520-78da2702b096@redhat.com> As far as I understand, this optimization pass would not work with LRB anyway. So yeah, please remove it. Roman > RFE: > https://bugs.openjdk.java.net/browse/JDK-8240070 > > This is the leftover of the older experiment that optimized the frequently emitted barriers. With > the switch to LRB and questionable performance improvements (sometimes hijacked by elevated register > pressure), it makes less sense to keep the option exposed and C2 code more complicated. > > Removal webrev: > https://cr.openjdk.java.net/~shade/8240070/webrev.01/ > > Testing: hotspot_gc_shenandoah {fastdebug,release} > From shade at redhat.com Wed Feb 26 11:52:39 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 26 Feb 2020 12:52:39 +0100 Subject: RFR (S) 8240076: Shenandoah: pacer should cover reset and preclean phases Message-ID: <30ee2c94-22dd-536e-7d59-c3d61ae87780@redhat.com> RFE: https://bugs.openjdk.java.net/browse/JDK-8240076 See bug above for discussion. Webrev: https://cr.openjdk.java.net/~shade/8240076/webrev.01/ Testing: hotspot_gc_shenandoah, eyeballing logs -- Thanks, -Aleksey From erik.gahlin at oracle.com Wed Feb 26 12:50:45 2020 From: erik.gahlin at oracle.com (Erik Gahlin) Date: Wed, 26 Feb 2020 13:50:45 +0100 Subject: RFR: 8003216: Add JFR event indicating explicit System.gc() cal Message-ID: Hi, Could I have a review of a JFR event that is emitted when System.gc() is called. Purpose is to collect the stack trace. It is not sufficient with the cause field that the GarbageCollection event has today. 
Bug: https://bugs.openjdk.java.net/browse/JDK-8003216 Webrev: http://cr.openjdk.java.net/~egahlin/8003216/ Testing: tier1+tier2+jdk/jdk/jfr Thanks Erik From per.liden at oracle.com Wed Feb 26 12:56:45 2020 From: per.liden at oracle.com (Per Liden) Date: Wed, 26 Feb 2020 13:56:45 +0100 Subject: RFR: 8003216: Add JFR event indicating explicit System.gc() cal In-Reply-To: References: Message-ID: <07178c56-dde3-25eb-c95c-32fff443cb55@oracle.com> Hi Erik, On 2020-02-26 13:50, Erik Gahlin wrote: > Hi, > > Could I have a review of a JFR event that is emitted when System.gc() is > called. > > Purpose is to collect the stack trace. It is not sufficient with the > cause field that the GarbageCollection event has today. > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8003216 > > Webrev: > http://cr.openjdk.java.net/~egahlin/8003216/ 489 EventSystemGC event; 490 event.commit(); 491 Universe::heap()->collect(GCCause::_java_lang_system_gc); Don't you want the commit() call after the call to collect(), to get the timing right? cheers, Per > > Testing: > tier1+tier2+jdk/jdk/jfr > > Thanks > Erik > > From stefan.johansson at oracle.com Wed Feb 26 13:21:16 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Wed, 26 Feb 2020 14:21:16 +0100 Subject: RFR: 8003216: Add JFR event indicating explicit System.gc() cal In-Reply-To: <07178c56-dde3-25eb-c95c-32fff443cb55@oracle.com> References: <07178c56-dde3-25eb-c95c-32fff443cb55@oracle.com> Message-ID: Hi Erik, > 26 feb. 2020 kl. 13:56 skrev Per Liden : > > Hi Erik, > > On 2020-02-26 13:50, Erik Gahlin wrote: >> Hi, >> Could I have a review of a JFR event that is emitted when System.gc() is called. >> Purpose is to collect the stack trace. It is not sufficient with the cause field that the GarbageCollection event has today. 
>> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8003216 >> Webrev: >> http://cr.openjdk.java.net/~egahlin/8003216/ > > 489 EventSystemGC event; > 490 event.commit(); > 491 Universe::heap()->collect(GCCause::_java_lang_system_gc); > > Don't you want the commit() call after the call to collect(), to get the timing right? I was thinking the same thing; it could also be nice to have the GC-id associated with the event to make it easy to match it to GC logs and other GC events. Not sure how to easily get the GC-id though, since it's not set at the time we commit the event. I guess if the event has the correct span with timestamps it will be easy to figure out which other events are associated with it, even without the GC-id. Cheers, Stefan > > cheers, > Per > >> Testing: >> tier1+tier2+jdk/jdk/jfr >> Thanks >> Erik From zgu at redhat.com Wed Feb 26 13:32:34 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 26 Feb 2020 08:32:34 -0500 Subject: RFR (S) 8240076: Shenandoah: pacer should cover reset and preclean phases In-Reply-To: <30ee2c94-22dd-536e-7d59-c3d61ae87780@redhat.com> References: <30ee2c94-22dd-536e-7d59-c3d61ae87780@redhat.com> Message-ID: Looks good to me. -Zhengyu On 2/26/20 6:52 AM, Aleksey Shipilev wrote: > RFE: > https://bugs.openjdk.java.net/browse/JDK-8240076 > > See bug above for discussion. > > Webrev: > https://cr.openjdk.java.net/~shade/8240076/webrev.01/ > > Testing: hotspot_gc_shenandoah, eyeballing logs > From erik.gahlin at oracle.com Wed Feb 26 13:50:41 2020 From: erik.gahlin at oracle.com (Erik Gahlin) Date: Wed, 26 Feb 2020 14:50:41 +0100 Subject: RFR: 8003216: Add JFR event indicating explicit System.gc() cal In-Reply-To: <07178c56-dde3-25eb-c95c-32fff443cb55@oracle.com> References: <07178c56-dde3-25eb-c95c-32fff443cb55@oracle.com> Message-ID: <409F6986-AAE1-4565-AA5D-78DDD9E4EC89@oracle.com> Hi Per, My thinking was that users expect the timestamp of the event to happen before the GarbageCollection event, so I made the event untimed.
I could make the event timed, but what happens if a concurrent gc is used and it is already in progress. Would a new gc cycle start, or will it complete the existing cycle before returning? Thanks Erik > On 26 Feb 2020, at 13:56, Per Liden wrote: > > Hi Erik, > > On 2020-02-26 13:50, Erik Gahlin wrote: >> Hi, >> Could I have a review of a JFR event that is emitted when System.gc() is called. >> Purpose is to collect the stack trace. It is not sufficient with the cause field that the GarbageCollection event has today. >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8003216 >> Webrev: >> http://cr.openjdk.java.net/~egahlin/8003216/ > > 489 EventSystemGC event; > 490 event.commit(); > 491 Universe::heap()->collect(GCCause::_java_lang_system_gc); > > Don't you want the commit() call after the call to collect(), to get the timing right? > > cheers, > Per > >> Testing: >> tier1+tier2+jdk/jdk/jfr >> Thanks >> Erik From kim.barrett at oracle.com Wed Feb 26 13:56:39 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 26 Feb 2020 08:56:39 -0500 Subject: RFR: 8238979: Improve G1DirtyCardQueueSet handling of previously paused buffers In-Reply-To: References: Message-ID: <9A987C14-942E-4CED-9E89-A6678AA6F9B0@oracle.com> > On Feb 26, 2020, at 4:07 AM, Stefan Johansson wrote: > > Hi Kim, > >> 14 feb. 2020 kl. 02:46 skrev Kim Barrett : >> >> Please review this simplification of the handling of previously paused >> buffers by G1DirtyCardQueueSet. This change moves the call to >> enqueue_previous_paused_buffers() into record_paused_buffer(). This >> ensures any paused buffers from a previous safepoint have been flushed >> out before recording a buffer for the next safepoint. >> >> [?] >> >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8238979 >> >> Webrev: >> https://cr.openjdk.java.net/~kbarrett/8238979/open.00/ > Looks good, > StefanJ Thanks. > >> >> Testing: >> mach5 tier1-5 in conjunction with other in-development changes. 
>> Local (linux-x64) hotspot:tier1 for this change in isolation. From per.liden at oracle.com Wed Feb 26 14:02:01 2020 From: per.liden at oracle.com (Per Liden) Date: Wed, 26 Feb 2020 15:02:01 +0100 Subject: RFR: 8003216: Add JFR event indicating explicit System.gc() cal In-Reply-To: <409F6986-AAE1-4565-AA5D-78DDD9E4EC89@oracle.com> References: <07178c56-dde3-25eb-c95c-32fff443cb55@oracle.com> <409F6986-AAE1-4565-AA5D-78DDD9E4EC89@oracle.com> Message-ID: <0487f8cf-e826-d991-12c6-f720818f50a1@oracle.com> Hi, On 2/26/20 2:50 PM, Erik Gahlin wrote: > Hi Per, > > My thinking was that users expect the timestamp of the event to happen before the GarbageCollection event, so I made the event untimed. I would have expected the event start time to be before the collection starts, and the end time when it's done. > > I could make the event timed, but what happens if a concurrent gc is used and it is already in progress. Would a new gc cycle start, or will it complete the existing cycle before returning? Unless -XX:+ExplicitGCInvokesConcurrent is used (off by default), they do the same, which is wait until the "System.gc" collection has completed. For ZGC, should a GC cycle be in progress when a call to System.gc() is made, it will first wait for the in progress cycle to finish, and then execute the "System.gc" cycle before returning. cheers, Per > > Thanks > Erik > >> On 26 Feb 2020, at 13:56, Per Liden wrote: >> >> Hi Erik, >> >> On 2020-02-26 13:50, Erik Gahlin wrote: >>> Hi, >>> Could I have a review of a JFR event that is emitted when System.gc() is called. >>> Purpose is to collect the stack trace. It is not sufficient with the cause field that the GarbageCollection event has today. 
>>> Bug: >>> https://bugs.openjdk.java.net/browse/JDK-8003216 >>> Webrev: >>> http://cr.openjdk.java.net/~egahlin/8003216/ >> >> 489 EventSystemGC event; >> 490 event.commit(); >> 491 Universe::heap()->collect(GCCause::_java_lang_system_gc); >> >> Don't you want the commit() call after the call to collect(), to get the timing right? >> >> cheers, >> Per >> >>> Testing: >>> tier1+tier2+jdk/jdk/jfr >>> Thanks >>> Erik From erik.gahlin at oracle.com Wed Feb 26 15:17:29 2020 From: erik.gahlin at oracle.com (Erik Gahlin) Date: Wed, 26 Feb 2020 16:17:29 +0100 Subject: RFR: 8003216: Add JFR event indicating explicit System.gc() cal In-Reply-To: <0487f8cf-e826-d991-12c6-f720818f50a1@oracle.com> References: <07178c56-dde3-25eb-c95c-32fff443cb55@oracle.com> <409F6986-AAE1-4565-AA5D-78DDD9E4EC89@oracle.com> <0487f8cf-e826-d991-12c6-f720818f50a1@oracle.com> Message-ID: > On 26 Feb 2020, at 15:02, Per Liden wrote: > > Hi, > > On 2/26/20 2:50 PM, Erik Gahlin wrote: >> Hi Per, >> My thinking was that users expect the timestamp of the event to happen before the GarbageCollection event, so I made the event untimed. > > I would have expected the event start time to be before the collection starts, and the end time when it's done. We have sometimes sorted events by their end time, or when they are committed (written to the buffer). This works better than start time in some cases. > >> I could make the event timed, but what happens if a concurrent gc is used and it is already in progress. Would a new gc cycle start, or will it complete the existing cycle before returning? > > Unless -XX:+ExplicitGCInvokesConcurrent is used (off by default), they do the same, which is wait until the "System.gc" collection has completed. For ZGC, should a GC cycle be in progress when a call to System.gc() is made, it will first wait for the in progress cycle to finish, and then execute the "System.gc" cycle before returning.
If users expect the System GC event to measure the time the Java thread is blocked, making the event timed makes sense. If users expect the duration to be the length of the triggered GC, or even the pause time, it would mislead users to make it timed. Erik > > cheers, > Per > >> Thanks >> Erik >>> On 26 Feb 2020, at 13:56, Per Liden wrote: >>> >>> Hi Erik, >>> >>> On 2020-02-26 13:50, Erik Gahlin wrote: >>>> Hi, >>>> Could I have a review of a JFR event that is emitted when System.gc() is called. >>>> Purpose is to collect the stack trace. It is not sufficient with the cause field that the GarbageCollection event has today. >>>> Bug: >>>> https://bugs.openjdk.java.net/browse/JDK-8003216 >>>> Webrev: >>>> http://cr.openjdk.java.net/~egahlin/8003216/ >>> >>> 489 EventSystemGC event; >>> 490 event.commit(); >>> 491 Universe::heap()->collect(GCCause::_java_lang_system_gc); >>> >>> Don't you want the commit() call after the call to collect(), to get the timing right? >>> >>> cheers, >>> Per >>> >>>> Testing: >>>> tier1+tier2+jdk/jdk/jfr >>>> Thanks >>>> Erik From erik.gahlin at oracle.com Wed Feb 26 17:28:13 2020 From: erik.gahlin at oracle.com (Erik Gahlin) Date: Wed, 26 Feb 2020 18:28:13 +0100 Subject: RFR: 8003216: Add JFR event indicating explicit System.gc() cal In-Reply-To: References: <07178c56-dde3-25eb-c95c-32fff443cb55@oracle.com> Message-ID: Hi Stefan, GC-id would be nice, but perhaps not possible in all scenarios, i.e. -XX:+ExplicitGCInvokesConcurrent and Epsilon GC? Thanks Erik On 2020-02-26 14:21, Stefan Johansson wrote: > Hi Erik, > >> 26 feb. 2020 kl. 13:56 skrev Per Liden : >> >> Hi Erik, >> >> On 2020-02-26 13:50, Erik Gahlin wrote: >>> Hi, >>> Could I have a review of a JFR event that is emitted when System.gc() is called. >>> Purpose is to collect the stack trace. It is not sufficient with the cause field that the GarbageCollection event has today. 
>>> Bug: >>> https://bugs.openjdk.java.net/browse/JDK-8003216 >>> Webrev: >>> http://cr.openjdk.java.net/~egahlin/8003216/ >> 489 EventSystemGC event; >> 490 event.commit(); >> 491 Universe::heap()->collect(GCCause::_java_lang_system_gc); >> >> Don't you want the commit() call after the call to collect(), to get the timing right? > I was thinking the same thing, could also be nice to have the GC-id associated with the event to make it easy to match it to GC-logs and other GC-events. Not sure how to easily get the GC-id though, since it?s not set at the time we commit the event. > > I guess if the event has the correct span with timestamps it will be easy to figure out which other events are associated with it, even without the GC-id. > > Cheers, > Stefan > >> cheers, >> Per >> >>> Testing: >>> tier1+tier2+jdk/jdk/jfr >>> Thanks >>> Erik From mikhailo.seledtsov at oracle.com Wed Feb 26 21:07:42 2020 From: mikhailo.seledtsov at oracle.com (mikhailo.seledtsov at oracle.com) Date: Wed, 26 Feb 2020 13:07:42 -0800 Subject: RFR: 8003216: Add JFR event indicating explicit System.gc() cal In-Reply-To: References: Message-ID: <565e3578-0b67-41e0-13a4-3c5ac0b86904@oracle.com> Looks good to me, Misha On 2/26/20 4:50 AM, Erik Gahlin wrote: > Hi, > > Could I have a review of a JFR event that is emitted when System.gc() > is called. > > Purpose is to collect the stack trace. It is not sufficient with the > cause field that the GarbageCollection event has today. 
> > Bug: > https://bugs.openjdk.java.net/browse/JDK-8003216 > > Webrev: > http://cr.openjdk.java.net/~egahlin/8003216/ > > Testing: > tier1+tier2+jdk/jdk/jfr > > Thanks > Erik > > From felixxfyang at tencent.com Thu Feb 27 08:41:24 2020 From: felixxfyang at tencent.com (felixxfyang(杨晓峰)) Date: Thu, 27 Feb 2020 08:41:24 +0000 Subject: RFR(XS): 8239916 - SA: delete dead code in jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/ObjectHeap.java(Internet mail) In-Reply-To: <8C202D0D-DEB9-43F5-B06F-822AC9AF430F@tencent.com> References: <8C202D0D-DEB9-43F5-B06F-822AC9AF430F@tencent.com> Message-ID: Copy correct alias -Felix From: "felixxfyang(杨晓峰)" Date: Thursday, February 27, 2020, 4:29 PM To: "linzang(臧琳)" Cc: hotspot-gc-dev Subject: Re: RFR(XS): 8239916 - SA: delete dead code in jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/ObjectHeap.java(Internet mail) Hi Lin, Suppose yes, this change looks trivial. I can sponsor to push it. Thanks, Felix From: linzang(臧琳) Date: 2020-02-26 10:47 To: Stefan Karlsson; Paul Hohensee CC: linzang(臧琳); Chris Plummer; serviceability-dev; hotspot-gc-dev at openjdk.java.net Subject: Re: RFR(XS): 8239916 - SA: delete dead code in jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/ObjectHeap.java(Internet mail) Thanks for reviewing, so can this change be merged now? BRs, Lin > On Feb 26, 2020, at 5:46 AM, Stefan Karlsson wrote: > > Looks good. This is left-overs from the CMS removal. > > StefanK > > On 2020-02-25 19:02, Chris Plummer wrote: >> Adding hotspot-gc-dev. >> >> Chris >> >> On 2/25/20 2:21 AM, linzang(臧琳)
wrote: >>> Hi, >>> Please review the following change: >>> Bugs: https://bugs.openjdk.java.net/browse/JDK-8239916 >>> webrev: http://cr.openjdk.java.net/~lzang/8239916/webrev/ >>> >>> Thanks, >>> Lin >> > > From stefan.johansson at oracle.com Thu Feb 27 09:13:50 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Thu, 27 Feb 2020 10:13:50 +0100 Subject: RFR: 8003216: Add JFR event indicating explicit System.gc() cal In-Reply-To: References: <07178c56-dde3-25eb-c95c-32fff443cb55@oracle.com> Message-ID: <73577b98-c59e-9e80-b966-a11d501952d8@oracle.com> Hi Erik, On 2020-02-26 18:28, Erik Gahlin wrote: > Hi Stefan, > > GC-id would be nice, but perhaps not possible in all scenarios, i.e. > -XX:+ExplicitGCInvokesConcurrent and Epsilon GC? For ExplicitGCInvokesConcurrent it would not be a big problem, that would start a concurrent cycle and we could use the id for that GC. I also realized that we can get the GC-id without any problem. For other events sent before the GC-id is properly setup, we use GCId::peek() which returns the id that will be used for the next collection. For Epsilon, I'm not sure an event should be sent at all since they are blocked, see: EpsilonHeap::collect(...) Thanks, Stefan > > Thanks > Erik > > On 2020-02-26 14:21, Stefan Johansson wrote: >> Hi Erik, >> >>> 26 feb. 2020 kl. 13:56 skrev Per Liden : >>> >>> Hi Erik, >>> >>> On 2020-02-26 13:50, Erik Gahlin wrote: >>>> Hi, >>>> Could I have a review of a JFR event that is emitted when >>>> System.gc() is called. >>>> Purpose is to collect the stack trace. It is not sufficient with the >>>> cause field that the GarbageCollection event has today. >>>> Bug: >>>> https://bugs.openjdk.java.net/browse/JDK-8003216 >>>> Webrev: >>>> http://cr.openjdk.java.net/~egahlin/8003216/ >>> 489 EventSystemGC event; >>> 490 event.commit(); >>> 491
Universe::heap()->collect(GCCause::_java_lang_system_gc); >>> >>> Don't you want the commit() call after the call to collect(), to get >>> the timing right? >> I was thinking the same thing, could also be nice to have the GC-id >> associated with the event to make it easy to match it to GC-logs and >> other GC-events. Not sure how to easily get the GC-id though, since >> it?s not set at the time we commit the event. >> >> I guess if the event has the correct span with timestamps it will be >> easy to figure out which other events are associated with it, even >> without the GC-id. >> >> Cheers, >> Stefan >> >>> cheers, >>> Per >>> >>>> Testing: >>>> tier1+tier2+jdk/jdk/jfr >>>> Thanks >>>> Erik From per.liden at oracle.com Thu Feb 27 12:32:20 2020 From: per.liden at oracle.com (Per Liden) Date: Thu, 27 Feb 2020 13:32:20 +0100 Subject: RFR: 8003216: Add JFR event indicating explicit System.gc() cal In-Reply-To: References: <07178c56-dde3-25eb-c95c-32fff443cb55@oracle.com> <409F6986-AAE1-4565-AA5D-78DDD9E4EC89@oracle.com> <0487f8cf-e826-d991-12c6-f720818f50a1@oracle.com> Message-ID: Hi, On 2/26/20 4:17 PM, Erik Gahlin wrote: > >> On 26 Feb 2020, at 15:02, Per Liden wrote: >> >> Hi, >> >> On 2/26/20 2:50 PM, Erik Gahlin wrote: >>> Hi Per, >>> My thinking was that users expect the timestamp of the event to happen before the GarbageCollection event, so I made the event untimed. >> >> I would have expected the event start time to be before the collection starts, and the end time when it's done. > > We have sometimes sorted events by their end time, or when they are committed (written to the buffer).This works better than start time in some cases. > >> >>> I could make the event timed, but what happens if a concurrent gc is used and it is already in progress. Would a new gc cycle start, or will it complete the existing cycle before returning? 
>> >> Unless -XX:+ExplicitGCInvokesConcurrent is used (off by default), they do the same, which is wait until the "System.gc" collection has completed. For ZGC, should a GC cycle be in progress when a call to System.gc() is made, it will first wait for the in progress cycle to finish, and then execute the "System.gc" cycle before returning. > > If users expect the System GC event to measure the time the Java thread is blocked, making the event timed makes sense. > > If users expect the duration to be the length of the triggered GC, or even the pause time, it would mislead users to make it timed. I agree. I'm thinking the event time should just reflect how long time the Java thread was blocked, waiting for System.gc() to complete. cheers, Per > > Erik > >> >> cheers, >> Per >> >>> Thanks >>> Erik >>>> On 26 Feb 2020, at 13:56, Per Liden wrote: >>>> >>>> Hi Erik, >>>> >>>> On 2020-02-26 13:50, Erik Gahlin wrote: >>>>> Hi, >>>>> Could I have a review of a JFR event that is emitted when System.gc() is called. >>>>> Purpose is to collect the stack trace. It is not sufficient with the cause field that the GarbageCollection event has today. >>>>> Bug: >>>>> https://bugs.openjdk.java.net/browse/JDK-8003216 >>>>> Webrev: >>>>> http://cr.openjdk.java.net/~egahlin/8003216/ >>>> >>>> 489 EventSystemGC event; >>>> 490 event.commit(); >>>> 491 Universe::heap()->collect(GCCause::_java_lang_system_gc); >>>> >>>> Don't you want the commit() call after the call to collect(), to get the timing right? 
>>>> >>>> cheers, >>>> Per >>>> >>>>> Testing: >>>>> tier1+tier2+jdk/jdk/jfr >>>>> Thanks >>>>> Erik > From zgu at redhat.com Thu Feb 27 13:21:24 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 27 Feb 2020 08:21:24 -0500 Subject: [15] 8237632: Shenandoah fails some vmTestbase_nsk_jvmti tests with "Forwardee must point to a heap address" In-Reply-To: References: <24a45316-25f2-8be5-004e-47ca59cd1f13@redhat.com> Message-ID: <2d619863-e6e2-2dde-29ed-ef95b13b7413@redhat.com> Hi, Based on Erik's suggestion from the JDK-8238633 review [1], we can filter out oops marked by JVMTI and the JFR leak profiler in the resolve_forwarded() barrier, by inserting a null check on the forwarding pointer. To reduce the performance impact, we split up the compiler and runtime resolve-forwarded barriers and only perform the extra null check in the runtime barrier, as the JVMTI and leak profiler heap walks are performed at safepoints, where mutators are stopped. Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237632/webrev.01/ Test: hotspot_gc_shenandoah vmTestbase_nsk_jvmti vmTestbase_nsk_jdi Thanks, -Zhengyu [1] https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-February/040974.html On 2/4/20 2:23 PM, Aleksey Shipilev wrote: > On 2/3/20 9:59 PM, Zhengyu Gu wrote: >> Bug: https://bugs.openjdk.java.net/browse/JDK-8237632 >> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237632/webrev.00/ > > Uh. It seems to me the cure is worse than the disease: > 1) It rewires sensitive parts of barrier paths, root handling, etc, which requires more thorough > testing, and we are too deep in RDP2 for this; > 2) It effectively disables asserts for anything not in collection set. Which means it disables > most of asserts. The fact that Verifier still works is a small consolation. > > I propose to accept this failure in 14, and rework the JVMTI heap walk to stop messing around with > mark words in 15. Since this relates to concurrent root handling, 11-shenandoah is already safe.
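The runtime-path change described above boils down to tolerating a NULL forwarding pointer. A standalone sketch of the idea (simplified stand-in types and names, not the actual HotSpot code from the webrev):

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-in for an object whose header may carry a forwarding
// pointer. In Shenandoah the forwardee is decoded from the mark word, and
// a JVMTI/JFR heap-walk mark can leave that decoded value NULL.
struct ObjModel {
  ObjModel* fwd;  // forwardee, or nullptr when none is installed
};

// Runtime resolve-forwarded barrier: return the forwardee when present,
// otherwise the object itself. The nullptr check is the new part; keeping
// it off the compiler fast path is fine because the JVMTI/JFR heap walks
// run at safepoints, with mutators stopped.
inline ObjModel* resolve_forwarded_runtime(ObjModel* obj) {
  ObjModel* fwd = obj->fwd;
  if (fwd == nullptr) {
    return obj;  // e.g. an object marked by the JVMTI heap walk
  }
  return fwd;
}
```

Per the mail, only the runtime variant performs the check; the compiler-emitted barrier is left unchanged.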
> From shade at redhat.com Thu Feb 27 13:24:35 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 27 Feb 2020 14:24:35 +0100 Subject: [15] 8237632: Shenandoah fails some vmTestbase_nsk_jvmti tests with "Forwardee must point to a heap address" In-Reply-To: <2d619863-e6e2-2dde-29ed-ef95b13b7413@redhat.com> References: <24a45316-25f2-8be5-004e-47ca59cd1f13@redhat.com> <2d619863-e6e2-2dde-29ed-ef95b13b7413@redhat.com> Message-ID: On 2/27/20 2:21 PM, Zhengyu Gu wrote: > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237632/webrev.01/ This looks good to me. Let Roman look through it as well. -- Thanks, -Aleksey From shade at redhat.com Thu Feb 27 13:26:57 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 27 Feb 2020 14:26:57 +0100 Subject: [15] 8237632: Shenandoah fails some vmTestbase_nsk_jvmti tests with "Forwardee must point to a heap address" In-Reply-To: References: <24a45316-25f2-8be5-004e-47ca59cd1f13@redhat.com> <2d619863-e6e2-2dde-29ed-ef95b13b7413@redhat.com> Message-ID: <5722e9ec-05c9-611c-f2ae-b112813997a1@redhat.com> On 2/27/20 2:24 PM, Aleksey Shipilev wrote: > On 2/27/20 2:21 PM, Zhengyu Gu wrote: >> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237632/webrev.01/ > > This looks good to me. 
Suggestion to change the synopsis, though: "Shenandoah: accept NULL fwdptr to cooperate with JVMTI and JFR" -- Thanks, -Aleksey From zgu at redhat.com Thu Feb 27 13:29:08 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 27 Feb 2020 08:29:08 -0500 Subject: [15] 8237632: Shenandoah fails some vmTestbase_nsk_jvmti tests with "Forwardee must point to a heap address" In-Reply-To: <5722e9ec-05c9-611c-f2ae-b112813997a1@redhat.com> References: <24a45316-25f2-8be5-004e-47ca59cd1f13@redhat.com> <2d619863-e6e2-2dde-29ed-ef95b13b7413@redhat.com> <5722e9ec-05c9-611c-f2ae-b112813997a1@redhat.com> Message-ID: <019a2d3f-9bc1-f28d-4b69-bd3461d10c6d@redhat.com> On 2/27/20 8:26 AM, Aleksey Shipilev wrote: > On 2/27/20 2:24 PM, Aleksey Shipilev wrote: >> On 2/27/20 2:21 PM, Zhengyu Gu wrote: >>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237632/webrev.01/ >> >> This looks good to me. > > Suggestion to change the synopsis, though: > "Shenandoah: accept NULL fwdptr to cooperate with JVMTI and JFR" Thanks for the review, Aleksey. I will fix the synopsis before push. -Zhengyu > From rkennke at redhat.com Thu Feb 27 14:54:04 2020 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 27 Feb 2020 15:54:04 +0100 Subject: [15] 8237632: Shenandoah fails some vmTestbase_nsk_jvmti tests with "Forwardee must point to a heap address" In-Reply-To: <2d619863-e6e2-2dde-29ed-ef95b13b7413@redhat.com> References: <24a45316-25f2-8be5-004e-47ca59cd1f13@redhat.com> <2d619863-e6e2-2dde-29ed-ef95b13b7413@redhat.com> Message-ID: <893bb323-ee07-7621-e80f-41e899064a65@redhat.com> Looks good to me. Thank you! Roman > Hi, > > Based on Erik's suggestion from JDK-8238633 review [1], we can filter > out oops marked by JVMTI and JFR leak profiler via resolve_forwarded() > barrier, by inserting an null check on forwarding pointer. 
> > To reduce performance impact, we split up compiler and runtime resolve > forwarded barrier, only performs extra null check in runtime barrier, as > JVMTI and leak profiler heap walk are performed at safepoints, where > mutators are stopped. > > > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237632/webrev.01/ > > Test: > ? hotspot_gc_shenandoah > ? vmTestbase_nsk_jvmti > ? vmTestbase_nsk_jdi > > Thanks, > > -Zhengyu > > [1] > https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-February/040974.html > > > > > On 2/4/20 2:23 PM, Aleksey Shipilev wrote: >> On 2/3/20 9:59 PM, Zhengyu Gu wrote: >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8237632 >>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237632/webrev.00/ >> >> Uh. It seems to me the cure is worse than the disease: >> ?? 1) It rewires sensitive parts of barrier paths, root handling, etc, >> which requires more thorough >> testing, and we are too deep in RDP2 for this; >> ?? 2) It effectively disables asserts for anything not in collection >> set. Which means it disables >> most of asserts. The fact that Verifier still works is a small >> consolation. >> >> I propose to accept this failure in 14, and rework the JVMTI heap walk >> to stop messing around with >> mark words in 15. Since this relates to concurrent root handling, >> 11-shenandoah is already safe. >> > From rkennke at redhat.com Thu Feb 27 14:54:16 2020 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 27 Feb 2020 15:54:16 +0100 Subject: [15] RFR 8239354: Shenandoah: minor enhancements to traversal GC In-Reply-To: References: Message-ID: <60257cc5-6e76-cc71-79d8-42c96de669b6@redhat.com> Hi Zhengyu, This looks good to me, thank you! Roman > 1) Added assertion to catch evacuation after completion of heap > traversal. This should help catch the bug demonstrated in sh-jdk11 w/o > JDK-8237396. > > 2) Retire TLAB/GCLAB after completion of heap traversal. 
Current code > retires TLAB/GCLAB at the beginning of final traversal, but STW traversal > still uses GCLAB to evacuate remaining objects. > > 3) Added comments regarding why we need to retire TLAB/GCLAB, even though we don't > need the heap to be parsable. > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8239354 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8239354/webrev.00/index.html > > Test: > hotspot_gc_shenandoah > > Thanks, > > -Zhengyu >
From shade at redhat.com Fri Feb 28 09:53:14 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 28 Feb 2020 10:53:14 +0100 Subject: RFR (XS) 8240216: Shenandoah: remove ShenandoahTerminationTrace Message-ID: RFE: https://bugs.openjdk.java.net/browse/JDK-8240216 This was the diagnostic option for working on improving the termination protocol. Now that the VM has moved globally to OWST as the termination protocol, this seems to only increase the maintenance burden. The option is turned off by default already. Zhengyu, do you agree? Webrev: https://cr.openjdk.java.net/~shade/8240216/webrev.01/ Testing: hotspot_gc_shenandoah {fastdebug,release} -- Thanks, -Aleksey
From shade at redhat.com Fri Feb 28 10:06:15 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 28 Feb 2020 11:06:15 +0100 Subject: RFR (S) 8240217: Shenandoah: remove ShenandoahEvacAssist Message-ID: RFE: https://bugs.openjdk.java.net/browse/JDK-8240217 ShenandoahEvacAssist is an experimental option that strived to make calling into the WB/LRB slowpath less frequent. It implicitly relied on the WB/LRB midpath to check for the forwardee and shortcut from there. With the introduction of self-fixing barriers, this was intentionally removed. Therefore, Shenandoah would call into the slow path anyway, even when the evac-assist path had evacuated some objects. Also, with Traversal, the assist path breaks out of Traversal's intent to evacuate the objects in traversal order. There, it becomes actively harmful. We should consider removing it.
Webrev: https://cr.openjdk.java.net/~shade/8240217/webrev.01/ Testing: hotspot_gc_shenandoah {fastdebug,release} -- Thanks, -Aleksey From rkennke at redhat.com Fri Feb 28 10:52:01 2020 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 28 Feb 2020 11:52:01 +0100 Subject: RFR (S) 8240217: Shenandoah: remove ShenandoahEvacAssist In-Reply-To: References: Message-ID: Have you done any performance experiments? A (not so long) while back, I ran SPECjbb2015 with and without the option, and couldn't measure a difference. If anything, latency slightly improved with evac-assist turned off. Other than that, good. Less code, less maintenance. Roman > RFE: > https://bugs.openjdk.java.net/browse/JDK-8240217 > > ShenandoahEvacAssist is an experimental option that strived to make calling into WB/LRB slowpath > less frequent. > > It implicitly relied on WB/LRB midpath to check for forwardee and shortcut from there. With the > introduction of self-fixing barriers, this was intentionally removed. Therefore, Shenandoah would > call into slow-path anyway, even when evac-assist path had evacuated some objects. > > Also, with Traversal, the assist path breaks out of Traversal's intent to evacuate the objects in > traversal order. There, it becomes actively harmful. We should consider removing it. > > Webrev: > https://cr.openjdk.java.net/~shade/8240217/webrev.01/ > > Testing: hotspot_gc_shenandoah {fastdebug,release} > From zgu at redhat.com Fri Feb 28 13:26:12 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Fri, 28 Feb 2020 08:26:12 -0500 Subject: RFR (XS) 8240216: Shenandoah: remove ShenandoahTerminationTrace In-Reply-To: References: Message-ID: <96a74fad-6463-8793-8ffb-60d62254cd0e@redhat.com> On 2/28/20 4:53 AM, Aleksey Shipilev wrote: > RFE: > https://bugs.openjdk.java.net/browse/JDK-8240216 > > This was the diagnostic option for working on improving the termination protocol. 
Now that VM had > moved globally to OWST as termination protocol, this seems to only increase the maintenance burden. > The option is turned off by default already. > > Zhengyu, do you agree? Okay. -Zhengyu > > Webrev: > https://cr.openjdk.java.net/~shade/8240216/webrev.01/ > > Testing: hotspot_gc_shenandoah {fastdebug,release} > From shade at redhat.com Fri Feb 28 14:24:06 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 28 Feb 2020 15:24:06 +0100 Subject: RFR (S) 8240217: Shenandoah: remove ShenandoahEvacAssist In-Reply-To: References: Message-ID: <633fc9e0-5b2c-9be4-8c74-0a29149cb0ef@redhat.com> On 2/28/20 11:52 AM, Roman Kennke wrote: > Have you done any performance experiments? Just finished: no improvement/regressions, unless it hides in the noise. -- Thanks, -Aleksey From m.sundar85 at gmail.com Fri Feb 28 18:57:12 2020 From: m.sundar85 at gmail.com (Sundara Mohan M) Date: Fri, 28 Feb 2020 13:57:12 -0500 Subject: Parallel GC Thread crash In-Reply-To: <6afab3a3-92ab-f1bf-2022-9e21034cd28a@oracle.com> References: <5454bc87-1452-1402-3496-c3c8f128a499@oracle.com> <6afab3a3-92ab-f1bf-2022-9e21034cd28a@oracle.com> Message-ID: Hi Stefan, I tried running with -XX:+VerifyBeforeGC and -XX:+VerifyAfterGC but it seems some of the operations are timing out (ex. ssl connect not sure if i have very low timeout or this flag increases the latency) Also from the crash error log i see following ... 0x00007f0588066000 GCTaskThread "ParGC Thread#20" [stack: 0x00007ef5c4d13000,0x00007ef5c4e13000] [id=87534] *=>0x00007f0588068000 GCTaskThread "ParGC Thread#21" [stack: 0x00007ef5c4c12000,0x00007ef5c4d12000] [id=87535]* 0x00007f0588069800 GCTaskThread "ParGC Thread#22" [stack: 0x00007ef5c4b11000,0x00007ef5c4c11000] [id=87536] ... 
Threads with active compile tasks: *VM state:at safepoint (normal execution)* VM Mutex/Monitor currently owned by a thread: ([mutex/lock_event]) [0x00007f0588015750] Threads_lock - owner thread: 0x00007f0588112800 [0x00007f0588016350] Heap_lock - owner thread: 0x00007ef4100f9000 Does this mean VM is in safepoint and executing GC operation or JVM related activity which requires to be in safepoint (ie. not executing user code) when it crashed? I am trying to see if library or application code is causing this behaviour. Thanks Sundar On Mon, Feb 10, 2020 at 3:13 PM Stefan Karlsson wrote: > On 2020-02-10 20:53, Sundara Mohan M wrote: > > Hi Stefan, > > Yes we are trying to move to 13.0.2. Wanted to verify if anyone > > else seen this or upgrading will really solve this problem. > > > > Can you share how to file a bug report for this? I don't have access > > to https://bugs.openjdk.java.net/ > > There are directions in the hs_err crash file that points you to the web > page to file a bug. > > You seem to be running AdoptJDK builds so your bug reports would end up > at their system: > > > # If you would like to submit a bug report, please visit: > > > # https://github.com/AdoptOpenJDK/openjdk-build/issues > > > If you were running with Oracle binaries you would get lines like this: > # If you would like to submit a bug report, please visit: > # https://bugreport.java.com/bugreport/crash.jsp > > > > > I will try to run with -XX:+VerifyBeforeGC and -XX:+VerifyAfterGC to > > get more information. > > OK. Hopefully this gives us more information. > > StefanK > > > > > > Thanks > > Sundar > > > > On Mon, Feb 10, 2020 at 2:42 PM Stefan Karlsson > > > wrote: > > > > Hi Sundar, > > > > On 2020-02-10 19:32, Sundara Mohan M wrote: > > > Hi Stefan, > > > We started seeing more crashes on JDK13.0.1+9 > > > > > > Since seeing it on GC Task Thread assumed it is related to GC. > > > > As I said in my previous mail, I don't think this is caused by GC > > code. > > More below. 
> > > > > > > > # Problematic frame: > > > # V [libjvm.so+0xd183c0] > > PSRootsClosure::do_oop(oopDesc**)+0x30 > > > > > > Command Line: -XX:+AlwaysPreTouch -Xms64000m -Xmx64000m > > > -XX:NewSize=40000m -XX:+DisableExplicitGC -Xnoclassgc > > > -XX:+UseParallelGC -XX:ParallelGCThreads=40 -XX:ConcGCTh > > > reads=5 ... > > > > > > Host: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, 48 cores, 125G, > > Red > > > Hat Enterprise Linux Server release 6.10 (Santiago) > > > Time: Fri Feb 7 11:15:04 2020 UTC elapsed time: 286290 seconds > > (3d 7h > > > 31m 30s) > > > > > > --------------- T H R E A D --------------- > > > > > > Current thread (0x00007fca6c074000): GCTaskThread "ParGC > > Thread#28" > > > [stack: 0x00007fba72ff1000,0x00007fba730f1000] [id=56530] > > > > > > Stack: [0x00007fba72ff1000,0x00007fba730f1000], > > > sp=0x00007fba730ee850, free space=1014k > > > Native frames: (J=compiled Java code, A=aot compiled Java code, > > > j=interpreted, Vv=VM code, C=native code) > > > V [libjvm.so+0xd183c0] > > PSRootsClosure::do_oop(oopDesc**)+0x30 > > > V [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, > > RegisterMap > > > const*, OopClosure*)+0x2eb > > > V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > > > CodeBlobClosure*, RegisterMap*, bool)+0x99 > > > V [libjvm.so+0xf68b17] JavaThread::oops_do(OopClosure*, > > > CodeBlobClosure*)+0x187 > > > V [libjvm.so+0xd190be] ThreadRootsTask::do_it(GCTaskManager*, > > > unsigned int)+0x6e > > > V [libjvm.so+0x7f422b] GCTaskThread::run()+0x1eb > > > V [libjvm.so+0xf707fd] Thread::call_run()+0x10d > > > V [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7 > > > > > > JavaThread 0x00007fb8f4036800 (nid = 60927) was being processed > > > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > > > v ~RuntimeStub::_new_array_Java > > > J 58520 c2 > > > > > > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > > > > > (207 bytes) @ 0x00007fca5fd23dec > > 
[0x00007fca5fd1dbc0+0x000000000000622c] > > > J 66864 c2 webservice.exception.ExceptionLoggingWrapper.execute()V > > > (1004 bytes) @ 0x00007fca60c02588 > > [0x00007fca60bffce0+0x00000000000028a8] > > > J 58224 c2 > > > > > > webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lbeans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > > > > (105 bytes) @ 0x00007fca5f59bad8 > > [0x00007fca5f59b880+0x0000000000000258] > > > J 69992 c2 > > > > > > webservice.exception.mapper.JediRequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > > > > (9 bytes) @ 0x00007fca5e1019f4 > > [0x00007fca5e101940+0x00000000000000b4] > > > J 55265 c2 > > > > > > webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; > > > > > (332 bytes) @ 0x00007fca5f6f58e0 > > [0x00007fca5f6f5700+0x00000000000001e0] > > > J 483122 c2 > > webservice.filters.ResponseSerializationWorker.execute()Z > > > (272 bytes) @ 0x00007fca622fc2b4 > > [0x00007fca622fbc80+0x0000000000000634] > > > J 15811% c2 > > > > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; > > > > > (486 bytes) @ 0x00007fca5c108794 > > [0x00007fca5c1082a0+0x00000000000004f4] > > > j > > > > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1 > > > J 4586 c1 java.util.concurrent.FutureTask.run()V > > java.base at 13.0.1 (123 > > > bytes) @ 0x00007fca54d27184 [0x00007fca54d26b00+0x0000000000000684] > > > J 7550 c1 > > > > > > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V > > > > > java.base at 13.0.1 (187 bytes) @ 0x00007fca54fbb6d4 > > > [0x00007fca54fba8e0+0x0000000000000df4] > > > J 7549 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V > > > java.base at 13.0.1 (9 bytes) @ 0x00007fca5454b93c > > > [0x00007fca5454b8c0+0x000000000000007c] > > > J 
4585 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 bytes) @ > > > 0x00007fca54d250f4 [0x00007fca54d24fc0+0x0000000000000134] > > > v ~StubRoutines::call_stub > > > > > > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: > > > 0x0000000000000000 > > > > > > Does JDK11 and 13 have different code for GC. Do you think > > > downgrading(JDK11 stable)/upgrading(JDK-13.0.2) might help here? > > > > You should at least move to 13.0.2, to get the latest bug > > fixes/patches. > > > > There has been a lot of changes in all areas of the JVM between 11 > > and > > 13. We don't yet know the root cause of this crash, and I can't > > say if > > this is caused by new changes or not. Have you or anyone filed a bug > > report for this? > > > > > Any insight to debug this will be helpful. > > > > Did you try my previous suggestion to run with -XX:+VerifyBeforeGC > > and > > -XX:+VerifyAfterGC? If you can tolerate the longer GC times it > > introduces, then you could try to run with > > -XX:+UnlockDiagnosticVMOptions -XX:+VerifyBeforeGC > > -XX:+VerifyAfterGC . > > > > Cheers, > > StefanK > > > > > > > > TIA > > > Sundar > > > > > > On Tue, Feb 4, 2020 at 5:47 AM Stefan Karlsson > > > > > > >> wrote: > > > > > > Hi Sundar, > > > > > > The GC crashes when it encounters something bad on the stack: > > > > V [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, > > > RegisterMap > > > > const*, OopClosure*)+0x2eb > > > > V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > > > > > > This is probably not a GC bug. It's more likely that this is > > > caused by > > > the JIT compiler. I see in your hotspot-runtime-dev thread, > > that you > > > also get crashes in other compiler related areas. > > > > > > If you want to rule out the GC, you can run with > > > -XX:+VerifyBeforeGC and > > > -XX:+VerifyAfterGC, and see if this asserts before the GC > > has started > > > running. 
> > > > > > StefanK > > > > > > On 2020-02-04 04:38, Sundara Mohan M wrote: > > > > Hi, > > > > I am seeing following crashes frequently on our servers > > > > # > > > > # A fatal error has been detected by the Java Runtime > > Environment: > > > > # > > > > # SIGSEGV (0xb) at pc=0x00007fca3281d311, pid=103575, > > tid=108299 > > > > # > > > > # JRE version: OpenJDK Runtime Environment (13.0.1+9) (build > > > 13.0.1+9) > > > > # Java VM: OpenJDK 64-Bit Server VM (13.0.1+9, mixed mode, > > > tiered, parallel > > > > gc, linux-amd64) > > > > # Problematic frame: > > > > # V [libjvm.so+0xcd3311] > > > PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > > > > # > > > > # No core dump will be written. Core dumps have been > disabled. > > > To enable > > > > core dumping, try "ulimit -c unlimited" before starting > > Java again > > > > # > > > > # If you would like to submit a bug report, please visit: > > > > # https://github.com/AdoptOpenJDK/openjdk-build/issues > > > > # > > > > > > > > > > > > --------------- T H R E A D --------------- > > > > > > > > Current thread (0x00007fca2c051000): GCTaskThread "ParGC > > > Thread#8" [stack: > > > > 0x00007fca30277000,0x00007fca30377000] [id=108299] > > > > > > > > Stack: [0x00007fca30277000,0x00007fca30377000], > > > sp=0x00007fca30374890, > > > > free space=1014k > > > > Native frames: (J=compiled Java code, A=aot compiled Java > > code, > > > > j=interpreted, Vv=VM code, C=native code) > > > > V [libjvm.so+0xcd3311] > > PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > > > > V [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, > > > RegisterMap > > > > const*, OopClosure*)+0x2eb > > > > V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > > > > CodeBlobClosure*, RegisterMap*, bool)+0x99 > > > > V [libjvm.so+0xf68b17] JavaThread::oops_do(OopClosure*, > > > > CodeBlobClosure*)+0x187 > > > > V [libjvm.so+0xcce2f0] > > > ThreadRootsMarkingTask::do_it(GCTaskManager*, > > > > unsigned int)+0xb0 > > > > V [libjvm.so+0x7f422b] 
GCTaskThread::run()+0x1eb > > > > V [libjvm.so+0xf707fd] Thread::call_run()+0x10d > > > > V [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7 > > > > > > > > JavaThread 0x00007fb85c004800 (nid = 111387) was being > > processed > > > > Java frames: (J=compiled Java code, j=interpreted, Vv=VM > code) > > > > v ~RuntimeStub::_new_array_Java > > > > J 225122 c2 > > > > > > > > > > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > > > > (207 bytes) @ 0x00007fca21f1a5d8 > > > [0x00007fca21f17f20+0x00000000000026b8] > > > > J 62342 c2 > > > webservice.exception.ExceptionLoggingWrapper.execute()V (1004 > > > > bytes) @ 0x00007fca20f0aec8 > > [0x00007fca20f07f40+0x0000000000002f88] > > > > J 225129 c2 > > > > > > > > > > webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lbeans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > > > (105 bytes) @ 0x00007fca1da512ac > > > [0x00007fca1da51100+0x00000000000001ac] > > > > J 131643 c2 > > > > > > > > > > webservice.exception.mapper.RequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > > > (9 bytes) @ 0x00007fca20ce6190 > > > [0x00007fca20ce60c0+0x00000000000000d0] > > > > J 55114 c2 > > > > > > > > > > webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; > > > > (332 bytes) @ 0x00007fca2051fe64 > > > [0x00007fca2051f820+0x0000000000000644] > > > > J 57859 c2 > > > webservice.filters.ResponseSerializationWorker.execute()Z (272 > > > > bytes) @ 0x00007fca1ef2ed18 > > [0x00007fca1ef2e140+0x0000000000000bd8] > > > > J 16114% c2 > > > > > > > > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; > > > > (486 bytes) @ 0x00007fca1ced465c > > > [0x00007fca1ced4200+0x000000000000045c] > > > > j > > > > > > > > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1 > > > > J 
11639 c2 java.util.concurrent.FutureTask.run()V > > > java.base at 13.0.1 (123 > > > > bytes) @ 0x00007fca1cd00858 > > [0x00007fca1cd007c0+0x0000000000000098] > > > > J 7560 c1 > > > > > > > > > > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V > > > > java.base at 13.0.1 (187 bytes) @ 0x00007fca15b23f54 > > > > [0x00007fca15b23160+0x0000000000000df4] > > > > J 5143 c1 > > java.util.concurrent.ThreadPoolExecutor$Worker.run()V > > > > java.base at 13.0.1 (9 bytes) @ 0x00007fca15b39abc > > > > [0x00007fca15b39a40+0x000000000000007c] > > > > J 4488 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 > > bytes) @ > > > > 0x00007fca159fc174 [0x00007fca159fc040+0x0000000000000134] > > > > v ~StubRoutines::call_stub > > > > > > > > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), > > si_addr: > > > > 0x0000000000000000 > > > > > > > > Register to memory mapping: > > > > ... > > > > > > > > Can someone shed more info on when this can happen? I am > > seeing > > > this on > > > > multiple servers with Java 13.0.1+9 on RHEL6 servers. > > > > > > > > There was another thread in hotspot runtime where David > Holmes > > > pointed this > > > >> siginfo: si_signo: 11 (SIGSEGV), si_code: 128 > > (SI_KERNEL), si_addr: > > > > 0x0000000000000000 > > > > > > > >> This seems it may be related to: > > > >> https://bugs.openjdk.java.net/browse/JDK-8004124 > > > > > > > > Just wondering if this is same or something to do with GC > > specific. 
> > > > > > > > > > > > > > > > TIA > > > > Sundar > > > > > > > > > > > From stefan.karlsson at oracle.com Fri Feb 28 19:18:15 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Fri, 28 Feb 2020 20:18:15 +0100 Subject: Parallel GC Thread crash In-Reply-To: References: <5454bc87-1452-1402-3496-c3c8f128a499@oracle.com> <6afab3a3-92ab-f1bf-2022-9e21034cd28a@oracle.com> Message-ID: <90c436e6-c678-7e12-a03f-1ad835c6d667@oracle.com> Hi Sundar, On 2020-02-28 19:57, Sundara Mohan M wrote: > Hi?Stefan, > ? ? I tried running with?-XX:+VerifyBeforeGC and -XX:+VerifyAfterGC > but it seems some of the operations are timing out (ex. ssl connect > not sure if i have very low timeout or this flag increases the latency) The flag increase the latencies, because it runs extra verification checks in the pauses. > > Also from the crash error log i see following > ... > ? 0x00007f0588066000 GCTaskThread "ParGC Thread#20" [stack: > 0x00007ef5c4d13000,0x00007ef5c4e13000] [id=87534] > *=>0x00007f0588068000 GCTaskThread "ParGC Thread#21" [stack: > 0x00007ef5c4c12000,0x00007ef5c4d12000] [id=87535] > *? 0x00007f0588069800 GCTaskThread "ParGC Thread#22" [stack: > 0x00007ef5c4b11000,0x00007ef5c4c11000] [id=87536] > ... > Threads with active compile tasks: > > *VM state:at safepoint (normal execution) > * > > VM Mutex/Monitor currently owned by a thread: ?([mutex/lock_event]) > [0x00007f0588015750] Threads_lock - owner thread: 0x00007f0588112800 > [0x00007f0588016350] Heap_lock - owner thread: 0x00007ef4100f9000 > > Does this mean VM is in safepoint and executing GC operation or JVM > related activity which requires to be in safepoint (ie. not executing > user code) when it crashed? Yes, exactly. The Parallel GC does all work in a stop-the-world pause. StefanK > I am trying to see if library or application code is causing this > behaviour. 
> > Thanks > Sundar > > On Mon, Feb 10, 2020 at 3:13 PM Stefan Karlsson > > wrote: > > On 2020-02-10 20:53, Sundara Mohan M wrote: > > Hi Stefan, > > ? ? Yes we are trying to move to 13.0.2. Wanted to verify if anyone > > else seen this or upgrading will really solve this?problem. > > > > Can you share how to file a bug report for this? I don't have > access > > to https://bugs.openjdk.java.net/ > > There are directions in the hs_err crash file that points you to > the web > page to file a bug. > > You seem to be running AdoptJDK builds so your bug reports would > end up > at their system: > ?>? ? ?> # If you would like to submit a bug report, please visit: > ?>? ? ?> # https://github.com/AdoptOpenJDK/openjdk-build/issues > > > > If you were running with Oracle binaries you would get lines like > this: > # If you would like to submit a bug report, please visit: > # https://bugreport.java.com/bugreport/crash.jsp > > > > > > I will try to run with -XX:+VerifyBeforeGC and > -XX:+VerifyAfterGC to > > get more information. > > OK. Hopefully this gives us more information. > > StefanK > > > > > > Thanks > > Sundar > > > > On Mon, Feb 10, 2020 at 2:42 PM Stefan Karlsson > > > >> wrote: > > > >? ? ?Hi Sundar, > > > >? ? ?On 2020-02-10 19:32, Sundara Mohan M wrote: > >? ? ?> Hi?Stefan, > >? ? ?> ? ? We started seeing more crashes on JDK13.0.1+9 > >? ? ?> > >? ? ?> Since seeing it on GC Task Thread assumed it is related to GC. > > > >? ? ?As I said in my previous mail, I don't think this is caused > by GC > >? ? ?code. > >? ? ?More below. > > > >? ? ?> > >? ? ?> # Problematic frame: > >? ? ?> # V ?[libjvm.so+0xd183c0] > >? ? ??PSRootsClosure::do_oop(oopDesc**)+0x30 > >? ? ?> > >? ? ?> Command Line: -XX:+AlwaysPreTouch -Xms64000m -Xmx64000m > >? ? ?> -XX:NewSize=40000m -XX:+DisableExplicitGC -Xnoclassgc > >? ? ?> -XX:+UseParallelGC -XX:ParallelGCThreads=40 -XX:ConcGCTh > >? ? ?> reads=5 ... > >? ? ?> > >? ? 
?> Host: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, 48 cores, > 125G, > >? ? ?Red > >? ? ?> Hat Enterprise Linux Server release 6.10 (Santiago) > >? ? ?> Time: Fri Feb ?7 11:15:04 2020 UTC elapsed time: 286290 > seconds > >? ? ?(3d 7h > >? ? ?> 31m 30s) > >? ? ?> > >? ? ?> --------------- ?T H R E A D ?--------------- > >? ? ?> > >? ? ?> Current thread (0x00007fca6c074000): ?GCTaskThread "ParGC > >? ? ?Thread#28" > >? ? ?> [stack: 0x00007fba72ff1000,0x00007fba730f1000] [id=56530] > >? ? ?> > >? ? ?> Stack: [0x00007fba72ff1000,0x00007fba730f1000], > >? ? ?> ?sp=0x00007fba730ee850, ?free space=1014k > >? ? ?> Native frames: (J=compiled Java code, A=aot compiled Java > code, > >? ? ?> j=interpreted, Vv=VM code, C=native code) > >? ? ?> V ?[libjvm.so+0xd183c0] > >? ? ??PSRootsClosure::do_oop(oopDesc**)+0x30 > >? ? ?> V ?[libjvm.so+0xc6bf0b] ?OopMapSet::oops_do(frame const*, > >? ? ?RegisterMap > >? ? ?> const*, OopClosure*)+0x2eb > >? ? ?> V ?[libjvm.so+0x765489] ?frame::oops_do_internal(OopClosure*, > >? ? ?> CodeBlobClosure*, RegisterMap*, bool)+0x99 > >? ? ?> V ?[libjvm.so+0xf68b17] ?JavaThread::oops_do(OopClosure*, > >? ? ?> CodeBlobClosure*)+0x187 > >? ? ?> V ?[libjvm.so+0xd190be] > ?ThreadRootsTask::do_it(GCTaskManager*, > >? ? ?> unsigned int)+0x6e > >? ? ?> V ?[libjvm.so+0x7f422b] ?GCTaskThread::run()+0x1eb > >? ? ?> V ?[libjvm.so+0xf707fd] ?Thread::call_run()+0x10d > >? ? ?> V ?[libjvm.so+0xc875b7] ?thread_native_entry(Thread*)+0xe7 > >? ? ?> > >? ? ?> JavaThread 0x00007fb8f4036800 (nid = 60927) was being > processed > >? ? ?> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > >? ? ?> v ?~RuntimeStub::_new_array_Java > >? ? ?> J 58520 c2 > >? ? ?> > > > ?ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > > > >? ? ?> (207 bytes) @ 0x00007fca5fd23dec > >? ? ?[0x00007fca5fd1dbc0+0x000000000000622c] > >? ? ?> J 66864 c2 > webservice.exception.ExceptionLoggingWrapper.execute()V > >? ? ?> (1004 bytes) @ 0x00007fca60c02588 > >? ? 
?[0x00007fca60bffce0+0x00000000000028a8] > >? ? ?> J 58224 c2 > >? ? ?> > > > ?webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lbeans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > > >? ? ?> (105 bytes) @ 0x00007fca5f59bad8 > >? ? ?[0x00007fca5f59b880+0x0000000000000258] > >? ? ?> J 69992 c2 > >? ? ?> > > > ?webservice.exception.mapper.JediRequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > > >? ? ?> (9 bytes) @ 0x00007fca5e1019f4 > >? ? ?[0x00007fca5e101940+0x00000000000000b4] > >? ? ?> J 55265 c2 > >? ? ?> > > > ?webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; > > > >? ? ?> (332 bytes) @ 0x00007fca5f6f58e0 > >? ? ?[0x00007fca5f6f5700+0x00000000000001e0] > >? ? ?> J 483122 c2 > > ?webservice.filters.ResponseSerializationWorker.execute()Z > >? ? ?> (272 bytes) @ 0x00007fca622fc2b4 > >? ? ?[0x00007fca622fbc80+0x0000000000000634] > >? ? ?> J 15811% c2 > >? ? ?> > > > ?com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; > > > >? ? ?> (486 bytes) @ 0x00007fca5c108794 > >? ? ?[0x00007fca5c1082a0+0x00000000000004f4] > >? ? ?> j > >? ? ?> > > > ??com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1 > >? ? ?> J 4586 c1 java.util.concurrent.FutureTask.run()V > >? ? ?java.base at 13.0.1 (123 > >? ? ?> bytes) @ 0x00007fca54d27184 > [0x00007fca54d26b00+0x0000000000000684] > >? ? ?> J 7550 c1 > >? ? ?> > > > ?java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V > > > >? ? ?> java.base at 13.0.1 (187 bytes) @ 0x00007fca54fbb6d4 > >? ? ?> [0x00007fca54fba8e0+0x0000000000000df4] > >? ? ?> J 7549 c1 > java.util.concurrent.ThreadPoolExecutor$Worker.run()V > >? ? ?> java.base at 13.0.1 (9 bytes) @ 0x00007fca5454b93c > >? ? ?> [0x00007fca5454b8c0+0x000000000000007c] > >? ? 
> > J 4585 c1 java.lang.Thread.run()V java.base@13.0.1 (17 bytes) @
> > 0x00007fca54d250f4 [0x00007fca54d24fc0+0x0000000000000134]
> > v  ~StubRoutines::call_stub
> >
> > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr:
> > 0x0000000000000000
> >
> > Does JDK11 and 13 have different code for GC. Do you think
> > downgrading(JDK11 stable)/upgrading(JDK-13.0.2) might help here?
>
> You should at least move to 13.0.2, to get the latest bug
> fixes/patches.
>
> There has been a lot of changes in all areas of the JVM between 11 and
> 13. We don't yet know the root cause of this crash, and I can't say if
> this is caused by new changes or not. Have you or anyone filed a bug
> report for this?
>
> > Any insight to debug this will be helpful.
>
> Did you try my previous suggestion to run with -XX:+VerifyBeforeGC and
> -XX:+VerifyAfterGC? If you can tolerate the longer GC times it
> introduces, then you could try to run with
> -XX:+UnlockDiagnosticVMOptions -XX:+VerifyBeforeGC -XX:+VerifyAfterGC .
>
> Cheers,
> StefanK
>
> > TIA
> > Sundar
> >
> > On Tue, Feb 4, 2020 at 5:47 AM Stefan Karlsson wrote:
> >
> > > Hi Sundar,
> > >
> > > The GC crashes when it encounters something bad on the stack:
> > > > V  [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, RegisterMap
> > > > const*, OopClosure*)+0x2eb
> > > > V  [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*,
> > >
> > > This is probably not a GC bug. It's more likely that this is caused by
> > > the JIT compiler. I see in your hotspot-runtime-dev thread, that you
> > > also get crashes in other compiler related areas.
> > >
> > > If you want to rule out the GC, you can run with -XX:+VerifyBeforeGC and
> > > -XX:+VerifyAfterGC, and see if this asserts before the GC has started
> > > running.
> > >
> > > StefanK
> > >
> > > On 2020-02-04 04:38, Sundara Mohan M wrote:
> > > > Hi,
> > > >     I am seeing following crashes frequently on our servers
> > > > #
> > > > # A fatal error has been detected by the Java Runtime Environment:
> > > > #
> > > > #  SIGSEGV (0xb) at pc=0x00007fca3281d311, pid=103575, tid=108299
> > > > #
> > > > # JRE version: OpenJDK Runtime Environment (13.0.1+9) (build 13.0.1+9)
> > > > # Java VM: OpenJDK 64-Bit Server VM (13.0.1+9, mixed mode, tiered, parallel
> > > > gc, linux-amd64)
> > > > # Problematic frame:
> > > > # V  [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51
> > > > #
> > > > # No core dump will be written. Core dumps have been disabled. To enable
> > > > # core dumping, try "ulimit -c unlimited" before starting Java again
> > > > #
> > > > # If you would like to submit a bug report, please visit:
> > > > #   https://github.com/AdoptOpenJDK/openjdk-build/issues
> > > > #
> > > >
> > > >
> > > > ---------------  T H R E A D ---------------
> > > >
> > > > Current thread (0x00007fca2c051000): GCTaskThread "ParGC Thread#8" [stack:
> > > > 0x00007fca30277000,0x00007fca30377000] [id=108299]
> > > >
> > > > Stack: [0x00007fca30277000,0x00007fca30377000], sp=0x00007fca30374890,
> > > >   free space=1014k
> > > > Native frames: (J=compiled Java code, A=aot compiled Java code,
> > > > j=interpreted, Vv=VM code, C=native code)
> > > > V  [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51
> > > > V  [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, RegisterMap
> > > > const*, OopClosure*)+0x2eb
> > > > V  [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*,
> > > > CodeBlobClosure*, RegisterMap*, bool)+0x99
> > > > V  [libjvm.so+0xf68b17] JavaThread::oops_do(OopClosure*,
> > > > CodeBlobClosure*)+0x187
> > > > V  [libjvm.so+0xcce2f0] ThreadRootsMarkingTask::do_it(GCTaskManager*,
> > > > unsigned int)+0xb0
> > > > V  [libjvm.so+0x7f422b] GCTaskThread::run()+0x1eb
> > > > V  [libjvm.so+0xf707fd] Thread::call_run()+0x10d
> > > > V  [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7
> > > >
> > > > JavaThread 0x00007fb85c004800 (nid = 111387) was being processed
> > > > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
> > > > v  ~RuntimeStub::_new_array_Java
> > > > J 225122 c2
> > > > ch.qos.logback.classic.spi.ThrowableProxy.<init>(Ljava/lang/Throwable;)V
> > > > (207 bytes) @ 0x00007fca21f1a5d8 [0x00007fca21f17f20+0x00000000000026b8]
> > > > J 62342 c2 webservice.exception.ExceptionLoggingWrapper.execute()V (1004
> > > > bytes) @ 0x00007fca20f0aec8 [0x00007fca20f07f40+0x0000000000002f88]
> > > > J 225129 c2
> > > > webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lbeans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response;
> > > > (105 bytes) @ 0x00007fca1da512ac [0x00007fca1da51100+0x00000000000001ac]
> > > > J 131643 c2
> > > > webservice.exception.mapper.RequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response;
> > > > (9 bytes) @ 0x00007fca20ce6190 [0x00007fca20ce60c0+0x00000000000000d0]
> > > > J 55114 c2
> > > > webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream;
> > > > (332 bytes) @ 0x00007fca2051fe64 [0x00007fca2051f820+0x0000000000000644]
> > > > J 57859 c2 webservice.filters.ResponseSerializationWorker.execute()Z (272
> > > > bytes) @ 0x00007fca1ef2ed18 [0x00007fca1ef2e140+0x0000000000000bd8]
> > > > J 16114% c2
> > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState;
> > > > (486 bytes) @ 0x00007fca1ced465c [0x00007fca1ced4200+0x000000000000045c]
> > > > j
> > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1
> > > > J 11639 c2 java.util.concurrent.FutureTask.run()V java.base@13.0.1 (123
> > > > bytes) @ 0x00007fca1cd00858 [0x00007fca1cd007c0+0x0000000000000098]
> > > > J 7560 c1
> > > > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V
> > > > java.base@13.0.1 (187 bytes) @ 0x00007fca15b23f54
> > > > [0x00007fca15b23160+0x0000000000000df4]
> > > > J 5143 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V
> > > > java.base@13.0.1 (9 bytes) @ 0x00007fca15b39abc
> > > > [0x00007fca15b39a40+0x000000000000007c]
> > > > J 4488 c1 java.lang.Thread.run()V java.base@13.0.1 (17 bytes) @
> > > > 0x00007fca159fc174 [0x00007fca159fc040+0x0000000000000134]
> > > > v  ~StubRoutines::call_stub
> > > >
> > > > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr:
> > > > 0x0000000000000000
> > > >
> > > > Register to memory mapping:
> > > > ...
> > > >
> > > > Can someone shed more info on when this can happen? I am seeing this on
> > > > multiple servers with Java 13.0.1+9 on RHEL6 servers.
> > > >
> > > > There was another thread in hotspot runtime where David Holmes pointed this
> > > >> siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr:
> > > > 0x0000000000000000
> > > >
> > > >> This seems it may be related to:
> > > >> https://bugs.openjdk.java.net/browse/JDK-8004124
> > > >
> > > > Just wondering if this is same or something to do with GC specific.
> > > >
> > > > TIA
> > > > Sundar
> > >

From leo.korinth at oracle.com  Fri Feb 28 19:24:28 2020
From: leo.korinth at oracle.com (Leo Korinth)
Date: Fri, 28 Feb 2020 20:24:28 +0100
Subject: RFR: 8203239: [TESTBUG] remove vmTestbase/vm/gc/kind/parOld test
In-Reply-To: 
References: 
Message-ID: <22b9e055-ab0f-75fb-bebf-d9955db018fb@oracle.com>

On 21/02/2020 20:48, Leonid Mesnik wrote:
> Hi
>
> Could you please review following fix which removes parOld test. Test
> checks that ParOldGC is used if no GC is selected and new gen GC is
> PSYoungGen. Test is obsolete now and should be removed.
>
> webrev: http://cr.openjdk.java.net/~lmesnik/8203239/webrev.00/

Looks good to me (I am not a reviewer).

Thanks for cleaning up!
/Leo

> bug: https://bugs.openjdk.java.net/browse/JDK-8203239
>

From shade at redhat.com  Fri Feb 28 19:36:24 2020
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 28 Feb 2020 20:36:24 +0100
Subject: RFR: 8203239: [TESTBUG] remove vmTestbase/vm/gc/kind/parOld test
In-Reply-To: <22b9e055-ab0f-75fb-bebf-d9955db018fb@oracle.com>
References: <22b9e055-ab0f-75fb-bebf-d9955db018fb@oracle.com>
Message-ID: <05e2aace-200a-a6f8-43bb-d8bcb8977eab@redhat.com>

On 2/28/20 8:24 PM, Leo Korinth wrote:
> On 21/02/2020 20:48, Leonid Mesnik wrote:
>> Hi
>>
>> Could you please review following fix which removes parOld test. Test
>> checks that ParOldGC is used if no GC is selected and new gen GC is
>> PSYoungGen. Test is obsolete now and should be removed.
>>
>> webrev: http://cr.openjdk.java.net/~lmesnik/8203239/webrev.00/
>
> Looks good to me (I am not a reviewer).

Looks good.

-- 
Thanks,
-Aleksey

From kim.barrett at oracle.com  Fri Feb 28 21:48:08 2020
From: kim.barrett at oracle.com (Kim Barrett)
Date: Fri, 28 Feb 2020 16:48:08 -0500
Subject: RFR: 8240239: Replace ConcurrentGCPhaseManager
Message-ID: <4C14B89F-1550-44DE-B738-0DBEE7A2E167@oracle.com>

Please review this change which removes the ConcurrentGCPhaseManager
class and replaces it with ConcurrentGCBreakpoints. This is joint work
with Per Liden.

This change provides a client API, used by WhiteBox. The usage model
for a client is

(1) Acquire control of concurrent collection cycles.
(2) Do work that must be performed while the collection cycle is in a
known state.
(3) Request the concurrent collector run to a named "breakpoint", or
run to completion, and then hold there, waiting for further commands.
(4) Optionally goto (2).
(5) Release control of concurrent collection cycles.

Tests have been updated to use the new WhiteBox API.

This change provides implementations of the new mechanism for G1 and
ZGC. A Shenandoah implementation is being left to others, but we don't
see any obvious reason for it to be difficult.
CR:
https://bugs.openjdk.java.net/browse/JDK-8240239

Webrev:
https://cr.openjdk.java.net/~kbarrett/8240239/open.03/

To possibly simplify the review, the open patch is also provided as a
pair of patches, one for removing the old mechanism and a second to add
the new mechanism.

https://cr.openjdk.java.net/~kbarrett/8240239/remove_phase_control.03/
Removes ConcurrentGCPhaseManager and its G1 implementation, except that
tests are not modified.

https://cr.openjdk.java.net/~kbarrett/8240239/control.03/
Adds ConcurrentGCBreakpoints, with G1 and ZGC implementations, and
updates tests to use it.

Testing:
mach5 tier1-5, which includes all the updated and new tests.

From leonid.mesnik at oracle.com  Fri Feb 28 23:56:17 2020
From: leonid.mesnik at oracle.com (Leonid Mesnik)
Date: Fri, 28 Feb 2020 15:56:17 -0800
Subject: RFR: 8203239: [TESTBUG] remove vmTestbase/vm/gc/kind/parOld test
In-Reply-To: <05e2aace-200a-a6f8-43bb-d8bcb8977eab@redhat.com>
References: <22b9e055-ab0f-75fb-bebf-d9955db018fb@oracle.com>
 <05e2aace-200a-a6f8-43bb-d8bcb8977eab@redhat.com>
Message-ID: <70b6a9c4-42a7-0cf4-cd34-2c597df67eff@oracle.com>

Aleksey, Leo

Thank you for review.

Leonid

On 2/28/20 11:36 AM, Aleksey Shipilev wrote:
> On 2/28/20 8:24 PM, Leo Korinth wrote:
>> On 21/02/2020 20:48, Leonid Mesnik wrote:
>>> Hi
>>>
>>> Could you please review following fix which removes parOld test. Test
>>> checks that ParOldGC is used if no GC is selected and new gen GC is
>>> PSYoungGen. Test is obsolete now and should be removed.
>>>
>>> webrev: http://cr.openjdk.java.net/~lmesnik/8203239/webrev.00/
>> Looks good to me (I am not a reviewer).
> Looks good.
>
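[Editor's note: the five-step ConcurrentGCBreakpoints usage model from Kim Barrett's RFR above can be sketched in client code roughly as follows. This is a hedged stand-in, not the actual API from the webrev: the real interface is exposed through the HotSpot WhiteBox test class and drives a live concurrent collector, while the `GCBreakpointControl` class, its method names, and the breakpoint name below are invented here purely to make the acquire / run-to / release ordering concrete.]

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of the ConcurrentGCBreakpoints client protocol described in
 * JDK-8240239. It only records the sequence of calls so the protocol
 * ordering can be demonstrated; no real collector is involved.
 */
class GCBreakpointControl {
    private final List<String> calls = new ArrayList<>();
    private boolean held = false;

    // (1) Acquire control of concurrent collection cycles.
    void acquireControl() {
        held = true;
        calls.add("acquire");
    }

    // (3) Request the collector run to a named breakpoint and hold there.
    void runTo(String breakpoint) {
        if (!held) throw new IllegalStateException("control not acquired");
        calls.add("runTo:" + breakpoint);
    }

    // (3, variant) Run the current cycle to completion and hold idle.
    void runToIdle() {
        if (!held) throw new IllegalStateException("control not acquired");
        calls.add("runToIdle");
    }

    // (5) Release control; the collector resumes normal scheduling.
    void releaseControl() {
        held = false;
        calls.add("release");
    }

    List<String> calls() {
        return calls;
    }
}

class BreakpointClientSketch {
    public static void main(String[] args) {
        GCBreakpointControl wb = new GCBreakpointControl();
        wb.acquireControl();                // (1)
        // (2) work that needs the collector in a known state goes here
        wb.runTo("AFTER MARKING STARTED");  // (3) breakpoint name is illustrative
        // (4) more work, then run the cycle out before releasing
        wb.runToIdle();
        wb.releaseControl();                // (5)
        System.out.println(String.join(" -> ", wb.calls()));
        // prints: acquire -> runTo:AFTER MARKING STARTED -> runToIdle -> release
    }
}
```

One consequence of this shape, visible even in the toy version, is that steps (2)-(4) can loop: a test can hold the collector at one breakpoint, inspect state, then ask it to advance to the next, which is exactly what the updated WhiteBox tests rely on.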