From maoliang.ml at alibaba-inc.com Mon Mar 2 12:32:31 2020
From: maoliang.ml at alibaba-inc.com (Liang Mao)
Date: Mon, 02 Mar 2020 20:32:31 +0800
Subject: RFR(M): 8236926: Concurrently uncommit memory in G1
Message-ID: <0839e8e9-4de4-43c0-bf1b-df357b3c7771.maoliang.ml@alibaba-inc.com>

Hi Thomas/Stefan and other developers,

I have created a separate patch for 8236926. Following the previous comments, the concurrent work has been moved to G1YoungRemSetSamplingThread. SPECjvm2008 worked fine with the patch (SPECjbb2015 does not exercise heap shrinking).

http://cr.openjdk.java.net/~luchsh/8236926.webrev/

Thanks,
Liang

From linzang at tencent.com Mon Mar 2 13:56:52 2020
From: linzang at tencent.com (linzang(臧琳))
Date: Mon, 2 Mar 2020 13:56:52 +0000
Subject: JDK-8215624 add parallel heap inspection support for jmap histo(G1)(Internet mail)
In-Reply-To:
References: <11bca96c0e7745f5b2558cc49b42b996@tencent.com>
Message-ID: <2EDF28BF-94D5-4F2E-B96E-2C45948AD454@tencent.com>

Dear all,

Let me try to ease the reviewing work with some explanation :P

The patch's target is to speed up heap iteration for jmap -histo; from my experience this is necessary for investigating large heaps. E.g. in a big-data scenario I tried to run jmap -histo against a 180GB heap, and it took quite a while.

And if my understanding is correct, even jmap -histo without the "live" option does heap inspection with the heap lock acquired, so it is very likely to block mutator threads in allocation-sensitive scenarios. The faster the heap inspection, the shorter the mutators are blocked. This is why parallel iteration for jmap is necessary.

I think parallel heap inspection should be applied to all kinds of heap. However, since the heap layouts differ between GCs, much time is required to understand all of them to make the whole change. IMO, it is not wise to have a huge patch for the whole solution at once, and it would be even harder to review. So I plan to implement it incrementally. The first patch (this one) is meant to settle the implementation details of how jmap accepts the new option and passes it to the attachListener of the JVM process, how to make the parallel inspection closure generic enough to be easily extended to different heap layouts, and how to implement heap inspection for a specific GC's heap. This patch uses G1's heap as the beginning.

This patch actually does several things:

1. Add an option "parallelThreadNum=" to jmap -histo. By default N is set to 0, which means the JVM decides how many threads to use for heap inspection. Setting this option to 1 disables parallel heap inspection. (More details in the CSR: https://bugs.openjdk.java.net/browse/JDK-8239290)

2. Change how JMap passes arguments (see the changes in http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_01/src/jdk.jcmd/share/classes/sun/tools/jmap/JMap.java.udiff.html). Originally it passed options as separate arguments to the attachListener; this patch composes all options into a single string. So arg_count_max in attachListener.hpp does not need to be changed, which avoids the compatibility issue discussed at https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-March/027334.html

3. Add an abstract class ParHeapInspectTask in heapInspection.hpp / heapInspection.cpp. Its work(uint worker_id) method prepares the data structure (KlassInfoTable) needed by every parallel worker thread, and then calls do_object_iterate_parallel(), which is the heap-specific implementation. I also added some mechanisms to KlassInfoTable to support parallel iteration, such as merge().

4. For a specific heap (G1 in this patch), create a subclass of ParHeapInspectTask that implements do_object_iterate_parallel() for parallel heap inspection. For G1, it simply invokes G1CollectedHeap's object_iterate_parallel().

5. Add related tests.

6. It should be easy to extend this patch to other kinds of heap by creating a subclass of ParHeapInspectTask and implementing do_object_iterate_parallel().

I hope this information helps the code review and starts the discussion :-)

Thanks!
BRs,
Lin

>On 2020/2/19, 9:40 AM, "linzang(臧琳)" wrote:
>
> Re-post this RFR with the correct enhancement number to make it trackable.
> Please ignore the previous wrong post. Sorry for the trouble.
>
> webrev: http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_01/
> Hi bug: https://bugs.openjdk.java.net/browse/JDK-8215624
> CSR: https://bugs.openjdk.java.net/browse/JDK-8239290
> --------------
> Lin
> >Hi Lin,
> >
> >Could you, please, re-post your RFR with the right enhancement number in
> >the message subject?
> >It will be more trackable this way.
> >
> >Thanks,
> >Serguei
> >
> >
> >On 2/17/20 10:29 PM, linzang(臧琳) wrote:
> >> Dear David,
> >> Thanks a lot!
> >> I have updated the refined code to http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_01/.
> >> IMHO the parallel heap inspection can be extended to all kinds of heap as long as the heap layout can support parallel iteration.
> >> Maybe we can first use this webrev to discuss how to implement it, because I am not sure my current implementation is an appropriate way to communicate with collectedHeap; then we can extend the solution to other kinds of heap.
> >>
> >> Thanks,
> >> --------------
> >> Lin
> >>> Hi Lin,
> >>>
> >>> Adding in hotspot-gc-dev as they need to see how this interacts with GC
> >>> worker threads, and whether it needs to be extended beyond G1.
> >>>
> >>> I happened to spot one nit when browsing:
> >>>
> >>> src/hotspot/share/gc/shared/collectedHeap.hpp
> >>>
> >>> + virtual bool run_par_heap_inspect_task(KlassInfoTable* cit,
> >>> + BoolObjectClosure* filter,
> >>> + size_t* missed_count,
> >>> + size_t thread_num) {
> >>> + return NULL;
> >>>
> >>> s/NULL/false/
> >>>
> >>> Cheers,
> >>> David
> >>>
> >>> On 18/02/2020 2:15 pm, linzang(臧琳) wrote:
> >>>> Dear All,
> >>>> May I ask your help to review the following changes:
> >>>> webrev:
> >>>> http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_00/
> >>>> bug: https://bugs.openjdk.java.net/browse/JDK-8215624
> >>>> related CSR: https://bugs.openjdk.java.net/browse/JDK-8239290
> >>>> This patch enables parallel heap inspection of G1 for jmap -histo.
> >>>> My simple test showed that it can speed up jmap -histo 2x with
> >>>> parallelThreadNum set to 2 for a heap of ~500M on a 4-core platform.
> >>>>
> >>>> ------------------------------------------------------------------------
> >>>> BRs,
> >>>> Lin
> >>
> >
>

From maoliang.ml at alibaba-inc.com Tue Mar 3 11:14:04 2020
From: maoliang.ml at alibaba-inc.com (Liang Mao)
Date: Tue, 03 Mar 2020 19:14:04 +0800
Subject: G1: Abort concurrent at initial mark pause
Message-ID:

Hi All,

As previously discussed, there are several ideas for improving humongous object handling. Our experiments show that canceling concurrent mark at the initial-mark pause is effective in the scenario where frequent allocation of temporary humongous objects leads to frequent concurrent marking and high CPU usage. The sub-test scimark.fft.large in SPECjvm2008 is exactly this case, but it is not GC-sensitive, so there is little difference in score.

The patch is small; could we get a bug ID for it?

http://cr.openjdk.java.net/~luchsh/g1hum/humongous.webrev/

Thanks,
Liang

------------------------------------------------------------------
From:Thomas Schatzl
Send Time:2020 Jan. 21 (Tue.) 18:20
To:"MAO, Liang" ; Man Cao ; hotspot-gc-dev
Subject:Re: Discussion: improve humongous objects handling for G1

Hi,

On 21.01.20 07:25, Liang Mao wrote:
> Hi Thomas,
>
> In fact we saw this issue with 8u. One issue I forgot to mention is that when
> CPU usage is quite high, nearly 100%, the concurrent mark
> gets very slow, so to-space exhausted happened. BTW, are there any
> improvements for this point in JDK11 or higher versions? I didn't notice any so far.

JDK13 has some implicit increases in the thresholds to take more humongous candidate regions. Not a lot though.

> Increasing reserve percent could alleviate the problem but seems not a complete
> solution.

It would be nicer if g1 automatically adjusted this reserve based on actual allocation, of course. ;) That is another option btw; there are many ways to avoid the evacuation failure situation.
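The cancellation idea Liang proposes at the top of this message can be pictured as one extra check at the end of the initial-mark pause: once eager reclaim has freed short-lived humongous regions, re-test old-generation occupancy against the marking threshold and drop the concurrent cycle if the threshold is no longer exceeded. This is a minimal C++ sketch under assumptions, not the code in humongous.webrev: the struct, the function name, and the use of a fixed IHOP percentage (G1 may use an adaptive threshold instead) are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical snapshot of what the initial-mark pause knows when it ends.
struct InitialMarkPauseStats {
  std::size_t old_gen_bytes;              // non-young occupancy when the pause started
  std::size_t humongous_bytes_reclaimed;  // freed by eager reclaim during this pause
};

// Returns true if the concurrent mark cycle started by this pause should still
// run, i.e. occupancy is at or above the initiating threshold even after the
// eagerly reclaimed humongous regions are subtracted. Returning false models
// "cancel concurrent mark at the initial-mark pause".
bool keep_concurrent_cycle(const InitialMarkPauseStats& s,
                           std::size_t heap_capacity_bytes,
                           unsigned ihop_percent) {
  assert(ihop_percent <= 100);
  const std::size_t threshold = heap_capacity_bytes / 100 * ihop_percent;
  // Guard against reporting more reclaimed bytes than were occupied.
  const std::size_t reclaimed = s.humongous_bytes_reclaimed < s.old_gen_bytes
                                    ? s.humongous_bytes_reclaimed
                                    : s.old_gen_bytes;
  const std::size_t occupancy_after = s.old_gen_bytes - reclaimed;
  return occupancy_after >= threshold;
}
```

For example, with a 1000 MB heap and a 45% threshold (G1's default InitiatingHeapOccupancyPercent), a pause that starts at 500 MB of old-gen occupancy and eagerly reclaims 100 MB would cancel the cycle (400 MB is below 450 MB), while one that reclaims only 10 MB would keep it.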
> Cancelling the concurrent mark cycle in the initial-mark pause seems a delicate
> optimization which can cover some issues if a lot of humongous regions have been
> reclaimed in this pause. It can avoid the unnecessary cm cycle and also trigger cm
> earlier if needed.
> We will take this into consideration. Thanks for the great idea :)
>
> If there is a short-lived humongous object array which also references other
> short-lived objects, the situation could be worse. If we increase the
> G1HeapRegionSize, some humongous objects become normal objects and the behavior
> is more like CMS, and then everything goes fine. I don't think we have to prevent humongous
> objects from behaving as normal ones. A newly allocated humongous object array can probably
> reference objects in the young generation, and scanning the object array via the remset
> couldn't be better than directly iterating the array during evacuation because of possible
> prefetching. We can have an alternative max survivor age for humongous objects, maybe 5 or 8

If I read this paragraph correctly, you argue that keeping a large humongous objArray in young is okay because

a) if you increase the heap region size, there is a high chance that it would be below the thresholds anyway, so you would scan it anyway

b) scanning a humongous objArray with a few references is not much different performance-wise than targeted scanning of the corresponding cards in the remembered set, because of hardware.

Regarding a): Since I have yet to see logs, I can't tell what the typical size of these arrays is (and I have not seen a "typical" humongous object distribution graph for these applications). However, region sizes are roughly proportional to heap size, which roughly corresponds to the hardware you need to use. I.e. you likely won't see G1 using 100 threads on a 200m heap with 32m regions with current ergonomics.
Even then, this limits objArrays to 16M (at 32m region size), which limits the time spent scanning the object (and if ergonomics selects 32m regions, the heap and the machine are probably quite big anyway). From what you and Man were telling, you seem to have a significant number of humongous objects of unknown type that are much(?) larger than that.

Regarding b): that was wrong years ago when I did experiments on it (including the "limit age on humongous obj arrays" workaround; you can easily go as low as a max tenuring threshold of 1 to catch almost all of the relevant ones), and it very likely still is.

Let me do some rule-of-thumb calculations. Assume 32M-sized objects (a random number, i.e. ~8M references) with, say, 1k references (which is more than a handful). The remembered set would make you scan at most 1.5% of the object (1000 * 512 bytes/card). I seriously doubt that prefetching or some magic hardware will make that amount of additional work disappear.

From a performance POV, with 20 GB/s bandwidth available (which I am not sure you will reach during GC for whatever reasons; a random number), you are spending 1.5ms (if I calculated correctly) of CPU time just to find out that the 32M object is completely full of nulls in the worst case. That's also the minimum amount of time you need per such object. Keeping it outside of the young gen would likely be much cheaper than that, particularly if it has been allocated just recently and so won't have a lot of remembered set entries (as mentioned, G1 has a good measure of how long scanning a card takes, so we could use that number). Only if G1 is going to scan it almost completely anyway (which we agree is unlikely, as it has "just" been allocated) is keeping it outside disadvantageous.

Note that its allocation could still be counted against the eden allowance in some situations. This could be seen as a way to slow down the mutator while it is busy trying to complete the marking.
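Thomas's rough numbers can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions, not a cost model G1 actually uses: 4-byte (compressed) references, G1's 512-byte cards, 20 GiB/s of usable bandwidth, and purely bandwidth-bound scanning with no per-card or per-reference overhead. The struct and function names are made up for illustration.

```cpp
#include <cassert>

// Back-of-the-envelope comparison: scanning a whole humongous objArray versus
// scanning only the cards the remembered set points at. Assumes scanning is
// purely bandwidth-bound (no per-card or per-reference overhead).
struct ScanEstimate {
  double references;    // reference slots in the array
  double rs_fraction;   // share of the array covered by remembered-set cards
  double full_scan_ms;  // time to stream the entire array
  double rs_scan_ms;    // time to stream only the remembered-set cards
};

ScanEstimate estimate_scan(double object_bytes, double ref_bytes,
                           double rs_cards, double card_bytes,
                           double bytes_per_second) {
  assert(object_bytes > 0 && ref_bytes > 0 && bytes_per_second > 0);
  const double carded_bytes = rs_cards * card_bytes;
  ScanEstimate e;
  e.references = object_bytes / ref_bytes;
  e.rs_fraction = carded_bytes / object_bytes;
  e.full_scan_ms = object_bytes / bytes_per_second * 1000.0;
  e.rs_scan_ms = carded_bytes / bytes_per_second * 1000.0;
  return e;
}
```

With a 32 MiB array, 4-byte references, 1000 cards of 512 bytes and 20 GiB/s, this gives about 8.4M references, a remembered-set share of about 1.5%, and roughly 1.56 ms for the full scan versus about 0.024 ms for the carded part, in line with the 1.5% and 1.5ms figures above.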
However, I am not sure it helps a lot, assuming that changes to perform eager reclaim on objArrays won't work during marking. A different way of enforcing such an allocation penalty would be needed. Without more thinking and measurements I would not know when and how to account for it, and what has to happen with the existing mechanisms that absorb allocation spikes (e.g. G1ReservePercent). I assume you probably do not want both. Also something to consider.

> at most, otherwise let eager reclaim do it. A tradeoff can be made to balance the
> pause time and the reclamation possibility of short-lived objects.
>
> So the enhanced solution can be
> 1. Cancelling concurrent mark if not necessary.
> 2. Increasing the reclamation possibility of short-lived humongous objects.

These are valid possibilities to improve the overall situation without fixing actual fragmentation issues ;)

> An important reason for this issue is that Java developers readily point out
> that CMS can handle the application without a significant CPU usage increase
> (caused by concurrent mark) and ask why G1 cannot. Personally I believe G1 can do anything no worse
> than CMS :)
> This proposal aims at the throughput gap compared to CMS. If it works
> together with the barrier optimization proposed by Man and Google, imho the gap could be
> noticeably reduced.

Thanks,
Thomas

From per.liden at oracle.com Tue Mar 3 13:21:08 2020
From: per.liden at oracle.com (Per Liden)
Date: Tue, 3 Mar 2020 14:21:08 +0100
Subject: RFR: 8240239: Replace ConcurrentGCPhaseManager
In-Reply-To: <4C14B89F-1550-44DE-B738-0DBEE7A2E167@oracle.com>
References: <4C14B89F-1550-44DE-B738-0DBEE7A2E167@oracle.com>
Message-ID: <328e8ec2-f9cc-c083-c09e-70785064497f@oracle.com>

Hi Kim,

On 2/28/20 10:48 PM, Kim Barrett wrote:
> Please review this change which removes the ConcurrentGCPhaseManager
> class and replaces it with ConcurrentGCBreakpoints.
>
> This is joint work with Per Liden.
>
> This change provides a client API, used by WhiteBox.
> The usage model for a client is
>
> (1) Acquire control of concurrent collection cycles.
>
> (2) Do work that must be performed while the collection cycle is in a
> known state.
>
> (3) Request the concurrent collector run to a named "breakpoint", or
> run to completion, and then hold there, waiting for further commands.
>
> (4) Optionally goto (2).
>
> (5) Release control of concurrent collection cycles.
>
> Tests have been updated to use the new WhiteBox API.
>
> This change provides implementations of the new mechanism for G1 and
> ZGC. A Shenandoah implementation is being left to others, but we
> don't see any obvious reason for it to be difficult.
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8240239
>
> Webrev:
> https://cr.openjdk.java.net/~kbarrett/8240239/open.03/

This looks good to me. However, it would be good if someone else had a closer look at the G1 changes, as I'm feeling less confident reviewing that part.

cheers,
Per

>
> To possibly simplify the review, the open patch is also provided as a
> pair of patches, one for removing the old mechanism and a second to
> add the new mechanism.
>
> https://cr.openjdk.java.net/~kbarrett/8240239/remove_phase_control.03/
> Removes ConcurrentGCPhaseManager and its G1 implementation, except
> that tests are not modified.
>
> https://cr.openjdk.java.net/~kbarrett/8240239/control.03/
> Adds ConcurrentGCBreakpoints, with G1 and ZGC implementations, and
> updates tests to use it.
>
> Testing:
> mach5 tier1-5, which includes all the updated and new tests.
>

From m.sundar85 at gmail.com Tue Mar 3 16:02:24 2020
From: m.sundar85 at gmail.com (Sundara Mohan M)
Date: Tue, 3 Mar 2020 11:02:24 -0500
Subject: Need help on debugging JVM crash
Message-ID:

Hi,
I am seeing JVM crashes on our system in a GC thread with Parallel GC on x86 Linux. I have observed the same crash on the JVM 11.0.6/13.0.2/13.0.1 GA builds.
Here are some log lines to give some context.
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f669c964311, pid=66684, tid=71106
#
# JRE version: OpenJDK Runtime Environment (13.0.1+9) (build 13.0.1+9)
# Java VM: OpenJDK 64-Bit Server VM (13.0.1+9, mixed mode, tiered, parallel gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
# https://github.com/AdoptOpenJDK/openjdk-build/issues
#

Host: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, 48 cores, 125G, Red Hat Enterprise Linux Server release 6.10 (Santiago)
Time: Thu Feb 6 11:43:48 2020 UTC elapsed time: 198626 seconds (2d 7h 10m 26s)

Following are the stack traces.

ex1:
Stack: [0x00007fd01cbdb000,0x00007fd01ccdb000], sp=0x00007fd01ccd8890, free space=1014k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0xcc0121] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51
V [libjvm.so+0xc58c8b] OopMapSet::oops_do(frame const*, RegisterMap const*, OopClosure*)+0x2eb
V [libjvm.so+0x7521e9] frame::oops_do_internal(OopClosure*, CodeBlobClosure*, RegisterMap*, bool)+0x99
V [libjvm.so+0xf55757] JavaThread::oops_do(OopClosure*, CodeBlobClosure*)+0x187
V [libjvm.so+0xcbb100] ThreadRootsMarkingTask::do_it(GCTaskManager*, unsigned int)+0xb0
V [libjvm.so+0x7e0f8b] GCTaskThread::run()+0x1eb
V [libjvm.so+0xf5d43d] Thread::call_run()+0x10d
V [libjvm.so+0xc74337] thread_native_entry(Thread*)+0xe7

JavaThread 0x00007fbeb9209800 (nid = 82380) was being processed
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
v ~RuntimeStub::_new_array_Java
J 62465 c2 ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V (207 bytes) @ 0x00007fd00ad43704 [0x00007fd00ad41420+0x00000000000022e4]
J 474206 c2 org.eclipse.jetty.util.log.JettyAwareLogger.log(Lorg/slf4j/Marker;ILjava/lang/String;[Ljava/lang/Object;Ljava/lang/Throwable;)V (134 bytes) @ 0x00007fd00c4e81ec [0x00007fd00c4e7ee0+0x000000000000030c]
j org.eclipse.jetty.util.log.JettyAwareLogger.warn(Ljava/lang/String;Ljava/lang/Throwable;)V+7
j org.eclipse.jetty.util.log.Slf4jLog.warn(Ljava/lang/String;Ljava/lang/Throwable;)V+6
j org.eclipse.jetty.server.HttpChannel.handleException(Ljava/lang/Throwable;)V+181
j org.eclipse.jetty.server.HttpChannelOverHttp.handleException(Ljava/lang/Throwable;)V+13
J 64106 c2 org.eclipse.jetty.server.HttpChannel.handle()Z (997 bytes) @ 0x00007fd00c6d2cd4 [0x00007fd00c6cdec0+0x0000000000004e14]
J 280430 c2 org.eclipse.jetty.server.HttpConnection.onFillable()V (334 bytes) @ 0x00007fd00da925f0 [0x00007fd00da91e40+0x00000000000007b0]
J 41979 c2 org.eclipse.jetty.io.ChannelEndPoint$2.run()V (12 bytes) @ 0x00007fd00a14f604 [0x00007fd00a14f4e0+0x0000000000000124]
J 86362 c2 org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run()V (565 bytes) @ 0x00007fd0087d7e34 [0x00007fd0087d7cc0+0x0000000000000174]
J 75998 c2 java.lang.Thread.run()V java.base at 13.0.2 (17 bytes) @ 0x00007fd00c93b8d8 [0x00007fd00c93b8a0+0x0000000000000038]
v ~StubRoutines::call_stub

ex2:
Stack: [0x00007f669869f000,0x00007f669879f000], sp=0x00007f669879c890, free space=1014k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51
V [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, RegisterMap const*, OopClosure*)+0x2eb
V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, CodeBlobClosure*, RegisterMap*, bool)+0x99
V [libjvm.so+0xf68b17] JavaThread::oops_do(OopClosure*, CodeBlobClosure*)+0x187
V [libjvm.so+0xcce2f0] ThreadRootsMarkingTask::do_it(GCTaskManager*, unsigned int)+0xb0
V [libjvm.so+0x7f422b] GCTaskThread::run()+0x1eb
V [libjvm.so+0xf707fd] Thread::call_run()+0x10d
V [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7

JavaThread 0x00007f5518004000 (nid = 75659) was being processed
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
v ~RuntimeStub::_new_array_Java
J 54174 c2 ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V (207 bytes) @ 0x00007f6687d92678 [0x00007f6687d8c700+0x0000000000005f78]
J 334031 c2 com.xmas.webservice.exception.ExceptionLoggingWrapper.execute()V (1004 bytes) @ 0x00007f6686ede430 [0x00007f6686edd580+0x0000000000000eb0]
J 53431 c2 com.xmas.webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lcom/xmas/beans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; (105 bytes) @ 0x00007f6687db88b0 [0x00007f6687db8660+0x0000000000000250]
J 63819 c2 com.xmas.webservice.exception.mapper.RequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; (9 bytes) @ 0x00007f6686a6ed9c [0x00007f6686a6ecc0+0x00000000000000dc]
J 334032 c2 com.xmas.webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; (332 bytes) @ 0x00007f668992ad34 [0x00007f668992a840+0x00000000000004f4]
J 403918 c2 com.xmas.webservice.filters.ResponseSerializationWorker.execute()Z (272 bytes) @ 0x00007f66869d67fc [0x00007f66869d5e80+0x000000000000097c]
J 17530 c2 com.lafaspot.common.concurrent.internal.WorkerWrapper.execute()Z (208 bytes) @ 0x00007f66848b3708 [0x00007f66848b36a0+0x0000000000000068]
J 31970% c2 com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; (486 bytes) @ 0x00007f668608dcb0 [0x00007f668608d5e0+0x00000000000006d0]
j com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1
J 4889 c1 java.util.concurrent.FutureTask.run()V java.base at 13.0.1 (123 bytes) @ 0x00007f667d0be604 [0x00007f667d0bdf80+0x0000000000000684]
J 7487 c1 java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V java.base at 13.0.1 (187 bytes) @ 0x00007f667dd45854 [0x00007f667dd44a60+0x0000000000000df4]
J 7486 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V java.base at 13.0.1 (9 bytes) @ 0x00007f667d1f643c [0x00007f667d1f63c0+0x000000000000007c]
J 7078 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 bytes) @ 0x00007f667d1f2d74 [0x00007f667d1f2c40+0x0000000000000134]
v ~StubRoutines::call_stub

It is not very frequent, but we have seen ~120 crashes in ~90 days with the following signal:
siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000
On Linux, this signal is generated when we try to access a non-canonical address.

As suggested by Stefan in another thread, I tried adding VerifyAfterGC/VerifyBeforeGC, but it increases latency so much that the applications do not survive our production traffic (requests time out and fail).

Questions
1. When I looked at the source code for printing the stack trace, I saw the following:
https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L696 (prints the native stack trace)
https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L718 (prints the Java thread stack trace if that thread is involved in a GC crash)
 a. How do you know this Java thread was involved in the JVM crash?
 b. Can I assume the Java thread printed after the native stack trace was the culprit?
 c. Since I am seeing the same frames (~RuntimeStub::_new_array_Java, J 54174 c2 ch.qos.logback.classic.spi.ThrowableProxy...) but different stack traces in both crashes, can this be the root cause?
2. I am thinking of excluding compilation of the ch.qos.logback.classic.spi.ThrowableProxy class and running it in production to see whether compilation of this method is the cause. Does that make sense?
3. Any other suggestions on debugging this further?
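To make the non-canonical point concrete: on x86-64 with 4-level paging, virtual addresses are 48 bits wide, and an address is canonical only if bits 63 through 47 are all copies of bit 47. Dereferencing anything else raises a general-protection fault rather than a page fault, and Linux reports that as SIGSEGV with si_code SI_KERNEL (128) and si_addr 0, so the zero si_addr above does not by itself indicate a null dereference. The helper below is a hypothetical illustration for checking suspect pointer values from an hs_err file; it assumes 48-bit addressing by default (5-level paging widens this to 57 bits).

```cpp
#include <cstdint>

// An x86-64 virtual address is canonical when all bits above the implemented
// width are copies of the top implemented bit (bit 47 for 48-bit addressing).
// Shifting right by (va_bits - 1) leaves exactly those bits; they must be
// either all zeros (lower half) or all ones (upper half).
bool is_canonical(std::uint64_t addr, unsigned va_bits = 48) {
  const unsigned shift = va_bits - 1;
  const std::uint64_t top = addr >> shift;
  const std::uint64_t all_ones = UINT64_MAX >> shift;
  return top == 0 || top == all_ones;
}
```

For instance, the stack pointer 0x00007fd01ccd8890 from ex1 passes this check, while a corrupted value such as 0x0000800000000000 would not.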
TIA
Sundar

From yumin.qi at oracle.com Tue Mar 3 16:22:44 2020
From: yumin.qi at oracle.com (Yumin Qi)
Date: Tue, 3 Mar 2020 08:22:44 -0800
Subject: Need help on debugging JVM crash
In-Reply-To:
References:
Message-ID:

Hi, Sundara

On 3/3/20 8:02 AM, Sundara Mohan M wrote:
> Hi,
> I am seeing JVM crashes on our system in GC Thread with parallel gc on
> x86 linux. Observed the same crash happening on JVM-11.0.6/13.0.2/13.0.1 GA
> builds.
> Adding some logs lines to give some context.
>
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x00007f669c964311, pid=66684, tid=71106
> #
> # JRE version: OpenJDK Runtime Environment (13.0.1+9) (build 13.0.1+9)
> # Java VM: OpenJDK 64-Bit Server VM (13.0.1+9, mixed mode, tiered, parallel
> gc, linux-amd64)
> # Problematic frame:
> # V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51
> #
> # No core dump will be written. Core dumps have been disabled. To enable
> core dumping, try "ulimit -c unlimited" before starting Java again
> #
> # If you would like to submit a bug report, please visit:
> # https://github.com/AdoptOpenJDK/openjdk-build/issues
> #
>
> Host: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, 48 cores, 125G, Red Hat
> Enterprise Linux Server release 6.10 (Santiago)
> Time: Thu Feb 6 11:43:48 2020 UTC elapsed time: 198626 seconds (2d 7h 10m
> 26s)
>
>
> Following is the stack trace
> ex1:
> Stack: [0x00007fd01cbdb000,0x00007fd01ccdb000], sp=0x00007fd01ccd8890,
> free space=1014k
> Native frames: (J=compiled Java code, A=aot compiled Java code,
> j=interpreted, Vv=VM code, C=native code)
> V [libjvm.so+0xcc0121] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51
> V [libjvm.so+0xc58c8b] OopMapSet::oops_do(frame const*, RegisterMap
> const*, OopClosure*)+0x2eb
> V [libjvm.so+0x7521e9] frame::oops_do_internal(OopClosure*,
> CodeBlobClosure*, RegisterMap*, bool)+0x99
> V [libjvm.so+0xf55757] JavaThread::oops_do(OopClosure*,
> CodeBlobClosure*)+0x187
> V
> [libjvm.so+0xcbb100] ThreadRootsMarkingTask::do_it(GCTaskManager*,
> unsigned int)+0xb0
> V [libjvm.so+0x7e0f8b] GCTaskThread::run()+0x1eb
> V [libjvm.so+0xf5d43d] Thread::call_run()+0x10d
> V [libjvm.so+0xc74337] thread_native_entry(Thread*)+0xe7
>
> JavaThread 0x00007fbeb9209800 (nid = 82380) was being processed
> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
> v ~RuntimeStub::_new_array_Java
> J 62465 c2
> ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V
> (207 bytes) @ 0x00007fd00ad43704 [0x00007fd00ad41420+0x00000000000022e4]
> J 474206 c2
> org.eclipse.jetty.util.log.JettyAwareLogger.log(Lorg/slf4j/Marker;ILjava/lang/String;[Ljava/lang/Object;Ljava/lang/Throwable;)V
> (134 bytes) @ 0x00007fd00c4e81ec [0x00007fd00c4e7ee0+0x000000000000030c]
> j
> org.eclipse.jetty.util.log.JettyAwareLogger.warn(Ljava/lang/String;Ljava/lang/Throwable;)V+7
> j
> org.eclipse.jetty.util.log.Slf4jLog.warn(Ljava/lang/String;Ljava/lang/Throwable;)V+6
> j
> org.eclipse.jetty.server.HttpChannel.handleException(Ljava/lang/Throwable;)V+181
> j
> org.eclipse.jetty.server.HttpChannelOverHttp.handleException(Ljava/lang/Throwable;)V+13
> J 64106 c2 org.eclipse.jetty.server.HttpChannel.handle()Z (997 bytes) @
> 0x00007fd00c6d2cd4 [0x00007fd00c6cdec0+0x0000000000004e14]
> J 280430 c2 org.eclipse.jetty.server.HttpConnection.onFillable()V (334
> bytes) @ 0x00007fd00da925f0 [0x00007fd00da91e40+0x00000000000007b0]
> J 41979 c2 org.eclipse.jetty.io.ChannelEndPoint$2.run()V (12 bytes) @
> 0x00007fd00a14f604 [0x00007fd00a14f4e0+0x0000000000000124]
> J 86362 c2 org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run()V
> (565 bytes) @ 0x00007fd0087d7e34 [0x00007fd0087d7cc0+0x0000000000000174]
> J 75998 c2 java.lang.Thread.run()V java.base at 13.0.2 (17 bytes) @
> 0x00007fd00c93b8d8 [0x00007fd00c93b8a0+0x0000000000000038]
> v ~StubRoutines::call_stub
>
> ex2:
> Stack: [0x00007f669869f000,0x00007f669879f000], sp=0x00007f669879c890,
> free space=1014k
> Native frames:
> (J=compiled Java code, A=aot compiled Java code,
> j=interpreted, Vv=VM code, C=native code)
>
> V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51
> V [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, RegisterMap const*,
> OopClosure*)+0x2eb
> V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*,
> CodeBlobClosure*, RegisterMap*, bool)+0x99
> V [libjvm.so+0xf68b17] JavaThread::oops_do(OopClosure*,
> CodeBlobClosure*)+0x187
> V [libjvm.so+0xcce2f0] ThreadRootsMarkingTask::do_it(GCTaskManager*,
> unsigned int)+0xb0
> V [libjvm.so+0x7f422b] GCTaskThread::run()+0x1eb
> V [libjvm.so+0xf707fd] Thread::call_run()+0x10d
> V [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7
>
> JavaThread 0x00007f5518004000 (nid = 75659) was being processed
> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
> v ~RuntimeStub::_new_array_Java
> J 54174 c2
> ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V
> (207 bytes) @ 0x00007f6687d92678 [0x00007f6687d8c700+0x0000000000005f78]
> J 334031 c2
> com.xmas.webservice.exception.ExceptionLoggingWrapper.execute()V (1004
> bytes) @ 0x00007f6686ede430 [0x00007f6686edd580+0x0000000000000eb0]
> J 53431 c2
> com.xmas.webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lcom/xmas/beans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response;
> (105 bytes) @ 0x00007f6687db88b0 [0x00007f6687db8660+0x0000000000000250]
> J 63819 c2
> com.xmas.webservice.exception.mapper.RequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response;
> (9 bytes) @ 0x00007f6686a6ed9c [0x00007f6686a6ecc0+0x00000000000000dc]
> J 334032 c2
> com.xmas.webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream;
> (332 bytes) @ 0x00007f668992ad34 [0x00007f668992a840+0x00000000000004f4]
> J 403918 c2
> com.xmas.webservice.filters.ResponseSerializationWorker.execute()Z (272
> bytes) @ 0x00007f66869d67fc
> [0x00007f66869d5e80+0x000000000000097c]
> J 17530 c2 com.lafaspot.common.concurrent.internal.WorkerWrapper.execute()Z
> (208 bytes) @ 0x00007f66848b3708 [0x00007f66848b36a0+0x0000000000000068]
> J 31970% c2
> com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState;
> (486 bytes) @ 0x00007f668608dcb0 [0x00007f668608d5e0+0x00000000000006d0]
> j
> com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1
> J 4889 c1 java.util.concurrent.FutureTask.run()V java.base at 13.0.1 (123
> bytes) @ 0x00007f667d0be604 [0x00007f667d0bdf80+0x0000000000000684]
> J 7487 c1
> java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V
> java.base at 13.0.1 (187 bytes) @ 0x00007f667dd45854
> [0x00007f667dd44a60+0x0000000000000df4]
> J 7486 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V
> java.base at 13.0.1 (9 bytes) @ 0x00007f667d1f643c
> [0x00007f667d1f63c0+0x000000000000007c]
> J 7078 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 bytes) @
> 0x00007f667d1f2d74 [0x00007f667d1f2c40+0x0000000000000134]
> v ~StubRoutines::call_stub
>
> Not very frequent but ~90 days ~120 crashes with following signal
> siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr:
> 0x0000000000000000
> This signal is generated when we try to access non canonical address in
> linux.
>
> As suggested by Stefan in another thread i tried to
> add VerifyAfterGC/VerifyBeforeGC but it seems to increase the latency and
> applications not surviving our production traffic(timing out and requests
> are failing).
>
> Questions
> 1.
When i looked at source code for printing stack trace i see following > https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L696 > (Prints native stack trace) > https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L718 > (printing Java thread stack trace if it is involved in GC crash) > a. How do you know this java thread was involved in jvm crash? When GC processes thread stack as root, the java thread first was recorded. This is why at crash, the java thread was printed out. > b. Can i assume the java thread printed after native stack trace was the > culprit? Please check this thread stack frames, when GC is doing marking work, I think, it encountered a bad oop. Check: If it is a compiled frame, if so, it may related to compiled code. > c. Since i am seeing the same frame (~RuntimeStub::_new_array_Java, J > 54174 c2 ch.qos.logback.classic.spi.ThrowableProxy...) but different > stack trace in both crashes can this be the root cause? It is a C2 compiled frame. The bad oop could be a result of compiler. It also needs detail debug information to make the conclusion. Thanks Yumin > 2. Thinking of excluding compilation > of ch.qos.logback.classic.spi.ThrowableProxy class and running in > production to see if compilation of this method is the cause. Does it make > sense? > > 3. Any other suggestion on debugging this further? > > TIA > Sundar From m.sundar85 at gmail.com Tue Mar 3 17:39:05 2020 From: m.sundar85 at gmail.com (Sundara Mohan M) Date: Tue, 3 Mar 2020 12:39:05 -0500 Subject: Need help on debugging JVM crash In-Reply-To: References:

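The siginfo quoted above (si_code 128, i.e. SI_KERNEL, with si_addr 0) is typical of an x86-64 general-protection fault: for that kind of fault the kernel does not report a faulting address, so si_addr 0 should not be read as a null dereference. One common trigger is dereferencing a non-canonical pointer, for example a corrupted oop. The minimal sketch below, assuming 48-bit virtual addresses (true for the Haswell-era Xeon in the report, not for CPUs running 5-level paging), shows the canonical-form rule: bits 63 through 47 must all be copies of bit 47. The class and method names are illustrative only.

```java
public final class CanonicalAddress {

    /**
     * Returns true if addr is canonical on x86-64 with 48-bit virtual addresses:
     * bits 63..47 must all equal bit 47 (sign extension of a 48-bit value).
     */
    static boolean isCanonical48(long addr) {
        long upper = addr >> 47; // arithmetic shift: keeps bits 63..47, sign-extended
        return upper == 0L || upper == -1L;
    }

    public static void main(String[] args) {
        long[] samples = {
            0x00007f669c964311L, // crashing pc from the hs_err header (user half, canonical)
            0xffff800000000000L, // lowest address of the kernel half (canonical)
            0x0000800000000000L, // first address past the user half (non-canonical)
            0x4141414141414141L, // garbage bit pattern, as a corrupted oop might contain
        };
        for (long a : samples) {
            System.out.printf("0x%016x canonical=%b%n", a, isCanonical48(a));
        }
    }
}
```

A pointer that fails this check cannot be dereferenced at all on such hardware, which is why a GC closure reading a garbage oop from a stack slot can fault with SI_KERNEL rather than an ordinary SEGV_MAPERR.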
Message-ID: Hi Yumin, On Tue, Mar 3, 2020 at 11:23 AM Yumin Qi wrote: > HI, Sundara > On 3/3/20 8:02 AM, Sundara Mohan M wrote: > > Hi, > I am seeing JVM crashes on our system in GC Thread with parallel gc on > x86 linux. Observed the same crash happening on JVM-11.0.6/13.0.2/13.0.1 GA > builds. > Adding some logs lines to give some context. > > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x00007f669c964311, pid=66684, tid=71106 > # > # JRE version: OpenJDK Runtime Environment (13.0.1+9) (build 13.0.1+9) > # Java VM: OpenJDK 64-Bit Server VM (13.0.1+9, mixed mode, tiered, parallel > gc, linux-amd64) > # Problematic frame: > # V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > # > # No core dump will be written. Core dumps have been disabled. To enable > core dumping, try "ulimit -c unlimited" before starting Java again > # > # If you would like to submit a bug report, please visit: > # https://github.com/AdoptOpenJDK/openjdk-build/issues > # > > Host: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, 48 cores, 125G, Red Hat > Enterprise Linux Server release 6.10 (Santiago) > Time: Thu Feb 6 11:43:48 2020 UTC elapsed time: 198626 seconds (2d 7h 10m > 26s) > > > Following is the stack trace > ex1: > Stack: [0x00007fd01cbdb000,0x00007fd01ccdb000], sp=0x00007fd01ccd8890, > free space=1014k > Native frames: (J=compiled Java code, A=aot compiled Java code, > j=interpreted, Vv=VM code, C=native code) > *V [libjvm.so+0xcc0121] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51* > V [libjvm.so+0xc58c8b] OopMapSet::oops_do(frame const*, RegisterMap > const*, OopClosure*)+0x2eb > V [libjvm.so+0x7521e9] frame::oops_do_internal(OopClosure*, > CodeBlobClosure*, RegisterMap*, bool)+0x99 > V [libjvm.so+0xf55757] JavaThread::oops_do(OopClosure*, > CodeBlobClosure*)+0x187 > V [libjvm.so+0xcbb100] ThreadRootsMarkingTask::do_it(GCTaskManager*, > unsigned int)+0xb0 > V [libjvm.so+0x7e0f8b] GCTaskThread::run()+0x1eb > V 
[libjvm.so+0xf5d43d] Thread::call_run()+0x10d > V [libjvm.so+0xc74337] thread_native_entry(Thread*)+0xe7 > > JavaThread 0x00007fbeb9209800 (nid = 82380) was being processed > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > v ~RuntimeStub::_new_array_Java > J 62465 c2 > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > (207 bytes) @ 0x00007fd00ad43704 [0x00007fd00ad41420+0x00000000000022e4] > J 474206 c2 > org.eclipse.jetty.util.log.JettyAwareLogger.log(Lorg/slf4j/Marker;ILjava/lang/String;[Ljava/lang/Object;Ljava/lang/Throwable;)V > (134 bytes) @ 0x00007fd00c4e81ec [0x00007fd00c4e7ee0+0x000000000000030c] > j > org.eclipse.jetty.util.log.JettyAwareLogger.warn(Ljava/lang/String;Ljava/lang/Throwable;)V+7 > j > org.eclipse.jetty.util.log.Slf4jLog.warn(Ljava/lang/String;Ljava/lang/Throwable;)V+6 > j > org.eclipse.jetty.server.HttpChannel.handleException(Ljava/lang/Throwable;)V+181 > j > org.eclipse.jetty.server.HttpChannelOverHttp.handleException(Ljava/lang/Throwable;)V+13 > J 64106 c2 org.eclipse.jetty.server.HttpChannel.handle()Z (997 bytes) @ > 0x00007fd00c6d2cd4 [0x00007fd00c6cdec0+0x0000000000004e14] > J 280430 c2 org.eclipse.jetty.server.HttpConnection.onFillable()V (334 > bytes) @ 0x00007fd00da925f0 [0x00007fd00da91e40+0x00000000000007b0] > J 41979 c2 org.eclipse.jetty.io.ChannelEndPoint$2.run()V (12 bytes) @ > 0x00007fd00a14f604 [0x00007fd00a14f4e0+0x0000000000000124] > J 86362 c2 org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run()V > (565 bytes) @ 0x00007fd0087d7e34 [0x00007fd0087d7cc0+0x0000000000000174] > J 75998 c2 java.lang.Thread.run()V java.base at 13.0.2 (17 bytes) @ > 0x00007fd00c93b8d8 [0x00007fd00c93b8a0+0x0000000000000038] > v ~StubRoutines::call_stub > > ex2: > Stack: [0x00007f669869f000,0x00007f669879f000], sp=0x00007f669879c890, > free space=1014k > Native frames: (J=compiled Java code, A=aot compiled Java code, > j=interpreted, Vv=VM code, C=native code) > > *V [libjvm.so+0xcd3311] 
PCMarkAndPushClosure::do_oop(oopDesc**)+0x51*V > [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, RegisterMap const*, > OopClosure*)+0x2eb > V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > CodeBlobClosure*, RegisterMap*, bool)+0x99 > V [libjvm.so+0xf68b17] JavaThread::oops_do(OopClosure*, > CodeBlobClosure*)+0x187 > V [libjvm.so+0xcce2f0] ThreadRootsMarkingTask::do_it(GCTaskManager*, > unsigned int)+0xb0 > V [libjvm.so+0x7f422b] GCTaskThread::run()+0x1eb > V [libjvm.so+0xf707fd] Thread::call_run()+0x10d > V [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7 > > JavaThread 0x00007f5518004000 (nid = 75659) was being processed > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > v ~RuntimeStub::_new_array_Java > J 54174 c2 > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > (207 bytes) @ 0x00007f6687d92678 [0x00007f6687d8c700+0x0000000000005f78] > J 334031 c2 > com.xmas.webservice.exception.ExceptionLoggingWrapper.execute()V (1004 > bytes) @ 0x00007f6686ede430 [0x00007f6686edd580+0x0000000000000eb0] > J 53431 c2 > com.xmas.webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lcom/xmas/beans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > (105 bytes) @ 0x00007f6687db88b0 [0x00007f6687db8660+0x0000000000000250] > J 63819 c2 > com.xmas.webservice.exception.mapper.RequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > (9 bytes) @ 0x00007f6686a6ed9c [0x00007f6686a6ecc0+0x00000000000000dc] > J 334032 c2 > com.xmas.webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; > (332 bytes) @ 0x00007f668992ad34 [0x00007f668992a840+0x00000000000004f4] > J 403918 c2 > com.xmas.webservice.filters.ResponseSerializationWorker.execute()Z (272 > bytes) @ 0x00007f66869d67fc [0x00007f66869d5e80+0x000000000000097c] > J 17530 c2 com.lafaspot.common.concurrent.internal.WorkerWrapper.execute()Z > (208 bytes) @ 0x00007f66848b3708 
[0x00007f66848b36a0+0x0000000000000068] > J 31970% c2 > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; > (486 bytes) @ 0x00007f668608dcb0 [0x00007f668608d5e0+0x00000000000006d0] > j > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1 > J 4889 c1 java.util.concurrent.FutureTask.run()V java.base at 13.0.1 (123 > bytes) @ 0x00007f667d0be604 [0x00007f667d0bdf80+0x0000000000000684] > J 7487 c1 > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V java.base at 13.0.1 (187 bytes) @ 0x00007f667dd45854 > [0x00007f667dd44a60+0x0000000000000df4] > J 7486 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V java.base at 13.0.1 (9 bytes) @ 0x00007f667d1f643c > [0x00007f667d1f63c0+0x000000000000007c] > J 7078 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 bytes) @ > 0x00007f667d1f2d74 [0x00007f667d1f2c40+0x0000000000000134] > v ~StubRoutines::call_stub > > Not very frequent but ~90 days ~120 crashes with following signal > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: > 0x0000000000000000 > This signal is generated when we try to access non canonical address in > linux. > > As suggested by Stefan in another thread i tried to > add VerifyAfterGC/VerifyBeforeGC but it seems to increase the latency and > applications not surviving our production traffic(timing out and requests > are failing). > > Questions > 1. When i looked at source code for printing stack trace i see following https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L696 > (Prints native stack trace) https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L718 > (printing Java thread stack trace if it is involved in GC crash) > a. How do you know this java thread was involved in jvm crash?
> > When GC processes thread stack as root, the java thread first was > recorded. This is why at crash, the java thread was printed out. > > b. Can i assume the java thread printed after native stack trace was the > culprit? > > Please check this thread stack frames, when GC is doing marking work, I > think, it encountered a bad oop. Check: > > If it is a compiled frame, if so, it may related to compiled code. > > c. Since i am seeing the same frame (~RuntimeStub::_new_array_Java, J > 54174 c2 ch.qos.logback.classic.spi.ThrowableProxy...) but different > stack trace in both crashes can this be the root cause? > > It is a C2 compiled frame. The bad oop could be a result of compiler. > Actually the top two frame are always same in different crashes v ~RuntimeStub::_new_array_Java J 54174 c2 ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V (207 bytes) @ 0x00007f6687d92678 [0x00007f6687d8c700+0x0000000000005f78] In this case do you think JVM code(frame 1) or C2 compiler code(frame 2) might be issue? Is there any way to identify that and what kind of debug flags/settings might give us this information? > It also needs detail debug information to make the conclusion. > Do you think any of the information dumped in hs_err* file might give us more info (like registers content/Instructions/core file)? Can you please let me know what additional details might help to make the conclusion? Also how to get those information? Thanks > > Yumin > > 2. Thinking of excluding compilation > of ch.qos.logback.classic.spi.ThrowableProxy class and running in > production to see if compilation of this method is the cause. Does it make > sense? > > 3. Any other suggestion on debugging this further? > > TIA > Sundar > > Thanks Sundar From aph at redhat.com Tue Mar 3 18:02:59 2020 From: aph at redhat.com (Andrew Haley) Date: Tue, 3 Mar 2020 18:02:59 +0000 Subject: Need help on debugging JVM crash In-Reply-To: References:

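The two native traces quoted earlier in the thread (ex1 and ex2) show the same crashing function at different libjvm.so offsets (0xcc0121 versus 0xcd3311), presumably because they come from different JDK builds (both 13.0.2 and 13.0.1 appear in the traces). When grouping many hs_err files, a key built from the symbol plus its in-function offset (here +0x51) is more stable than the raw library offset. This is a minimal sketch, assuming native frames in the `V  [lib+0xoff]  symbol+0xoff` layout quoted in this thread; the class name and regular expression are illustrative and not part of any JDK tool.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class FrameKey {

    // Matches native frame lines such as
    //   V  [libjvm.so+0xcd3311]  PCMarkAndPushClosure::do_oop(oopDesc**)+0x51
    // group(1) = library, group(2) = library offset,
    // group(3) = symbol,  group(4) = offset within the symbol.
    private static final Pattern NATIVE_FRAME = Pattern.compile(
            "^\\s*\\S+\\s+\\[([^\\]+]+)\\+0x([0-9a-fA-F]+)\\]\\s+(.+)\\+0x([0-9a-fA-F]+)\\s*$");

    /** Returns "library!symbol+0xoffset", or null if the line is not a symbolized native frame. */
    static String key(String line) {
        Matcher m = NATIVE_FRAME.matcher(line);
        if (!m.matches()) {
            return null; // e.g. Java frames ("J 54174 c2 ...") or stubs ("v ~RuntimeStub::...")
        }
        return m.group(1) + "!" + m.group(3) + "+0x" + m.group(4).toLowerCase();
    }

    public static void main(String[] args) {
        String ex1 = "V  [libjvm.so+0xcc0121]  PCMarkAndPushClosure::do_oop(oopDesc**)+0x51";
        String ex2 = "V  [libjvm.so+0xcd3311]  PCMarkAndPushClosure::do_oop(oopDesc**)+0x51";
        System.out.println(key(ex1));
        System.out.println(key(ex1).equals(key(ex2)) ? "same crash site" : "different crash sites");
    }
}
```

Matching keys only say the fault is at the same instruction in the same function; as Andrew points out later in the thread, the corruption itself may originate somewhere else entirely.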
Message-ID: <442a8045-8ef6-00ae-cd61-2db6f1fdb5fd@redhat.com> On 3/3/20 5:39 PM, Sundara Mohan M wrote: >> Questions >> 1. When i looked at source code for printing stack trace i see following https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L696 >> (Prints native stack trace) https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L718 >> (printing Java thread stack trace if it is involved in GC crash) >> a. How do you know this java thread was involved in jvm crash? The top thread -- the first in the file -- is the one that crashed. >> When GC processes thread stack as root, the java thread first was >> recorded. This is why at crash, the java thread was printed out. >> >> b. Can i assume the java thread printed after native stack trace was the >> culprit? Certainly not. >> Please check this thread stack frames, when GC is doing marking work, I >> think, it encountered a bad oop. Check: >> >> If it is a compiled frame, if so, it may related to compiled code. >> >> c. Since i am seeing the same frame (~RuntimeStub::_new_array_Java, J >> 54174 c2 ch.qos.logback.classic.spi.ThrowableProxy...) but different >> stack trace in both crashes can this be the root cause? >> >> It is a C2 compiled frame. The bad oop could be a result of compiler. >> > Actually the top two frame are always same in different crashes > v ~RuntimeStub::_new_array_Java > J 54174 c2 > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > (207 bytes) @ 0x00007f6687d92678 [0x00007f6687d8c700+0x0000000000005f78] > In this case do you think JVM code(frame 1) or C2 compiler code(frame 2) > might be issue? Probably not. My money would be on a bad library using Unsafe to do something unwise. But there are many other possibilities. > Is there any way to identify that and what kind of debug flags/settings > might give us this information? > >> It also needs detail debug information to make the conclusion.
>> > Do you think any of the information dumped in hs_err* file might give us > more info (like registers content/Instructions/core file)? > > Can you please let me know what additional details might help to make the > conclusion? Also how to get those information? Let's see the complete hs_err file. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From m.sundar85 at gmail.com Tue Mar 3 18:13:24 2020 From: m.sundar85 at gmail.com (Sundara Mohan M) Date: Tue, 3 Mar 2020 13:13:24 -0500 Subject: Need help on debugging JVM crash In-Reply-To: References:

<442a8045-8ef6-00ae-cd61-2db6f1fdb5fd@redhat.com> Message-ID: Waiting for moderator approval to get my hs_err* files sent. Is being held until the list moderator can review it for approval. The reason it is being held: Message body is too big: 1048807 bytes with a limit of 500 KB Thanks Sundar On Tue, Mar 3, 2020 at 1:07 PM Sundara Mohan M wrote: > Hi Andrew, > Attaching hs_err* from multiple hosts where both java thread top frame > is same. > > Thanks > Sundar > > On Tue, Mar 3, 2020 at 1:03 PM Andrew Haley wrote: > >> On 3/3/20 5:39 PM, Sundara Mohan M wrote: >> >> Questions >> >> 1. When i looked at source code for printing stack trace i see >> followinghttps:// >> github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L696 >> >> (Prints native stack trace) >> https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L718 >> >> (printing Java thread stack trace if it is involved in GC crash) >> >> a. How do you know this java thread was involved in jvm crash? >> >> The top thread -- the first in the file -- is the one that crashed. >> >> >> When GC processes thread stack as root, the java thread first was >> >> recorded. This is why at crash, the java thread was printed out. >> >> >> >> b. Can i assume the java thread printed after native stack trace was >> the >> >> culprit? >> >> Certainly not. >> >> >> Please check this thread stack frames, when GC is doing marking work, I >> >> think, it encountered a bad oop. Check: >> >> >> >> If it is a compiled frame, if so, it may related to compiled code. >> >> >> >> c. Since i am seeing the same frame (~RuntimeStub::_new_array_Java, J >> >> 54174 c2 ch.qos.logback.classic.spi.ThrowableProxy...) but >> different >> >> stack trace in both crashes can this be the root cause? >> >> >> >> It is a C2 compiled frame. The bad oop could be a result of compiler. 
>> >> >> > Actually the top two frame are always same in different crashes >> > v ~RuntimeStub::_new_array_Java >> > J 54174 c2 >> > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V >> > (207 bytes) @ 0x00007f6687d92678 [0x00007f6687d8c700+0x0000000000005f78] >> > In this case do you think JVM code(frame 1) or C2 compiler code(frame 2) >> > might be issue? >> >> Probably not. My money would be on a bad library using Unsafe to do >> something unwise. But there are many other possibilities. >> >> > Is there any way to identify that and what kind of debug flags/settings >> > might give us this information? >> > >> >> It also needs detail debug information to make the conclusion. >> >> >> > Do you think any of the information dumped in hs_err* file might give us >> > more info (like registers content/Instructions/core file)? >> > >> > Can you please let me know what additional details might help to make >> the >> > conclusion? Also how to get those information? >> >> Let's see the complete hs_err file. >> >> -- >> Andrew Haley (he/him) >> Java Platform Lead Engineer >> Red Hat UK Ltd. >> https://keybase.io/andrewhaley >> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 >> >> From m.sundar85 at gmail.com Tue Mar 3 18:07:30 2020 From: m.sundar85 at gmail.com (Sundara Mohan M) Date: Tue, 3 Mar 2020 13:07:30 -0500 Subject: Need help on debugging JVM crash In-Reply-To: <442a8045-8ef6-00ae-cd61-2db6f1fdb5fd@redhat.com> References:

<442a8045-8ef6-00ae-cd61-2db6f1fdb5fd@redhat.com> Message-ID: Hi Andrew, Attaching hs_err* from multiple hosts where both java thread top frame is same. Thanks Sundar On Tue, Mar 3, 2020 at 1:03 PM Andrew Haley wrote: > On 3/3/20 5:39 PM, Sundara Mohan M wrote: > >> Questions > >> 1. When i looked at source code for printing stack trace i see > followinghttps:// > github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L696 > >> (Prints native stack trace) > https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L718 > >> (printing Java thread stack trace if it is involved in GC crash) > >> a. How do you know this java thread was involved in jvm crash? > > The top thread -- the first in the file -- is the one that crashed. > > >> When GC processes thread stack as root, the java thread first was > >> recorded. This is why at crash, the java thread was printed out. > >> > >> b. Can i assume the java thread printed after native stack trace was > the > >> culprit? > > Certainly not. > > >> Please check this thread stack frames, when GC is doing marking work, I > >> think, it encountered a bad oop. Check: > >> > >> If it is a compiled frame, if so, it may related to compiled code. > >> > >> c. Since i am seeing the same frame (~RuntimeStub::_new_array_Java, J > >> 54174 c2 ch.qos.logback.classic.spi.ThrowableProxy...) but > different > >> stack trace in both crashes can this be the root cause? > >> > >> It is a C2 compiled frame. The bad oop could be a result of compiler. > >> > > Actually the top two frame are always same in different crashes > > v ~RuntimeStub::_new_array_Java > > J 54174 c2 > > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > > (207 bytes) @ 0x00007f6687d92678 [0x00007f6687d8c700+0x0000000000005f78] > > In this case do you think JVM code(frame 1) or C2 compiler code(frame 2) > > might be issue? > > Probably not. 
My money would be on a bad library using Unsafe to do > something unwise. But there are many other possibilities. > > > Is there any way to identify that and what kind of debug flags/settings > > might give us this information? > > > >> It also needs detail debug information to make the conclusion. > >> > > Do you think any of the information dumped in hs_err* file might give us > > more info (like registers content/Instructions/core file)? > > > > Can you please let me know what additional details might help to make the > > conclusion? Also how to get those information? > > Let's see the complete hs_err file. > > -- > Andrew Haley (he/him) > Java Platform Lead Engineer > Red Hat UK Ltd. > https://keybase.io/andrewhaley > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > > From yumin.qi at oracle.com Tue Mar 3 18:49:06 2020 From: yumin.qi at oracle.com (Yumin Qi) Date: Tue, 3 Mar 2020 10:49:06 -0800 Subject: Need help on debugging JVM crash In-Reply-To: References:

Message-ID: Hi, Sundara As suggested by Stefan in another thread i tried to > >> add VerifyAfterGC/VerifyBeforeGC but it seems to increase the latency and >> applications not surviving our production traffic(timing out and requests >> are failing). >> >> Questions >> 1. When i looked at source code for printing stack trace i see following >> https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L696 >> (Prints native stack trace) >> https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L718 >> (printing Java thread stack trace if it is involved in GC crash) >> a. How do you know this java thread was involved in jvm crash? > When GC processes thread stack as root, the java thread first was > recorded. This is why at crash, the java thread was printed out. >> b. Can i assume the java thread printed after native stack trace was the >> culprit? > > Please check this thread stack frames, when GC is doing marking > work, I think, it encountered a bad oop. Check: > > If it is a compiled frame, if so, it may related to compiled code. > >> c. Since i am seeing the same frame (~RuntimeStub::_new_array_Java, J >> 54174 c2 ch.qos.logback.classic.spi.ThrowableProxy...) but different >> stack trace in both crashes can this be the root cause? > > It is a C2 compiled frame. The bad oop could be a result of compiler. > > Actually the top two frame are always same in different crashes > v ~RuntimeStub::_new_array_Java > J 54174 c2 > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > (207 bytes) @ 0x00007f6687d92678 [0x00007f6687d8c700+0x0000000000005f78] > In this case do you think JVM code(frame 1) or C2 compiler code(frame > 2) might be issue? > Is there any way to identify that and what kind of debug > flags/settings might give us this information? > > It also needs detail debug information to make the conclusion. 
> > Do you think any of the information dumped in hs_err* file might give > us more info (like registers content/Instructions/core file)? > > Can you please let me know what additional details might help to make > the conclusion? Also how to get those information? > If it is caused by this compiled java method, excluding the java method from compilation is a workaround. You can switch to the java thread (the printed out java thread at crash), compare the failed frame in GC thread to the frame in the java thread so you will know which frame contained bad oop. Also know what is the frame, compiled, interpreter, or native. Yumin > Thanks > > Yumin > >> 2. Thinking of excluding compilation >> of ch.qos.logback.classic.spi.ThrowableProxy class and running in >> production to see if compilation of this method is the cause. Does it make >> sense? >> >> 3. Any other suggestion on debugging this further? >> >> TIA >> Sundar > > > Thanks > Sundar From stefan.karlsson at oracle.com Tue Mar 3 18:57:40 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Tue, 3 Mar 2020 19:57:40 +0100 Subject: Need help on debugging JVM crash In-Reply-To: References:

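Yumin's workaround above, and Sundar's question 2, amount to keeping the suspect method out of the JIT. The crashing Java frame `ThrowableProxy.(Ljava/lang/Throwable;)V` is a constructor (the archive has stripped its `<init>` name), so a HotSpot `-XX:CompileCommand=exclude,...` pattern naming `<init>` should cover it. The sketch below only assembles such a launch command; `com.example.Server` is a placeholder main class, and the exact method pattern should be checked against the `java` launcher documentation for the JDK in use.

```java
import java.util.ArrayList;
import java.util.List;

public final class ExcludeFromJit {

    // CompileCommand method pattern: Class::method, with <init> naming constructors.
    static final String SUSPECT = "ch.qos.logback.classic.spi.ThrowableProxy::<init>";

    /** Builds a java command line that keeps SUSPECT out of the JIT compilers. */
    static List<String> command(String mainClass, List<String> appArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-XX:CompileCommand=exclude," + SUSPECT);
        cmd.add(mainClass);
        cmd.addAll(appArgs);
        return cmd;
    }

    public static void main(String[] args) {
        // Placeholder entry point; substitute the real application's main class.
        List<String> cmd = command("com.example.Server", List.of());
        // new ProcessBuilder(cmd).inheritIO().start() would launch it; here we only print it.
        System.out.println(String.join(" ", cmd));
    }
}
```

ProcessBuilder passes each argument without a shell; when typing the flag in a shell instead, quote it, because `<` and `>` are redirection operators. The excluded constructor will run interpreted, so expect some cost on the exception-logging path. At the reported rate (about 120 crashes in 90 days across the fleet), it may take weeks before an absence of crashes says much.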
<442a8045-8ef6-00ae-cd61-2db6f1fdb5fd@redhat.com> Message-ID: <7a572768-b73b-9d6e-db63-a39583f0c507@oracle.com> I have approved the message, but it isn't arriving. As a workaround, could you try send one hs_err file at a time, and cut the rest of the message? Each hs_err file is < 500 KB, so maybe that will work. StefanK On 2020-03-03 19:13, Sundara Mohan M wrote: > Waiting for moderator approval to get my hs_err* files sent. > > Is being held until the list moderator can review it for approval. > > The reason it is being held: > > Message body is too big: 1048807 bytes with a limit of 500 KB > > Thanks > Sundar > > On Tue, Mar 3, 2020 at 1:07 PM Sundara Mohan M wrote: > >> Hi Andrew, >> Attaching hs_err* from multiple hosts where both java thread top frame >> is same. >> >> Thanks >> Sundar >> >> On Tue, Mar 3, 2020 at 1:03 PM Andrew Haley wrote: >> >>> On 3/3/20 5:39 PM, Sundara Mohan M wrote: >>>>> Questions >>>>> 1. When i looked at source code for printing stack trace i see >>> followinghttps:// >>> github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L696 >>>>> (Prints native stack trace) >>> https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L718 >>>>> (printing Java thread stack trace if it is involved in GC crash) >>>>> a. How do you know this java thread was involved in jvm crash? >>> The top thread -- the first in the file -- is the one that crashed. >>> >>>>> When GC processes thread stack as root, the java thread first was >>>>> recorded. This is why at crash, the java thread was printed out. >>>>> >>>>> b. Can i assume the java thread printed after native stack trace was >>> the >>>>> culprit? >>> Certainly not. >>> >>>>> Please check this thread stack frames, when GC is doing marking work, I >>>>> think, it encountered a bad oop. Check: >>>>> >>>>> If it is a compiled frame, if so, it may related to compiled code. >>>>> >>>>> c. 
Since i am seeing the same frame (~RuntimeStub::_new_array_Java, J >>>>> 54174 c2 ch.qos.logback.classic.spi.ThrowableProxy...) but >>> different >>>>> stack trace in both crashes can this be the root cause? >>>>> >>>>> It is a C2 compiled frame. The bad oop could be a result of compiler. >>>>> >>>> Actually the top two frame are always same in different crashes >>>> v ~RuntimeStub::_new_array_Java >>>> J 54174 c2 >>>> ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V >>>> (207 bytes) @ 0x00007f6687d92678 [0x00007f6687d8c700+0x0000000000005f78] >>>> In this case do you think JVM code(frame 1) or C2 compiler code(frame 2) >>>> might be issue? >>> Probably not. My money would be on a bad library using Unsafe to do >>> something unwise. But there are many other possibilities. >>> >>>> Is there any way to identify that and what kind of debug flags/settings >>>> might give us this information? >>>> >>>>> It also needs detail debug information to make the conclusion. >>>>> >>>> Do you think any of the information dumped in hs_err* file might give us >>>> more info (like registers content/Instructions/core file)? >>>> >>>> Can you please let me know what additional details might help to make >>> the >>>> conclusion? Also how to get those information? >>> Let's see the complete hs_err file. >>> >>> -- >>> Andrew Haley (he/him) >>> Java Platform Lead Engineer >>> Red Hat UK Ltd. >>> https://keybase.io/andrewhaley >>> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 >>> >>> From m.sundar85 at gmail.com Tue Mar 3 19:00:41 2020 From: m.sundar85 at gmail.com (Sundara Mohan M) Date: Tue, 3 Mar 2020 14:00:41 -0500 Subject: Need help on debugging JVM crash In-Reply-To: References:

<442a8045-8ef6-00ae-cd61-2db6f1fdb5fd@redhat.com> <7a572768-b73b-9d6e-db63-a39583f0c507@oracle.com> Message-ID: Another crash file. Thanks Sundar On Tue, Mar 3, 2020 at 2:00 PM Sundara Mohan M wrote: > Attaching 1 file as a work around! > > Thanks > Sundar > > On Tue, Mar 3, 2020 at 1:57 PM Stefan Karlsson > wrote: > >> I have approved the message, but it isn't arriving. As a workaround, >> could you try send one hs_err file at a time, and cut the rest of the >> message? Each hs_err file is < 500 KB, so maybe that will work. >> >> StefanK >> >> On 2020-03-03 19:13, Sundara Mohan M wrote: >> > Waiting for moderator approval to get my hs_err* files sent. >> > >> > Is being held until the list moderator can review it for approval. >> > >> > The reason it is being held: >> > >> > Message body is too big: 1048807 bytes with a limit of 500 KB >> > >> > Thanks >> > Sundar >> > >> > On Tue, Mar 3, 2020 at 1:07 PM Sundara Mohan M >> wrote: >> > >> >> Hi Andrew, >> >> Attaching hs_err* from multiple hosts where both java thread top >> frame >> >> is same. >> >> >> >> Thanks >> >> Sundar >> >> >> >> On Tue, Mar 3, 2020 at 1:03 PM Andrew Haley wrote: >> >> >> >>> On 3/3/20 5:39 PM, Sundara Mohan M wrote: >> >>>>> Questions >> >>>>> 1. When i looked at source code for printing stack trace i see >> >>> followinghttps:// >> >>> >> github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L696 >> >>>>> (Prints native stack trace) >> >>> >> https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L718 >> >>>>> (printing Java thread stack trace if it is involved in GC crash) >> >>>>> a. How do you know this java thread was involved in jvm crash? >> >>> The top thread -- the first in the file -- is the one that crashed. >> >>> >> >>>>> When GC processes thread stack as root, the java thread first was >> >>>>> recorded. This is why at crash, the java thread was printed out. >> >>>>> >> >>>>> b. 
Can i assume the java thread printed after native stack trace >> was >> >>> the >> >>>>> culprit? >> >>> Certainly not. >> >>> >> >>>>> Please check this thread stack frames, when GC is doing marking >> work, I >> >>>>> think, it encountered a bad oop. Check: >> >>>>> >> >>>>> If it is a compiled frame, if so, it may related to compiled code. >> >>>>> >> >>>>> c. Since i am seeing the same frame >> (~RuntimeStub::_new_array_Java, J >> >>>>> 54174 c2 ch.qos.logback.classic.spi.ThrowableProxy...) but >> >>> different >> >>>>> stack trace in both crashes can this be the root cause? >> >>>>> >> >>>>> It is a C2 compiled frame. The bad oop could be a result of >> compiler. >> >>>>> >> >>>> Actually the top two frame are always same in different crashes >> >>>> v ~RuntimeStub::_new_array_Java >> >>>> J 54174 c2 >> >>>> >> ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V >> >>>> (207 bytes) @ 0x00007f6687d92678 >> [0x00007f6687d8c700+0x0000000000005f78] >> >>>> In this case do you think JVM code(frame 1) or C2 compiler >> code(frame 2) >> >>>> might be issue? >> >>> Probably not. My money would be on a bad library using Unsafe to do >> >>> something unwise. But there are many other possibilities. >> >>> >> >>>> Is there any way to identify that and what kind of debug >> flags/settings >> >>>> might give us this information? >> >>>> >> >>>>> It also needs detail debug information to make the conclusion. >> >>>>> >> >>>> Do you think any of the information dumped in hs_err* file might >> give us >> >>>> more info (like registers content/Instructions/core file)? >> >>>> >> >>>> Can you please let me know what additional details might help to make >> >>> the >> >>>> conclusion? Also how to get those information? >> >>> Let's see the complete hs_err file. >> >>> >> >>> -- >> >>> Andrew Haley (he/him) >> >>> Java Platform Lead Engineer >> >>> Red Hat UK Ltd. 
>> >>> https://keybase.io/andrewhaley >> >>> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 >> >>> >> >>> >> >> From m.sundar85 at gmail.com Tue Mar 3 19:02:50 2020 From: m.sundar85 at gmail.com (Sundara Mohan M) Date: Tue, 3 Mar 2020 14:02:50 -0500 Subject: Need help on debugging JVM crash In-Reply-To: References:

<442a8045-8ef6-00ae-cd61-2db6f1fdb5fd@redhat.com> <7a572768-b73b-9d6e-db63-a39583f0c507@oracle.com> Message-ID: Trying to send crash file1 alone. Thanks Sundar From kim.barrett at oracle.com Tue Mar 3 19:07:21 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 3 Mar 2020 14:07:21 -0500 Subject: RFR: 8240239: Replace ConcurrentGCPhaseManager In-Reply-To: <328e8ec2-f9cc-c083-c09e-70785064497f@oracle.com> References: <4C14B89F-1550-44DE-B738-0DBEE7A2E167@oracle.com> <328e8ec2-f9cc-c083-c09e-70785064497f@oracle.com> Message-ID: > On Mar 3, 2020, at 8:21 AM, Per Liden wrote: > > Hi Kim, > > On 2/28/20 10:48 PM, Kim Barrett wrote: >> Please review this change which removes the ConcurrentGCPhaseManager >> class and replaces it with ConcurrentGCBreakpoints. >> This is joint work with Per Liden. >> This change provides a client API, used by WhiteBox. The usage model >> for a client is >> (1) Acquire control of concurrent collection cycles. >> (2) Do work that must be performed while the collection cycle is in a >> known state. >> (3) Request the concurrent collector run to a named "breakpoint", or >> run to completion, and then hold there, waiting for further commands. >> (4) Optionally goto (2). >> (5) Release control of concurrent collection cycles. >> Tests have been updated to use the new WhiteBox API. >> This change provides implementations of the new mechanism for G1 and >> ZGC. A Shenandoah implementation is being left to others, but we >> don't see any obvious reason for it to be difficult. >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8240239 >> Webrev: >> https://cr.openjdk.java.net/~kbarrett/8240239/open.03/ > > This looks good to me. However, it would be good if someone else had a closer look at the G1 changes, as I'm feeling less confident reviewing that part. Thanks. Yeah, the G1 changes are not as nice as one might wish. I filed a couple of bugs around G1's initiation of concurrent marking while working on this change.
See https://bugs.openjdk.java.net/browse/JDK-8236031 https://bugs.openjdk.java.net/browse/JDK-8235737 I don't think either of those block this change, but fixing them might make some parts a little easier to understand. From m.sundar85 at gmail.com Tue Mar 3 19:12:03 2020 From: m.sundar85 at gmail.com (Sundara Mohan M) Date: Tue, 3 Mar 2020 14:12:03 -0500 Subject: Need help on debugging JVM crash In-Reply-To: References:

<442a8045-8ef6-00ae-cd61-2db6f1fdb5fd@redhat.com> <7a572768-b73b-9d6e-db63-a39583f0c507@oracle.com>

Message-ID: Sorry for spamming, can someone confirm if you received 2 crash report files? I tried sending separately but only 1 file went through other still says message body more than 500K. Thanks Sundar On Tue, Mar 3, 2020 at 2:08 PM Sundara Mohan M wrote: > From ioi.lam at oracle.com Tue Mar 3 19:27:21 2020 From: ioi.lam at oracle.com (Ioi Lam) Date: Tue, 3 Mar 2020 11:27:21 -0800 Subject: Need help on debugging JVM crash In-Reply-To: References:

<442a8045-8ef6-00ae-cd61-2db6f1fdb5fd@redhat.com> <7a572768-b73b-9d6e-db63-a39583f0c507@oracle.com>

Message-ID: <8c2b3189-797f-a75e-c6de-a14f9bd7264d@oracle.com> For me, at least, your attachment has been filtered out by the mail server. Since this is an AdoptJDK build, I would suggest filing a bug according to the hs_err file # If you would like to submit a bug report, please visit: #?? https://github.com/AdoptOpenJDK/openjdk-build/issues and then attach your hs_err log there, and then post the URL of the bug here. Thanks - Ioi On 3/3/20 11:12 AM, Sundara Mohan M wrote: > Sorry for spamming, can someone confirm if you received 2 crash report > files? I tried sending separately but only 1 file went through other still > says message body more than 500K. > > Thanks > Sundar > > On Tue, Mar 3, 2020 at 2:08 PM Sundara Mohan M wrote: > From m.sundar85 at gmail.com Tue Mar 3 19:30:57 2020 From: m.sundar85 at gmail.com (Sundara Mohan M) Date: Tue, 3 Mar 2020 14:30:57 -0500 Subject: Need help on debugging JVM crash In-Reply-To: <8c2b3189-797f-a75e-c6de-a14f9bd7264d@oracle.com> References:

<442a8045-8ef6-00ae-cd61-2db6f1fdb5fd@redhat.com> <7a572768-b73b-9d6e-db63-a39583f0c507@oracle.com>

<8c2b3189-797f-a75e-c6de-a14f9bd7264d@oracle.com> Message-ID: Hi Ioi, Thanks for the information. I have uploaded logs here https://github.com/AdoptOpenJDK/openjdk-support/issues/69 Thanks Sundar On Tue, Mar 3, 2020 at 2:27 PM Ioi Lam wrote: > For me, at least, your attachment has been filtered out by the mail server. > > Since this is an AdoptJDK build, I would suggest filing a bug according > to the hs_err file > > # If you would like to submit a bug report, please visit: > # https://github.com/AdoptOpenJDK/openjdk-build/issues > > and then attach your hs_err log there, and then post the URL of the bug > here. > > Thanks > - Ioi > > On 3/3/20 11:12 AM, Sundara Mohan M wrote: > > Sorry for spamming, can someone confirm if you received 2 crash report > > files? I tried sending separately but only 1 file went through other > still > > says message body more than 500K. > > > > Thanks > > Sundar > > > > On Tue, Mar 3, 2020 at 2:08 PM Sundara Mohan M > wrote: > > > > From ioi.lam at oracle.com Tue Mar 3 20:12:04 2020 From: ioi.lam at oracle.com (Ioi Lam) Date: Tue, 3 Mar 2020 12:12:04 -0800 Subject: Need help on debugging JVM crash In-Reply-To: References: Message-ID: <86c8b74f-9f11-b9b5-00f2-30360d0e8b6f@oracle.com> The crash happened while the GC is running. I tried disasm of the crashing address PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 (from the 13.0.1+9 GA binaries of AdoptJDK)

(gdb) x/80i _ZN20PCMarkAndPushClosure6do_oopEPP7oopDesc
   <>:     push   %rbp
   <+1>:   mov    %rsp,%rbp
   <+4>:   push   %r13
   <+6>:   push   %r12
   <+8>:   push   %rbx
   <+9>:   sub    $0x8,%rsp
   <+13>:  mov    (%rsi),%rbx      ;;; rbx = oop
   <+16>:  test   %rbx,%rbx        ;;; oop != null?
   <+19>:  je     0x7ffff67ca317 <_ZN20PCMarkAndPushClosure6do_oopEPP7oopDesc+87>
   <+21>:  lea    0x9c0afc(%rip),%rax   # 0x7ffff718add8 <_ZN20ParCompactionManager12_mark_bitmapE>
   <+28>:  mov    %rbx,%rcx        ;;; rcx = oop
   <+31>:  mov    (%rax),%rdx      ;;; rdx = ParCompactionManager::_mark_bitmap
   <+34>:  sub    (%rdx),%rcx      ;;; rcx = oop - _mark_bitmap->_region_start
   <+37>:  mov    0x10(%rdx),%rdx  ;; rdx = _mark_bitmap->_beg_bits->_map
   <+41>:  mov    %rcx,%rax        ;;; rax = oop - _mark_bitmap->_region_start
   <+44>:  lea    0x93b935(%rip),%rcx   # 0x7ffff7105c28
   <+51>:  shr    $0x3,%rax
   <+55>:  mov    (%rcx),%ecx
   <+57>:  shr    %cl,%rax
   <+60>:  mov    %rax,%rcx
   <+63>:  mov    %rax,%rsi        ;;; rsi = index of oop inside mark_bitmap
   <+66>:  mov    $0x1,%eax
   <+71>:  and    $0x3f,%ecx
   <+74>:  shr    $0x6,%rsi
   <+78>:  shl    %cl,%rax
   <+81>:  test   %rax,(%rdx,%rsi,8)   << crash

This looks like that the oop that we try to mark is actually outside of the heap range, so trying to mark it in the mark_bitmap causes this:

   siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000

Here are the values of the registers for the "test" instruction above:

    RAX=0x0000000000000001 is an unknown value
    RDX=0x00007f55af000000 points into unknown readable memory: 01 00 00 00 01 00 00 04
    RSI=0x007fffc05491d000 is an unknown value

As you can see, RSI is very large, which means you have an invalid oop in the stack that's probably very large.

    [libjvm.so+0xcc0121] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51
    [libjvm.so+0xc58c8b] OopMapSet::oops_do()
    [libjvm.so+0x7521e9] frame::oops_do_internal()+0x99 <<<< HERE
    [libjvm.so+0xf55757] JavaThread::oops_do()+0x187

As others have mentioned, this kind of error is usually caused by invalid use of Unsafe or JNI that leads to heap corruption. However, it's plausible that somehow the VM has messed up the frame and tries to mark an invalid oop. Thanks - Ioi On 3/3/20 8:02 AM, Sundara Mohan M wrote: > Hi, > I am seeing JVM crashes on our system in GC Thread with parallel gc on > x86 linux. Observed the same crash happening on JVM-11.0.6/13.0.2/13.0.1 GA > builds.
> Adding some logs lines to give some context. > > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x00007f669c964311, pid=66684, tid=71106 > # > # JRE version: OpenJDK Runtime Environment (13.0.1+9) (build 13.0.1+9) > # Java VM: OpenJDK 64-Bit Server VM (13.0.1+9, mixed mode, tiered, parallel > gc, linux-amd64) > # Problematic frame: > # V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > # > # No core dump will be written. Core dumps have been disabled. To enable > core dumping, try "ulimit -c unlimited" before starting Java again > # > # If you would like to submit a bug report, please visit: > # https://github.com/AdoptOpenJDK/openjdk-build/issues > # > > Host: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, 48 cores, 125G, Red Hat > Enterprise Linux Server release 6.10 (Santiago) > Time: Thu Feb 6 11:43:48 2020 UTC elapsed time: 198626 seconds (2d 7h 10m > 26s) > > > Following is the stack trace > ex1: > Stack: [0x00007fd01cbdb000,0x00007fd01ccdb000], sp=0x00007fd01ccd8890, > free space=1014k > Native frames: (J=compiled Java code, A=aot compiled Java code, > j=interpreted, Vv=VM code, C=native code) > *V [libjvm.so+0xcc0121] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51* > V [libjvm.so+0xc58c8b] OopMapSet::oops_do(frame const*, RegisterMap > const*, OopClosure*)+0x2eb > V [libjvm.so+0x7521e9] frame::oops_do_internal(OopClosure*, > CodeBlobClosure*, RegisterMap*, bool)+0x99 > V [libjvm.so+0xf55757] JavaThread::oops_do(OopClosure*, > CodeBlobClosure*)+0x187 > V [libjvm.so+0xcbb100] ThreadRootsMarkingTask::do_it(GCTaskManager*, > unsigned int)+0xb0 > V [libjvm.so+0x7e0f8b] GCTaskThread::run()+0x1eb > V [libjvm.so+0xf5d43d] Thread::call_run()+0x10d > V [libjvm.so+0xc74337] thread_native_entry(Thread*)+0xe7 > > JavaThread 0x00007fbeb9209800 (nid = 82380) was being processed > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > v ~RuntimeStub::_new_array_Java > J 62465 c2 > 
ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > (207 bytes) @ 0x00007fd00ad43704 [0x00007fd00ad41420+0x00000000000022e4] > J 474206 c2 > org.eclipse.jetty.util.log.JettyAwareLogger.log(Lorg/slf4j/Marker;ILjava/lang/String;[Ljava/lang/Object;Ljava/lang/Throwable;)V > (134 bytes) @ 0x00007fd00c4e81ec [0x00007fd00c4e7ee0+0x000000000000030c] > j > org.eclipse.jetty.util.log.JettyAwareLogger.warn(Ljava/lang/String;Ljava/lang/Throwable;)V+7 > j > org.eclipse.jetty.util.log.Slf4jLog.warn(Ljava/lang/String;Ljava/lang/Throwable;)V+6 > j > org.eclipse.jetty.server.HttpChannel.handleException(Ljava/lang/Throwable;)V+181 > j > org.eclipse.jetty.server.HttpChannelOverHttp.handleException(Ljava/lang/Throwable;)V+13 > J 64106 c2 org.eclipse.jetty.server.HttpChannel.handle()Z (997 bytes) @ > 0x00007fd00c6d2cd4 [0x00007fd00c6cdec0+0x0000000000004e14] > J 280430 c2 org.eclipse.jetty.server.HttpConnection.onFillable()V (334 > bytes) @ 0x00007fd00da925f0 [0x00007fd00da91e40+0x00000000000007b0] > J 41979 c2 org.eclipse.jetty.io.ChannelEndPoint$2.run()V (12 bytes) @ > 0x00007fd00a14f604 [0x00007fd00a14f4e0+0x0000000000000124] > J 86362 c2 org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run()V > (565 bytes) @ 0x00007fd0087d7e34 [0x00007fd0087d7cc0+0x0000000000000174] > J 75998 c2 java.lang.Thread.run()V java.base at 13.0.2 (17 bytes) @ > 0x00007fd00c93b8d8 [0x00007fd00c93b8a0+0x0000000000000038] > v ~StubRoutines::call_stub > > ex2: > Stack: [0x00007f669869f000,0x00007f669879f000], sp=0x00007f669879c890, > free space=1014k > Native frames: (J=compiled Java code, A=aot compiled Java code, > j=interpreted, Vv=VM code, C=native code) > > *V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51*V > [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, RegisterMap const*, > OopClosure*)+0x2eb > V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > CodeBlobClosure*, RegisterMap*, bool)+0x99 > V [libjvm.so+0xf68b17] JavaThread::oops_do(OopClosure*, 
> CodeBlobClosure*)+0x187 > V [libjvm.so+0xcce2f0] ThreadRootsMarkingTask::do_it(GCTaskManager*, > unsigned int)+0xb0 > V [libjvm.so+0x7f422b] GCTaskThread::run()+0x1eb > V [libjvm.so+0xf707fd] Thread::call_run()+0x10d > V [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7 > > JavaThread 0x00007f5518004000 (nid = 75659) was being processed > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > v ~RuntimeStub::_new_array_Java > J 54174 c2 > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > (207 bytes) @ 0x00007f6687d92678 [0x00007f6687d8c700+0x0000000000005f78] > J 334031 c2 > com.xmas.webservice.exception.ExceptionLoggingWrapper.execute()V (1004 > bytes) @ 0x00007f6686ede430 [0x00007f6686edd580+0x0000000000000eb0] > J 53431 c2 > com.xmas.webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lcom/xmas/beans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > (105 bytes) @ 0x00007f6687db88b0 [0x00007f6687db8660+0x0000000000000250] > J 63819 c2 > com.xmas.webservice.exception.mapper.RequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > (9 bytes) @ 0x00007f6686a6ed9c [0x00007f6686a6ecc0+0x00000000000000dc] > J 334032 c2 > com.xmas.webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; > (332 bytes) @ 0x00007f668992ad34 [0x00007f668992a840+0x00000000000004f4] > J 403918 c2 > com.xmas.webservice.filters.ResponseSerializationWorker.execute()Z (272 > bytes) @ 0x00007f66869d67fc [0x00007f66869d5e80+0x000000000000097c] > J 17530 c2 com.lafaspot.common.concurrent.internal.WorkerWrapper.execute()Z > (208 bytes) @ 0x00007f66848b3708 [0x00007f66848b36a0+0x0000000000000068] > J 31970% c2 > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; > (486 bytes) @ 0x00007f668608dcb0 [0x00007f668608d5e0+0x00000000000006d0] > j > 
com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1 > J 4889 c1 java.util.concurrent.FutureTask.run()V java.base at 13.0.1 (123 > bytes) @ 0x00007f667d0be604 [0x00007f667d0bdf80+0x0000000000000684] > J 7487 c1 > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V > java.base at 13.0.1 (187 bytes) @ 0x00007f667dd45854 > [0x00007f667dd44a60+0x0000000000000df4] > J 7486 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V > java.base at 13.0.1 (9 bytes) @ 0x00007f667d1f643c > [0x00007f667d1f63c0+0x000000000000007c] > J 7078 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 bytes) @ > 0x00007f667d1f2d74 [0x00007f667d1f2c40+0x0000000000000134] > v ~StubRoutines::call_stub > > Not very frequent but ~90 days ~120 crashes with following signal > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: > 0x0000000000000000 > This signal is generated when we try to access non canonical address in > linux. > > As suggested by Stefan in another thread i tried to > add VerifyAfterGC/VerifyBeforeGC but it seems to increase the latency and > applications not surviving our production traffic(timing out and requests > are failing). > > Questions > 1. When i looked at source code for printing stack trace i see following > https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L696 > (Prints native stack trace) > https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L718 > (printing Java thread stack trace if it is involved in GC crash) > a. How do you know this java thread was involved in jvm crash? > b. Can i assume the java thread printed after native stack trace was the > culprit? > c. Since i am seeing the same frame (~RuntimeStub::_new_array_Java, J > 54174 c2 ch.qos.logback.classic.spi.ThrowableProxy...) but different > stack trace in both crashes can this be the root cause? > > 2. 
Thinking of excluding compilation > of ch.qos.logback.classic.spi.ThrowableProxy class and running in > production to see if compilation of this method is the cause. Does it make > sense? > > 3. Any other suggestion on debugging this further? > > TIA > Sundar From m.sundar85 at gmail.com Wed Mar 4 01:01:28 2020 From: m.sundar85 at gmail.com (Sundara Mohan M) Date: Tue, 3 Mar 2020 20:01:28 -0500 Subject: Need help on debugging JVM crash In-Reply-To: <86c8b74f-9f11-b9b5-00f2-30360d0e8b6f@oracle.com> References: <86c8b74f-9f11-b9b5-00f2-30360d0e8b6f@oracle.com> Message-ID: Hi Ioi, Thanks for the analysis. On Tue, Mar 3, 2020 at 3:12 PM Ioi Lam wrote: > The crash happened while the GC is running. I tried disasm of the > crashing address PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 (from the > 13.0.1+9 GA binaries of AdoptJDK) > > (gdb) x/80i _ZN20PCMarkAndPushClosure6do_oopEPP7oopDesc > <>: push %rbp > <+1>: mov %rsp,%rbp > <+4>: push %r13 > <+6>: push %r12 > <+8>: push %rbx > <+9>: sub $0x8,%rsp > <+13>: mov (%rsi),%rbx ;;; rbx = oop > <+16>: test %rbx,%rbx ;;; oop != null? 
> <+19>: je 0x7ffff67ca317 > <_ZN20PCMarkAndPushClosure6do_oopEPP7oopDesc+87> > <+21>: lea 0x9c0afc(%rip),%rax # 0x7ffff718add8 > <_ZN20ParCompactionManager12_mark_bitmapE> > <+28>: mov %rbx,%rcx ;;; rcx = oop > <+31>: mov (%rax),%rdx ;;; rdx = > ParCompactionManager::_mark_bitmap > <+34>: sub (%rdx),%rcx ;;; rcx = oop - > _mark_bitmap->_region_start > <+37>: mov 0x10(%rdx),%rdx ;; rdx = _mark_bitmap->_beg_bits->_map > <+41>: mov %rcx,%rax ;;; rax = oop - > _mark_bitmap->_region_start > <+44>: lea 0x93b935(%rip),%rcx # 0x7ffff7105c28 > > <+51>: shr $0x3,%rax > <+55>: mov (%rcx),%ecx > <+57>: shr %cl,%rax > <+60>: mov %rax,%rcx > <+63>: mov %rax,%rsi ;;; rsi = index of oop inside > mark_bitmap > <+66>: mov $0x1,%eax > <+71>: and $0x3f,%ecx > <+74>: shr $0x6,%rsi > <+78>: shl %cl,%rax > <+81>: test %rax,(%rdx,%rsi,8) << crash > > > This looks like that the oop that we try to mark is actually outside of > the heap range, so trying to mark it in the mark_bitmap causes this: > > > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: > 0x0000000000000000 > > > Here are the values of the registers for the "test" instruction above: > > > RAX=0x0000000000000001 is an unknown value > RDX=0x00007f55af000000 points into unknown readable memory: 01 00 > 00 00 01 00 00 04 > RSI=0x007fffc05491d000 is an unknown value > > > As you can see, RSI is very large, which means you have an invalid oop > in the stack that's probably very large. > Can you please explain "stack" means here? Is it functions stack variable or some thing which GC internally uses? > > > [libjvm.so+0xcc0121] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > [libjvm.so+0xc58c8b] OopMapSet::oops_do() > [libjvm.so+0x7521e9] frame::oops_do_internal()+0x99 <<<< HERE > [libjvm.so+0xf55757] JavaThread::oops_do()+0x187 > > > As others have mentioned, this kind of error is usually caused by > invalid use of Unsafe or JNI that leads to heap corruption. 
However, > it's plausible that somehow the VM has messed up the frame and tries to > mark an invalid oop. > Was trying to avoid using JNI calls to check if that is the cause but that seems not an option for now. Do you think any other way to get the root cause for this? > Thanks > - Ioi > > > On 3/3/20 8:02 AM, Sundara Mohan M wrote: > > Hi, > > I am seeing JVM crashes on our system in GC Thread with parallel gc > on > > x86 linux. Observed the same crash happening on JVM-11.0.6/13.0.2/13.0.1 > GA > > builds. > > Adding some logs lines to give some context. > > > > # > > # A fatal error has been detected by the Java Runtime Environment: > > # > > # SIGSEGV (0xb) at pc=0x00007f669c964311, pid=66684, tid=71106 > > # > > # JRE version: OpenJDK Runtime Environment (13.0.1+9) (build 13.0.1+9) > > # Java VM: OpenJDK 64-Bit Server VM (13.0.1+9, mixed mode, tiered, > parallel > > gc, linux-amd64) > > # Problematic frame: > > # V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > > # > > # No core dump will be written. Core dumps have been disabled. 
To enable > > core dumping, try "ulimit -c unlimited" before starting Java again > > # > > # If you would like to submit a bug report, please visit: > > # https://github.com/AdoptOpenJDK/openjdk-build/issues > > # > > > > Host: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, 48 cores, 125G, Red Hat > > Enterprise Linux Server release 6.10 (Santiago) > > Time: Thu Feb 6 11:43:48 2020 UTC elapsed time: 198626 seconds (2d 7h > 10m > > 26s) > > > > > > Following is the stack trace > > ex1: > > Stack: [0x00007fd01cbdb000,0x00007fd01ccdb000], sp=0x00007fd01ccd8890, > > free space=1014k > > Native frames: (J=compiled Java code, A=aot compiled Java code, > > j=interpreted, Vv=VM code, C=native code) > > *V [libjvm.so+0xcc0121] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51* > > V [libjvm.so+0xc58c8b] OopMapSet::oops_do(frame const*, RegisterMap > > const*, OopClosure*)+0x2eb > > V [libjvm.so+0x7521e9] frame::oops_do_internal(OopClosure*, > > CodeBlobClosure*, RegisterMap*, bool)+0x99 > > V [libjvm.so+0xf55757] JavaThread::oops_do(OopClosure*, > > CodeBlobClosure*)+0x187 > > V [libjvm.so+0xcbb100] ThreadRootsMarkingTask::do_it(GCTaskManager*, > > unsigned int)+0xb0 > > V [libjvm.so+0x7e0f8b] GCTaskThread::run()+0x1eb > > V [libjvm.so+0xf5d43d] Thread::call_run()+0x10d > > V [libjvm.so+0xc74337] thread_native_entry(Thread*)+0xe7 > > > > JavaThread 0x00007fbeb9209800 (nid = 82380) was being processed > > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > > v ~RuntimeStub::_new_array_Java > > J 62465 c2 > > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > > (207 bytes) @ 0x00007fd00ad43704 [0x00007fd00ad41420+0x00000000000022e4] > > J 474206 c2 > > > org.eclipse.jetty.util.log.JettyAwareLogger.log(Lorg/slf4j/Marker;ILjava/lang/String;[Ljava/lang/Object;Ljava/lang/Throwable;)V > > (134 bytes) @ 0x00007fd00c4e81ec [0x00007fd00c4e7ee0+0x000000000000030c] > > j > > > 
org.eclipse.jetty.util.log.JettyAwareLogger.warn(Ljava/lang/String;Ljava/lang/Throwable;)V+7 > > j > > > org.eclipse.jetty.util.log.Slf4jLog.warn(Ljava/lang/String;Ljava/lang/Throwable;)V+6 > > j > > > org.eclipse.jetty.server.HttpChannel.handleException(Ljava/lang/Throwable;)V+181 > > j > > > org.eclipse.jetty.server.HttpChannelOverHttp.handleException(Ljava/lang/Throwable;)V+13 > > J 64106 c2 org.eclipse.jetty.server.HttpChannel.handle()Z (997 bytes) @ > > 0x00007fd00c6d2cd4 [0x00007fd00c6cdec0+0x0000000000004e14] > > J 280430 c2 org.eclipse.jetty.server.HttpConnection.onFillable()V (334 > > bytes) @ 0x00007fd00da925f0 [0x00007fd00da91e40+0x00000000000007b0] > > J 41979 c2 org.eclipse.jetty.io.ChannelEndPoint$2.run()V (12 bytes) @ > > 0x00007fd00a14f604 [0x00007fd00a14f4e0+0x0000000000000124] > > J 86362 c2 org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run()V > > (565 bytes) @ 0x00007fd0087d7e34 [0x00007fd0087d7cc0+0x0000000000000174] > > J 75998 c2 java.lang.Thread.run()V java.base at 13.0.2 (17 bytes) @ > > 0x00007fd00c93b8d8 [0x00007fd00c93b8a0+0x0000000000000038] > > v ~StubRoutines::call_stub > > > > ex2: > > Stack: [0x00007f669869f000,0x00007f669879f000], sp=0x00007f669879c890, > > free space=1014k > > Native frames: (J=compiled Java code, A=aot compiled Java code, > > j=interpreted, Vv=VM code, C=native code) > > > > *V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51*V > > [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, RegisterMap > const*, > > OopClosure*)+0x2eb > > V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > > CodeBlobClosure*, RegisterMap*, bool)+0x99 > > V [libjvm.so+0xf68b17] JavaThread::oops_do(OopClosure*, > > CodeBlobClosure*)+0x187 > > V [libjvm.so+0xcce2f0] ThreadRootsMarkingTask::do_it(GCTaskManager*, > > unsigned int)+0xb0 > > V [libjvm.so+0x7f422b] GCTaskThread::run()+0x1eb > > V [libjvm.so+0xf707fd] Thread::call_run()+0x10d > > V [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7 > > > > 
JavaThread 0x00007f5518004000 (nid = 75659) was being processed > > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > > v ~RuntimeStub::_new_array_Java > > J 54174 c2 > > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > > (207 bytes) @ 0x00007f6687d92678 [0x00007f6687d8c700+0x0000000000005f78] > > J 334031 c2 > > com.xmas.webservice.exception.ExceptionLoggingWrapper.execute()V (1004 > > bytes) @ 0x00007f6686ede430 [0x00007f6686edd580+0x0000000000000eb0] > > J 53431 c2 > > > com.xmas.webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lcom/xmas/beans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > (105 bytes) @ 0x00007f6687db88b0 [0x00007f6687db8660+0x0000000000000250] > > J 63819 c2 > > > com.xmas.webservice.exception.mapper.RequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > (9 bytes) @ 0x00007f6686a6ed9c [0x00007f6686a6ecc0+0x00000000000000dc] > > J 334032 c2 > > > com.xmas.webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; > > (332 bytes) @ 0x00007f668992ad34 [0x00007f668992a840+0x00000000000004f4] > > J 403918 c2 > > com.xmas.webservice.filters.ResponseSerializationWorker.execute()Z (272 > > bytes) @ 0x00007f66869d67fc [0x00007f66869d5e80+0x000000000000097c] > > J 17530 c2 > com.lafaspot.common.concurrent.internal.WorkerWrapper.execute()Z > > (208 bytes) @ 0x00007f66848b3708 [0x00007f66848b36a0+0x0000000000000068] > > J 31970% c2 > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; > > (486 bytes) @ 0x00007f668608dcb0 [0x00007f668608d5e0+0x00000000000006d0] > > j > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1 > > J 4889 c1 java.util.concurrent.FutureTask.run()V java.base at 13.0.1 (123 > > bytes) @ 0x00007f667d0be604 [0x00007f667d0bdf80+0x0000000000000684] > > J 7487 c1 > > 
> java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V > > java.base at 13.0.1 (187 bytes) @ 0x00007f667dd45854 > > [0x00007f667dd44a60+0x0000000000000df4] > > J 7486 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V > > java.base at 13.0.1 (9 bytes) @ 0x00007f667d1f643c > > [0x00007f667d1f63c0+0x000000000000007c] > > J 7078 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 bytes) @ > > 0x00007f667d1f2d74 [0x00007f667d1f2c40+0x0000000000000134] > > v ~StubRoutines::call_stub > > > > Not very frequent but ~90 days ~120 crashes with following signal > > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: > > 0x0000000000000000 > > This signal is generated when we try to access non canonical address in > > linux. > > > > As suggested by Stefan in another thread i tried to > > add VerifyAfterGC/VerifyBeforeGC but it seems to increase the latency and > > applications not surviving our production traffic(timing out and requests > > are failing). > > > > Questions > > 1. When i looked at source code for printing stack trace i see following > > > https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L696 > > (Prints native stack trace) > > > https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L718 > > (printing Java thread stack trace if it is involved in GC crash) > > a. How do you know this java thread was involved in jvm crash? > > b. Can i assume the java thread printed after native stack trace was > the > > culprit? > > c. Since i am seeing the same frame (~RuntimeStub::_new_array_Java, J > > 54174 c2 ch.qos.logback.classic.spi.ThrowableProxy...) but > different > > stack trace in both crashes can this be the root cause? > > > > 2. Thinking of excluding compilation > > of ch.qos.logback.classic.spi.ThrowableProxy class and running in > > production to see if compilation of this method is the cause. Does it > make > > sense? 
> > > > 3. Any other suggestion on debugging this further? > > > > TIA > > Sundar > > Thanks Sundar From kim.barrett at oracle.com Wed Mar 4 02:16:38 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 3 Mar 2020 21:16:38 -0500 Subject: RFR: 8239825: G1: Simplify threshold test for mutator refinement Message-ID: <81A0AF23-EEA2-42A2-8208-AD36B7B336CC@oracle.com> Please review this change to the handling of "padding" for the threshold used to decide whether a mutator thread should perform concurrent refinement. Rather than doing a slightly tricky (because of potential overflow) computation every time a mutator thread completes a buffer, instead perform that computation once and record the result for repeated use. CR: https://bugs.openjdk.java.net/browse/JDK-8239825 Webrev: https://cr.openjdk.java.net/~kbarrett/8239825/open.00/ Testing: mach5 tier1-5 along with changes for JDK-8240133 and JDK-8139652. Local (linux-x64) hotspot:tier1 with just this change. From kim.barrett at oracle.com Wed Mar 4 02:17:46 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 3 Mar 2020 21:17:46 -0500 Subject: RFR[T]: 8240133: G1DirtyCardQueue destructor has useless flush Message-ID: Please review this trivial change to remove the useless call to flush() from the G1DirtyCardQueue destructor. See the CR for more details. This removes the need for a non-trivial destructor for that class. CR: https://bugs.openjdk.java.net/browse/JDK-8240133 Webrev: https://cr.openjdk.java.net/~kbarrett/8240133/open.00/ Testing: mach5 tier1-5 along with changes for JDK-8239825 and JDK-8139652. Local (linux-x64) hotspot:tier1 with this and the proposed JDK-8239825 change. 
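The JDK-8239825 description above (do the overflow-prone "threshold plus padding" computation once and reuse the result, rather than redoing it every time a mutator completes a buffer) can be sketched as a small C++ model. This is a hypothetical illustration, not the code in the webrev: it assumes the threshold and padding are plain card counts and that saturating at the maximum value is the desired overflow behavior, and the class and method names are invented.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical model of the mutator-refinement threshold test.
// Names are illustrative only; they are not the G1 classes or fields
// touched by JDK-8239825.
class MutatorRefinementThreshold {
 public:
  // Cold path: called only when the threshold or padding changes.
  // Computes threshold + padding once, saturating at the maximum
  // size_t value instead of wrapping around on overflow.
  void update(std::size_t threshold, std::size_t padding) {
    const std::size_t max = std::numeric_limits<std::size_t>::max();
    padded_threshold_ = (padding > max - threshold) ? max : threshold + padding;
    // Invariant: the stored value never wraps below the base threshold.
    assert(padded_threshold_ >= threshold);
  }

  // Hot path: called each time a mutator completes a buffer.
  // Only a comparison against the precomputed value; no overflow handling.
  // (Assumed semantics: refine only when strictly above the padded threshold.)
  bool should_mutator_refine(std::size_t num_cards) const {
    return num_cards > padded_threshold_;
  }

  std::size_t padded_threshold() const { return padded_threshold_; }

 private:
  // Until update() is called, no card count can exceed the threshold.
  std::size_t padded_threshold_ = std::numeric_limits<std::size_t>::max();
};
```

For example, after `update(1000, 64)` the stored value is 1064, so a count of 1064 does not trigger mutator refinement while 1065 does; `update(SIZE_MAX - 10, 64)` saturates to `SIZE_MAX` instead of wrapping to a small number. The point of the design is that the tricky arithmetic lives in the rarely executed update path, keeping the per-buffer check trivial.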
From kim.barrett at oracle.com Wed Mar 4 02:32:06 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 3 Mar 2020 21:32:06 -0500 Subject: RFR: 8139652: Mutator refinement processing should take the oldest dirty card buffer Message-ID: Please review this change to the handling of completed buffers by mutator threads. Previously it would conditionally process and potentially reuse the buffer, rather than enqueuing it. Now, always enqueue the buffer and allocate a new one, and conditionally process the next (oldest) dirty buffer in the DCQS. The benefit of this is that the buffers being processed by the mutator age for a while in the DCQS (just as is done for concurrent refinement thread processing), so if the mutator is making repeated writes to the same or nearby locations, the associated card marking has more opportunity to be filtered out. CR: https://bugs.openjdk.java.net/browse/JDK-8139652 Webrev: https://cr.openjdk.java.net/~kbarrett/8139652/open.00/ Testing: mach5 tier1-5 along with changes for JDK-8239825 and JDK-8139652. From stefan.karlsson at oracle.com Wed Mar 4 08:37:03 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Wed, 4 Mar 2020 09:37:03 +0100 Subject: Need help on debugging JVM crash In-Reply-To: <86c8b74f-9f11-b9b5-00f2-30360d0e8b6f@oracle.com> References: <86c8b74f-9f11-b9b5-00f2-30360d0e8b6f@oracle.com> Message-ID: <6420763d-43e8-9ee8-0041-c06578911644@oracle.com> FWIW, I see that this is run with -XX:-OmitStackTraceInFastThrowFalse. Maybe there's a problem with that flag? Some more info from the hs_err file that could give further clues to the problem: The Java thread the GC is scanning is creating a ThrowableProxy, and is in the process of taking a slow path to allocate an array. Looking at the code it seems like it first calls Thread.getStackTrace(), and then creates an array of proxies to those elements. One of the hs_err files reports over 800 OutOfMemoryErrors.
StefanK On 2020-03-03 21:12, Ioi Lam wrote: > The crash happened while the GC is running. I tried disasm of the > crashing address PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 (from > the 13.0.1+9 GA binaries of AdoptJDK) > > (gdb) x/80i _ZN20PCMarkAndPushClosure6do_oopEPP7oopDesc > <>: push %rbp > <+1>: mov %rsp,%rbp > <+4>: push %r13 > <+6>: push %r12 > <+8>: push %rbx > <+9>: sub $0x8,%rsp > <+13>: mov (%rsi),%rbx ;;; rbx = oop > <+16>: test %rbx,%rbx ;;; oop != null? > <+19>: je 0x7ffff67ca317 > <_ZN20PCMarkAndPushClosure6do_oopEPP7oopDesc+87> > <+21>: lea 0x9c0afc(%rip),%rax # 0x7ffff718add8 > <_ZN20ParCompactionManager12_mark_bitmapE> > <+28>: mov %rbx,%rcx ;;; rcx = oop > <+31>: mov (%rax),%rdx ;;; rdx = > ParCompactionManager::_mark_bitmap > <+34>: sub (%rdx),%rcx ;;; rcx = oop - > _mark_bitmap->_region_start > <+37>: mov 0x10(%rdx),%rdx ;; rdx = > _mark_bitmap->_beg_bits->_map > <+41>: mov %rcx,%rax ;;; rax = oop - > _mark_bitmap->_region_start > <+44>: lea 0x93b935(%rip),%rcx # 0x7ffff7105c28 > > <+51>: shr $0x3,%rax > <+55>: mov (%rcx),%ecx > <+57>: shr %cl,%rax > <+60>: mov %rax,%rcx > <+63>: mov %rax,%rsi ;;; rsi = index of oop inside > mark_bitmap > <+66>: mov $0x1,%eax > <+71>: and $0x3f,%ecx > <+74>: shr $0x6,%rsi > <+78>: shl %cl,%rax > <+81>: test %rax,(%rdx,%rsi,8) << crash > > > This looks like that the oop that we try to mark is actually outside > of the heap range, so trying to mark it in the mark_bitmap causes this: > > > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: > 0x0000000000000000 > > > Here are the values of the registers for the "test" instruction above: > > > RAX=0x0000000000000001 is an unknown value >
RDX=0x00007f55af000000 points into unknown readable memory: 01 00 > 00 00 01 00 00 04 > ??? RSI=0x007fffc05491d000 is an unknown value > > > As you can see, RSI is very large, which means you have an invalid oop > in the stack that's probably very large. > > > ??? [libjvm.so+0xcc0121] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > ??? [libjvm.so+0xc58c8b]? OopMapSet::oops_do() > ??? [libjvm.so+0x7521e9]? frame::oops_do_internal()+0x99 <<<< HERE > ??? [libjvm.so+0xf55757]? JavaThread::oops_do()+0x187 > > > As others have mentioned, this kind of error is usually caused by > invalid use of Unsafe or JNI that leads to heap corruption. However, > it's plausible that somehow the VM has messed up the frame and tries > to mark an invalid oop. > > Thanks > - Ioi > > > On 3/3/20 8:02 AM, Sundara Mohan M wrote: >> Hi, >> ???? I am seeing JVM crashes on our system in GC Thread with parallel >> gc on >> x86 linux. Observed the same crash happening on >> JVM-11.0.6/13.0.2/13.0.1 GA >> builds. >> Adding some logs lines to give some context. >> >> # >> # A fatal error has been detected by the Java Runtime Environment: >> # >> #? SIGSEGV (0xb) at pc=0x00007f669c964311, pid=66684, tid=71106 >> # >> # JRE version: OpenJDK Runtime Environment (13.0.1+9) (build 13.0.1+9) >> # Java VM: OpenJDK 64-Bit Server VM (13.0.1+9, mixed mode, tiered, >> parallel >> gc, linux-amd64) >> # Problematic frame: >> # V? [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 >> # >> # No core dump will be written. Core dumps have been disabled. To enable >> core dumping, try "ulimit -c unlimited" before starting Java again >> # >> # If you would like to submit a bug report, please visit: >> #?? https://github.com/AdoptOpenJDK/openjdk-build/issues >> # >> >> Host: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, 48 cores, 125G, Red Hat >> Enterprise Linux Server release 6.10 (Santiago) >> Time: Thu Feb? 
6 11:43:48 2020 UTC elapsed time: 198626 seconds (2d >> 7h 10m >> 26s) >> >> >> Following is the stack trace >> ex1: >> Stack: [0x00007fd01cbdb000,0x00007fd01ccdb000], sp=0x00007fd01ccd8890, >> free space=1014k >> Native frames: (J=compiled Java code, A=aot compiled Java code, >> j=interpreted, Vv=VM code, C=native code) >> *V [libjvm.so+0xcc0121] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51* >> V [libjvm.so+0xc58c8b] OopMapSet::oops_do(frame const*, RegisterMap >> const*, OopClosure*)+0x2eb >> V [libjvm.so+0x7521e9] frame::oops_do_internal(OopClosure*, >> CodeBlobClosure*, RegisterMap*, bool)+0x99 >> V [libjvm.so+0xf55757] JavaThread::oops_do(OopClosure*, >> CodeBlobClosure*)+0x187 >> V [libjvm.so+0xcbb100] ThreadRootsMarkingTask::do_it(GCTaskManager*, >> unsigned int)+0xb0 >> V [libjvm.so+0x7e0f8b] GCTaskThread::run()+0x1eb >> V [libjvm.so+0xf5d43d] Thread::call_run()+0x10d >> V [libjvm.so+0xc74337] thread_native_entry(Thread*)+0xe7 >> >> JavaThread 0x00007fbeb9209800 (nid = 82380) was being processed >> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) >> v
~RuntimeStub::_new_array_Java >> J 62465 c2 >> ch.qos.logback.classic.spi.ThrowableProxy.<init>(Ljava/lang/Throwable;)V >> (207 bytes) @ 0x00007fd00ad43704 [0x00007fd00ad41420+0x00000000000022e4] >> J 474206 c2 >> org.eclipse.jetty.util.log.JettyAwareLogger.log(Lorg/slf4j/Marker;ILjava/lang/String;[Ljava/lang/Object;Ljava/lang/Throwable;)V >> >> (134 bytes) @ 0x00007fd00c4e81ec [0x00007fd00c4e7ee0+0x000000000000030c] >> j >> org.eclipse.jetty.util.log.JettyAwareLogger.warn(Ljava/lang/String;Ljava/lang/Throwable;)V+7 >> j >> org.eclipse.jetty.util.log.Slf4jLog.warn(Ljava/lang/String;Ljava/lang/Throwable;)V+6 >> j >> org.eclipse.jetty.server.HttpChannel.handleException(Ljava/lang/Throwable;)V+181 >> j >> org.eclipse.jetty.server.HttpChannelOverHttp.handleException(Ljava/lang/Throwable;)V+13 >> J 64106 c2 org.eclipse.jetty.server.HttpChannel.handle()Z (997 bytes) @ >> 0x00007fd00c6d2cd4 [0x00007fd00c6cdec0+0x0000000000004e14] >> J 280430 c2 org.eclipse.jetty.server.HttpConnection.onFillable()V (334 >> bytes) @ 0x00007fd00da925f0 [0x00007fd00da91e40+0x00000000000007b0] >> J 41979 c2 org.eclipse.jetty.io.ChannelEndPoint$2.run()V (12 bytes) @ >> 0x00007fd00a14f604 [0x00007fd00a14f4e0+0x0000000000000124] >> J 86362 c2 org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run()V >> (565 bytes) @ 0x00007fd0087d7e34 [0x00007fd0087d7cc0+0x0000000000000174] >> J 75998 c2 java.lang.Thread.run()V java.base at 13.0.2 (17 bytes) @ >> 0x00007fd00c93b8d8 [0x00007fd00c93b8a0+0x0000000000000038] >> v ~StubRoutines::call_stub >> >> ex2: >> Stack: [0x00007f669869f000,0x00007f669879f000], sp=0x00007f669879c890, >> free space=1014k >> Native frames: (J=compiled Java code, A=aot compiled Java code, >> j=interpreted, Vv=VM code, C=native code) >> >> *V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51*V >> [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, RegisterMap >> const*, >> OopClosure*)+0x2eb >> V [libjvm.so+0x765489]
frame::oops_do_internal(OopClosure*, >> CodeBlobClosure*, RegisterMap*, bool)+0x99 >> V [libjvm.so+0xf68b17] JavaThread::oops_do(OopClosure*, >> CodeBlobClosure*)+0x187 >> V [libjvm.so+0xcce2f0] ThreadRootsMarkingTask::do_it(GCTaskManager*, >> unsigned int)+0xb0 >> V [libjvm.so+0x7f422b] GCTaskThread::run()+0x1eb >> V [libjvm.so+0xf707fd] Thread::call_run()+0x10d >> V [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7 >> >> JavaThread 0x00007f5518004000 (nid = 75659) was being processed >> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) >> v ~RuntimeStub::_new_array_Java >> J 54174 c2 >> ch.qos.logback.classic.spi.ThrowableProxy.<init>(Ljava/lang/Throwable;)V >> (207 bytes) @ 0x00007f6687d92678 [0x00007f6687d8c700+0x0000000000005f78] >> J 334031 c2 >> com.xmas.webservice.exception.ExceptionLoggingWrapper.execute()V (1004 >> bytes) @ 0x00007f6686ede430 [0x00007f6686edd580+0x0000000000000eb0] >> J 53431 c2 >> com.xmas.webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lcom/xmas/beans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; >> >> (105 bytes) @ 0x00007f6687db88b0 [0x00007f6687db8660+0x0000000000000250] >> J 63819 c2 >> com.xmas.webservice.exception.mapper.RequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; >> >> (9 bytes) @ 0x00007f6686a6ed9c [0x00007f6686a6ecc0+0x00000000000000dc] >> J 334032 c2 >> com.xmas.webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; >> >> (332 bytes) @ 0x00007f668992ad34 [0x00007f668992a840+0x00000000000004f4] >> J 403918 c2 >> com.xmas.webservice.filters.ResponseSerializationWorker.execute()Z (272 >> bytes) @ 0x00007f66869d67fc [0x00007f66869d5e80+0x000000000000097c] >> J 17530 c2 >> com.lafaspot.common.concurrent.internal.WorkerWrapper.execute()Z >> (208 bytes) @ 0x00007f66848b3708 [0x00007f66848b36a0+0x0000000000000068] >> J 31970% c2 >>
com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; >> >> (486 bytes) @ 0x00007f668608dcb0 [0x00007f668608d5e0+0x00000000000006d0] >> j >> com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1 >> J 4889 c1 java.util.concurrent.FutureTask.run()V java.base at 13.0.1 (123 >> bytes) @ 0x00007f667d0be604 [0x00007f667d0bdf80+0x0000000000000684] >> J 7487 c1 >> java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V >> >> java.base at 13.0.1 (187 bytes) @ 0x00007f667dd45854 >> [0x00007f667dd44a60+0x0000000000000df4] >> J 7486 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V >> java.base at 13.0.1 (9 bytes) @ 0x00007f667d1f643c >> [0x00007f667d1f63c0+0x000000000000007c] >> J 7078 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 bytes) @ >> 0x00007f667d1f2d74 [0x00007f667d1f2c40+0x0000000000000134] >> v ~StubRoutines::call_stub >> >> Not very frequent but ~90 days ~120 crashes with following signal >> siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: >> 0x0000000000000000 >> This signal is generated when we try to access non canonical address in >> linux. >> >> As suggested by Stefan in another thread i tried to >> add VerifyAfterGC/VerifyBeforeGC but it seems to increase the latency >> and >> applications not surviving our production traffic(timing out and >> requests >> are failing). >> >> Questions >> 1. When i looked at source code for printing stack trace i see following >> https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L696 >> >> (Prints native stack trace) >> https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L718 >> >> (printing Java thread stack trace if it is involved in GC crash) >> a. How do you know this java thread was involved in jvm crash? >> b.
Can i assume the java thread printed after native stack trace >> was the >> culprit? >> c. Since i am seeing the same frame (~RuntimeStub::_new_array_Java, J >> 54174 c2 ch.qos.logback.classic.spi.ThrowableProxy...) but >> different >> stack trace in both crashes can this be the root cause? >> >> 2. Thinking of excluding compilation >> of ch.qos.logback.classic.spi.ThrowableProxy class and running in >> production to see if compilation of this method is the cause. Does it >> make >> sense? >> >> 3. Any other suggestion on debugging this further? >> >> TIA >> Sundar > From shade at redhat.com Wed Mar 4 08:39:22 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 4 Mar 2020 09:39:22 +0100 Subject: RFR (XS) 8240511: Shenandoah: parallel safepoint workers count should be ParallelGCThreads Message-ID: RFE: https://bugs.openjdk.java.net/browse/JDK-8240511 See bug for rationale. The best fix seems to be just using ParallelGCThreads and ditching Shenandoah-specific option altogether: https://cr.openjdk.java.net/~shade/8240511/webrev.01/ Testing: hotspot_gc_shenandoah, eyeballing gross pause times -- Thanks, -Aleksey From rkennke at redhat.com Wed Mar 4 10:41:17 2020 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 4 Mar 2020 11:41:17 +0100 Subject: RFR (XS) 8240511: Shenandoah: parallel safepoint workers count should be ParallelGCThreads In-Reply-To: References: Message-ID: <0ad816d6-6537-6e61-9b17-fabc1631e1fc@redhat.com> Ok yes, that makes sense. That flag predates upstream integration of that code, and it wasn't quite clear how many threads are useful for safepoint cleanup. IIRC, I found that hammering it with ParallelGCThreads was overkill - on my machine. But you are right, hard-wiring it to 4 is certainly overkill on smaller machines than mine ;-) Roman > RFE: > https://bugs.openjdk.java.net/browse/JDK-8240511 > > See bug for rationale.
The best fix seems to be just using ParallelGCThreads and ditching > Shenandoah-specific option altogether: > https://cr.openjdk.java.net/~shade/8240511/webrev.01/ > > Testing: hotspot_gc_shenandoah, eyeballing gross pause times > From shade at redhat.com Wed Mar 4 10:48:27 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 4 Mar 2020 11:48:27 +0100 Subject: RFR (XS) 8240511: Shenandoah: parallel safepoint workers count should be ParallelGCThreads In-Reply-To: <0ad816d6-6537-6e61-9b17-fabc1631e1fc@redhat.com> References: <0ad816d6-6537-6e61-9b17-fabc1631e1fc@redhat.com> Message-ID: <71e4d1c7-b454-fcae-0c8e-73c90e7cda06@redhat.com> On 3/4/20 11:41 AM, Roman Kennke wrote: > Ok yes, that makes sense. > > That flag predates upstream integration of that code, and it wasn't > quite clear how many threads are useful for safepoint cleanup. IIRC, I > found that hammering it with ParallelGCThreads was overkill - on my > machine. But you are right, hard-wiring it to 4 is certainly overkill on > smaller machines than mine ;-) I ran a few latency-sensitive tests on my smaller desktop, and they did not regress. I believe that is partly because we have trimmed down the number of parallel threads with JDK-8225229. Therefore I see no reason to keep it in. Another unnecessary GC option bites the dust. -- Thanks, -Aleksey From rkennke at redhat.com Wed Mar 4 10:51:54 2020 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 4 Mar 2020 11:51:54 +0100 Subject: RFR (XS) 8240511: Shenandoah: parallel safepoint workers count should be ParallelGCThreads In-Reply-To: <71e4d1c7-b454-fcae-0c8e-73c90e7cda06@redhat.com> References: <0ad816d6-6537-6e61-9b17-fabc1631e1fc@redhat.com> <71e4d1c7-b454-fcae-0c8e-73c90e7cda06@redhat.com> Message-ID: >> Ok yes, that makes sense. >> >> That flag predates upstream integration of that code, and it wasn't >> quite clear how many threads are useful for safepoint cleanup. 
IIRC, I >> found that hammering it with ParallelGCThreads was overkill - on my >> machine. But you are right, hard-wiring it to 4 is certainly overkill on >> smaller machines than mine ;-) > I ran a few latency-sensitive tests on my smaller desktop, and they did not regress. I believe that > is partly because we have trimmed down the number of parallel threads with JDK-8225229. Therefore I > see no reason to keep it in. Another unnecessary GC option bites the dust. Sure, go! Roman From shade at redhat.com Wed Mar 4 17:14:57 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 4 Mar 2020 18:14:57 +0100 Subject: RFR (XS) 8240534: Shenandoah: ditch debug safepoint timeout adjustment block Message-ID: RFE: https://bugs.openjdk.java.net/browse/JDK-8240534 This seems to be causing some of the failures on our new test servers: diff -r 6f709455592a src/hotspot/share/gc/shenandoah/shenandoahArguments.cpp --- a/src/hotspot/share/gc/shenandoah/shenandoahArguments.cpp Wed Mar 04 11:50:28 2020 +0100 +++ b/src/hotspot/share/gc/shenandoah/shenandoahArguments.cpp Wed Mar 04 18:12:25 2020 +0100 @@ -192,14 +192,4 @@ FLAG_SET_DEFAULT(TLABAllocationWeight, 90); } - - // Make sure safepoint deadlocks are failing predictably. This sets up VM to report - // fatal error after 10 seconds of wait for safepoint syncronization (not the VM - // operation itself). There is no good reason why Shenandoah would spend that - // much time synchronizing. -#ifdef ASSERT - FLAG_SET_DEFAULT(SafepointTimeout, true); - FLAG_SET_DEFAULT(SafepointTimeoutDelay, 10000); - FLAG_SET_DEFAULT(AbortVMOnSafepointTimeout, true); -#endif } -- Thanks, -Aleksey From rkennke at redhat.com Wed Mar 4 17:30:55 2020 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 4 Mar 2020 18:30:55 +0100 Subject: RFR (XS) 8240534: Shenandoah: ditch debug safepoint timeout adjustment block In-Reply-To: References: Message-ID: <1516299c-e0a6-6b3f-0928-49b08bac3a67@redhat.com> Ok. 
Thanks, Roman > RFE: > https://bugs.openjdk.java.net/browse/JDK-8240534 > > This seems to be causing some of the failures on our new test servers: > > diff -r 6f709455592a src/hotspot/share/gc/shenandoah/shenandoahArguments.cpp > --- a/src/hotspot/share/gc/shenandoah/shenandoahArguments.cpp Wed Mar 04 11:50:28 2020 +0100 > +++ b/src/hotspot/share/gc/shenandoah/shenandoahArguments.cpp Wed Mar 04 18:12:25 2020 +0100 > @@ -192,14 +192,4 @@ > FLAG_SET_DEFAULT(TLABAllocationWeight, 90); > } > - > - // Make sure safepoint deadlocks are failing predictably. This sets up VM to report > - // fatal error after 10 seconds of wait for safepoint syncronization (not the VM > - // operation itself). There is no good reason why Shenandoah would spend that > - // much time synchronizing. > -#ifdef ASSERT > - FLAG_SET_DEFAULT(SafepointTimeout, true); > - FLAG_SET_DEFAULT(SafepointTimeoutDelay, 10000); > - FLAG_SET_DEFAULT(AbortVMOnSafepointTimeout, true); > -#endif > } > > From zgu at redhat.com Wed Mar 4 23:06:14 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 4 Mar 2020 18:06:14 -0500 Subject: [15] RFR 8239926: Shenandoah: Shenandoah needs to mark nmethod's metadata In-Reply-To: <75c20855-5234-ba00-f07b-f9da0f7b8047@redhat.com> References: <75c20855-5234-ba00-f07b-f9da0f7b8047@redhat.com> Message-ID: Traversal GC has the same issue, also need to remark on stack code roots in final traversal. 
@@ -263,11 +263,12 @@ if (!_heap->is_degenerated_gc_in_progress()) { ShenandoahTraversalRootsClosure roots_cl(q, rp); ShenandoahTraversalSATBThreadsClosure tc(&satb_cl); if (unload_classes) { ShenandoahRemarkCLDClosure remark_cld_cl(&roots_cl); - _rp->strong_roots_do(worker_id, &roots_cl, &remark_cld_cl, NULL, &tc); + MarkingCodeBlobClosure code_cl(&roots_cl, CodeBlobToOopClosure::FixRelocations); + _rp->strong_roots_do(worker_id, &roots_cl, &remark_cld_cl, &code_cl, &tc); } else { CLDToOopClosure cld_cl(&roots_cl, ClassLoaderData::_claim_strong); _rp->roots_do(worker_id, &roots_cl, &cld_cl, NULL, &tc); } } else { Updated webrev: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.01/ Thank, -Zhengyu On 2/25/20 12:13 PM, Zhengyu Gu wrote: > Shenandoah encounters a few test failures with tools/javac. Verifier > catches unmarked oops in nmethod's metadata during root evacuation in > final mark phase. > > The problem is that, Shenandoah marks on stack nmethods in init mark > pause, but it does not mark nmethod's metadata during concurrent mark > phase, when new nmethod is about to be executed. > > The solution: > 1) Use nmethod_entry_barrier to keep nmethod's metadata alive when the > nmethod is about to be executed, when nmethod entry barrier is supported. > > 2) Remark on stack nmethod's metadata at final mark pause. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8239926 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.00/ > > Test: > hotspot_gc_shenandoah (fastdebug and release) > tools/javac with ShenandoahCodeRootsStyle = 1 and 2 (fastdebug and > release) > > Thanks, > > -Zhengyu From manc at google.com Thu Mar 5 01:32:10 2020 From: manc at google.com (Man Cao) Date: Wed, 4 Mar 2020 17:32:10 -0800 Subject: G1: Abort concurrent at initial mark pause In-Reply-To: References: Message-ID: Hi Liang, Thanks for the quick contribution! This would solve a big problem for us. I have created https://bugs.openjdk.java.net/browse/JDK-8240556.
You could start a thread with title "RFR (S): 8240556: Abort concurrent mark after effective eager reclamation of humongous objects". -Man From maoliang.ml at alibaba-inc.com Thu Mar 5 06:40:52 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Thu, 05 Mar 2020 14:40:52 +0800 Subject: Re: G1: Abort concurrent at initial mark pause In-Reply-To: References: , Message-ID: <9e9533b4-bb6a-4631-97e3-1e254092aa6e.maoliang.ml@alibaba-inc.com> Hi Man, Thanks for creating the bug id! Thanks, Liang ------------------------------------------------------------------ From:Man Cao Send Time:2020 Mar. 5 (Thu.) 09:32 To:hotspot-gc-dev Cc:"MAO, Liang" ; Thomas Schatzl Subject:Re: G1: Abort concurrent at initial mark pause Hi Liang, Thanks for the quick contribution! This would solve a big problem for us. I have created https://bugs.openjdk.java.net/browse/JDK-8240556. You could start a thread with title "RFR (S): 8240556: Abort concurrent mark after effective eager reclamation of humongous objects". -Man From maoliang.ml at alibaba-inc.com Thu Mar 5 07:13:38 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Thu, 05 Mar 2020 15:13:38 +0800 Subject: RFR (S): 8240556: Abort concurrent mark after effective eager reclamation of humongous objects Message-ID: Hi All, Now we have the bug id. I did more testing of the patch. There's a small concern: when we decide to cancel the concurrent cycle in the initial mark pause, we need to clear the next bitmap, which is supposed to be cleared concurrently. In my test with -Xmx20g -Xms20g -XX:ParallelGCThreads=10, the time spent on clearing the next bitmap was consistently less than 10ms. So I guess it could be acceptable.
Bug: https://bugs.openjdk.java.net/browse/JDK-8240556 Webrev: http://cr.openjdk.java.net/~luchsh/g1hum/humongous.webrev/ Thanks, Liang ------------------------------------------------------------------ From:MAO, Liang Send Time:2020 Mar. 3 (Tue.) 19:14 To:Thomas Schatzl ; Man Cao ; hotspot-gc-dev Subject:G1: Abort concurrent at initial mark pause Hi All, As previously discussed, there are several ideas to improve the handling of humongous objects. Our experiments show that canceling concurrent mark at the initial mark pause is effective in the scenario where frequent allocation of temporary humongous objects leads to frequent concurrent marking and high CPU usage. The sub-test scimark.fft.large in specjvm2008 is exactly this case, but it is not GC-sensitive, so there's little difference in score. The patch is small; shall we have a bug id for it? http://cr.openjdk.java.net/~luchsh/g1hum/humongous.webrev/ Thanks, Liang From thomas.schatzl at oracle.com Thu Mar 5 09:50:03 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 5 Mar 2020 10:50:03 +0100 Subject: RFR[T]: 8240133: G1DirtyCardQueue destructor has useless flush In-Reply-To: References: Message-ID: Hi Kim, On 04.03.20 03:17, Kim Barrett wrote: > Please review this trivial change to remove the useless call to flush() from > the G1DirtyCardQueue destructor. See the CR for more details. This removes > the need for a non-trivial destructor for that class. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8240133 > > Webrev: > https://cr.openjdk.java.net/~kbarrett/8240133/open.00/ > > Testing: > mach5 tier1-5 along with changes for JDK-8239825 and JDK-8139652. > Local (linux-x64) hotspot:tier1 with this and the proposed JDK-8239825 change. > looks good to me.
Thomas From ivan.walulya at oracle.com Thu Mar 5 10:32:17 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Thu, 5 Mar 2020 11:32:17 +0100 Subject: RFR(XS): 8240589: OtherRegionsTable::_num_occupied not updated correctly during coarsening Message-ID: Hi all, Please review this small change which fixes that OtherRegionsTable::_num_occupied are updated correctly during coarsening. JBS: https://bugs.openjdk.java.net/browse/JDK-8240589 Webrev: http://cr.openjdk.java.net/~iwalulya/8240589/00/ //Ivan From ivan.walulya at oracle.com Thu Mar 5 10:33:47 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Thu, 5 Mar 2020 11:33:47 +0100 Subject: RFR(XS): 8240591: G1HeapSizingPolicy attempts to compute expansion_amount even when at full capacity Message-ID: <197FCCDD-2E5D-4A63-B796-28908B18DB0E@oracle.com> Hi all, Please review a small modification for G1HeapSizingPolicy to return without computing expansion_amount when heap is already at full capacity. JBS: https://bugs.openjdk.java.net/browse/JDK-8240591 Webrev: http://cr.openjdk.java.net/~iwalulya/8240591/00/ //Ivan From ivan.walulya at oracle.com Thu Mar 5 10:37:28 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Thu, 5 Mar 2020 11:37:28 +0100 Subject: RFR(XS): 8240592: HeapRegionManager::rebuild_free_list logs 0s for the estimated free regions before Message-ID: Hi all, Please review a small modification to fix logging during HeapRegionManager::rebuild_free_list. 
JBS: https://bugs.openjdk.java.net/browse/JDK-8240592 Webrev: http://cr.openjdk.java.net/~iwalulya/8240592/00/ //Ivan From thomas.schatzl at oracle.com Thu Mar 5 11:11:48 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 5 Mar 2020 12:11:48 +0100 Subject: RFR(XS): 8240589: OtherRegionsTable::_num_occupied not updated correctly during coarsening In-Reply-To: References: Message-ID: <0c870725-6dd8-fecc-d0fb-ce5bf3f2cbbc@oracle.com> Hi, On 05.03.20 11:32, Ivan Walulya wrote: > Hi all, > > Please review this small change which fixes that OtherRegionsTable::_num_occupied are updated correctly during coarsening. Please remove the "during coarsening" part from the CR title; the issue affects regular addition of remembered set entries to the fine prt too. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8240589 > Webrev: http://cr.openjdk.java.net/~iwalulya/8240589/00/ > > > //Ivan > looks good. Please backport to 14u too. Thanks, Thomas From ralf.schmelter at sap.com Thu Mar 5 13:29:33 2020 From: ralf.schmelter at sap.com (Schmelter, Ralf) Date: Thu, 5 Mar 2020 13:29:33 +0000 Subject: RFR (S) 8240440: Implement get_safepoint_workers() for parallel GC Message-ID: Hi, could you review the small change. It implements get_safepoint_workers() for the ParallelScavengeHeap, so that the worker threads could be used for other tasks. This is already implemented for G1, Z and Shenandoah. Since the parallel GC does used the worker threads only in the collection VM operation it can safely share them. 
bugreport: https://bugs.openjdk.java.net/browse/JDK-8240440 webrev: http://cr.openjdk.java.net/~rschmelter/webrevs/8240440/webrev.0/ Best regards, Ralf From thomas.schatzl at oracle.com Thu Mar 5 14:40:20 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 5 Mar 2020 15:40:20 +0100 Subject: RFR (S) 8240440: Implement get_safepoint_workers() for parallel GC In-Reply-To: References: Message-ID: Hi Ralf, On 05.03.20 14:29, Schmelter, Ralf wrote: > Hi, > > could you review the small change. It implements get_safepoint_workers() for the ParallelScavengeHeap, so that the worker threads could be used for other tasks. This is already implemented for G1, Z and Shenandoah. Since the parallel GC does used the worker threads only in the collection VM operation it can safely share them. > > bugreport: https://bugs.openjdk.java.net/browse/JDK-8240440 > webrev: http://cr.openjdk.java.net/~rschmelter/webrevs/8240440/webrev.0/ > looks good to me. Let me run it through testing. Thanks, Thomas From thomas.schatzl at oracle.com Thu Mar 5 15:13:58 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 5 Mar 2020 16:13:58 +0100 Subject: RFR: 8239825: G1: Simplify threshold test for mutator refinement In-Reply-To: <81A0AF23-EEA2-42A2-8208-AD36B7B336CC@oracle.com> References: <81A0AF23-EEA2-42A2-8208-AD36B7B336CC@oracle.com> Message-ID: Hi, On 04.03.20 03:16, Kim Barrett wrote: > Please review this change to the handling of "padding" for the threshold > used to decide whether a mutator thread should perform concurrent > refinement. Rather than doing a slightly tricky (because of potential > overflow) computation every time a mutator thread completes a buffer, > instead perform that computation once and record the result for repeated > use. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8239825 > > Webrev: > https://cr.openjdk.java.net/~kbarrett/8239825/open.00/ > > Testing: > mach5 tier1-5 along with changes for JDK-8240133 and JDK-8139652. 
> Local (linux-x64) hotspot:tier1 with just this change. > I think this is good. Thomas From manc at google.com Thu Mar 5 19:24:13 2020 From: manc at google.com (Man Cao) Date: Thu, 5 Mar 2020 11:24:13 -0800 Subject: RFR (S): 8240556: Abort concurrent mark after effective eager reclamation of humongous objects In-Reply-To: References: Message-ID: Hi Liang, Overall, I think the approach would work well after fixing a few issues below. In g1CollectedHeap.cpp: 3111 if (gc_cause() == GCCause::_g1_humongous_allocation && > collector_state()->in_initial_mark_gc()) { > 3112 // Check if we still need to do concurrent mark after > evacuation > 3113 // Abort concurrent mark in case we cleaned humongous > objects via eager reclaim > 3114 should_start_conc_mark = > policy()->need_to_start_conc_mark("end of GC"); Two issues: (1) I think need_to_start_conc_mark() does not have the most up-to-date information at this point. For example, the later expand_heap_after_young_collection() could update G1IHOPControl::_target_occupancy, which is used by need_to_start_conc_mark(). One possible solution could be to move the " if (should_start_conc_mark) { concurrent_mark()->post_initial_mark(); } " below to after expand_heap_after_young_collection(). I'd wait for the G1 team members to confirm that this approach is safe. (2) Does it need to call collector_state()->set_in_initial_mark_gc(false) if need_to_start_conc_mark() returns false? Specifically, the later G1Policy::record_collection_pause_end() would call collector_state()->set_mark_or_rebuild_in_progress(true), if collector_state()->in_initial_mark_gc() remains true. This is probably wrong if the initial mark has been aborted. 2059 void G1CollectedHeap::decrement_old_marking_cycles_started() { > 2060 assert(_old_marking_cycles_started > 0, "must be"); Could it assert "_old_marking_cycles_started == _old_marking_cycles_completed + 1" instead? 
3125 } else if (collector_state()->in_initial_mark_gc()) { > 3126 // Don't do concurrent mark any more > 3127 concurrent_mark()->initial_mark_abort(); > 3128 log_info(gc)("Concurrent Aborted"); It's probably better to move the log_info inside the initial_mark_abort() method. Also, "Concurrent Start Cancelled" is probably a more precise and unambiguous message. It corresponds to the "Pause Young (Concurrent Start)" in G1CollectedHeap::young_gc_name(), and does not collide with "Concurrent Mark Abort" in G1ConcurrentMark::concurrent_cycle_end(). Perhaps initial_mark_abort() could be renamed to cancel_initial_mark() also? -Man On Wed, Mar 4, 2020 at 11:13 PM Liang Mao wrote: > Hi All, > > Now we have the bug id. I did more test to the patch. There's > a little concern in the patch that when we decide to cancle > the concurrent cycle in initial mark pause we need to clear > the next bitmap which supposes to be cleared concurrently. > In my test with -Xmx20g -Xms20g -XX:ParallelGCThreads=10, > the time spent on clearing next bitmap was consistently less > than 10ms. So I guess it could be acceptable. > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8240556 > Webrev: > > http://cr.openjdk.java.net/~luchsh/g1hum/humongous.webrev/ > > > Thanks, > Liang > > > > > > ------------------------------------------------------------------ > From:MAO, Liang > Send Time:2020 Mar. 3 (Tue.) 19:14 > To:Thomas Schatzl ; Man Cao ; > hotspot-gc-dev > Subject:G1: Abort concurrent at initial mark pause > > Hi All, > > As previous discusion, there're several ideas to improve the humongous > objects handling. We've made some experiments that canceling concurrent > mark at initial mark pause is proved to be effective in the senario that > frequent temporary humongous objects allocation leads to frequent > concurrent > mark and high CPU usage. The sub-test: scimark.fft.large in specjvm2008 is > also the exact case but not GC sensative so there's little difference > in score. 
> > The patch is small and shall we have a bug id for it? > http://cr.openjdk.java.net/~luchsh/g1hum/humongous.webrev/ > > Thanks, > Liang > > > > > > From stefan.johansson at oracle.com Thu Mar 5 20:15:10 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Thu, 5 Mar 2020 21:15:10 +0100 Subject: RFR(XS): 8240589: OtherRegionsTable::_num_occupied not updated correctly during coarsening In-Reply-To: <0c870725-6dd8-fecc-d0fb-ce5bf3f2cbbc@oracle.com> References: <0c870725-6dd8-fecc-d0fb-ce5bf3f2cbbc@oracle.com> Message-ID: <5BE42325-52E6-4B81-BD62-A9939E5D1131@oracle.com> > 5 mars 2020 kl. 12:11 skrev Thomas Schatzl : > > Hi, > > On 05.03.20 11:32, Ivan Walulya wrote: >> Hi all, >> Please review this small change which fixes that OtherRegionsTable::_num_occupied are updated correctly during coarsening. > > Please remove the "during coarsening" part from the CR title; the issue affects regular addition of remembered set entries to the fine prt too. > >> JBS: https://bugs.openjdk.java.net/browse/JDK-8240589 >> Webrev: http://cr.openjdk.java.net/~iwalulya/8240589/00/ >> //Ivan > > looks good. Please backport to 14u too. Looks good to me too, Stefan > > Thanks, > Thomas From kim.barrett at oracle.com Fri Mar 6 00:50:36 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 5 Mar 2020 19:50:36 -0500 Subject: RFR (S) 8240440: Implement get_safepoint_workers() for parallel GC In-Reply-To: References:

Message-ID: <54DB78FD-E761-41A1-86C2-15DE18CADABC@oracle.com> > On Mar 5, 2020, at 9:40 AM, Thomas Schatzl wrote: > > Hi Ralf, > > On 05.03.20 14:29, Schmelter, Ralf wrote: >> Hi, >> could you review the small change. It implements get_safepoint_workers() for the ParallelScavengeHeap, so that the worker threads could be used for other tasks. This is already implemented for G1, Z and Shenandoah. Since the parallel GC does used the worker threads only in the collection VM operation it can safely share them. >> bugreport: https://bugs.openjdk.java.net/browse/JDK-8240440 >> webrev: http://cr.openjdk.java.net/~rschmelter/webrevs/8240440/webrev.0/ > > looks good to me. Let me run it through testing. > > Thanks, > Thomas Looks good to me too. From kim.barrett at oracle.com Fri Mar 6 00:51:14 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 5 Mar 2020 19:51:14 -0500 Subject: RFR: 8239825: G1: Simplify threshold test for mutator refinement In-Reply-To: References: <81A0AF23-EEA2-42A2-8208-AD36B7B336CC@oracle.com> Message-ID: > On Mar 5, 2020, at 10:13 AM, Thomas Schatzl wrote: > > Hi, > > On 04.03.20 03:16, Kim Barrett wrote: >> Please review this change to the handling of "padding" for the threshold >> used to decide whether a mutator thread should perform concurrent >> refinement. Rather than doing a slightly tricky (because of potential >> overflow) computation every time a mutator thread completes a buffer, >> instead perform that computation once and record the result for repeated >> use. >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8239825 >> Webrev: >> https://cr.openjdk.java.net/~kbarrett/8239825/open.00/ >> Testing: >> mach5 tier1-5 along with changes for JDK-8240133 and JDK-8139652. >> Local (linux-x64) hotspot:tier1 with just this change. > > I think this is good. > > Thomas Thanks. 
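The 8239825 idea quoted above (do the overflow-prone padding arithmetic once, when the threshold changes, so the per-buffer check is a plain comparison) can be sketched in C++ as follows. This is an illustration only, assuming a saturating add is an acceptable way to avoid overflow; `RefinementThreshold` and its methods are hypothetical names, not the actual G1 code in the webrev.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical illustration of caching a padded threshold; not HotSpot code.
class RefinementThreshold {
  size_t _threshold;         // base threshold, in cards
  size_t _padded_threshold;  // threshold + padding, saturated at SIZE_MAX

  // Adds a and b, returning SIZE_MAX instead of wrapping around on overflow.
  static size_t saturating_add(size_t a, size_t b) {
    const size_t max = std::numeric_limits<size_t>::max();
    return (b > max - a) ? max : a + b;
  }

public:
  RefinementThreshold(size_t threshold, size_t padding)
    : _threshold(threshold),
      _padded_threshold(saturating_add(threshold, padding)) {}

  // The overflow-safe arithmetic runs only when the threshold is (re)set.
  void set(size_t threshold, size_t padding) {
    _threshold = threshold;
    _padded_threshold = saturating_add(threshold, padding);
  }

  size_t threshold() const { return _threshold; }

  // Hot path, called whenever a mutator completes a buffer: a single compare.
  bool mutator_should_refine(size_t num_cards) const {
    return num_cards > _padded_threshold;
  }
};
```

With a threshold of 100 cards and a padding of 10, a mutator in this sketch starts refining only once more than 110 cards are pending; a threshold close to `SIZE_MAX` saturates instead of wrapping around to a small value.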
From kim.barrett at oracle.com Fri Mar 6 00:51:46 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 5 Mar 2020 19:51:46 -0500 Subject: RFR[T]: 8240133: G1DirtyCardQueue destructor has useless flush In-Reply-To: References:

Message-ID: <7FFE8194-B5E0-489C-9F39-279C8FC081D0@oracle.com> > On Mar 5, 2020, at 4:50 AM, Thomas Schatzl wrote: > > Hi Kim, > > On 04.03.20 03:17, Kim Barrett wrote: >> Please review this trivial change to remove the useless call to flush() from >> the G1DirtyCardQueue destructor. See the CR for more details. This removes >> the need for a non-trivial destructor for that class. >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8240133 >> Webrev: >> https://cr.openjdk.java.net/~kbarrett/8240133/open.00/ >> Testing: >> mach5 tier1-5 along with changes for JDK-8239825 and JDK-8139652. >> Local (linux-x64) hotspot:tier1 with this and the proposed JDK-8239825 change. > > looks good to me. > > Thomas Thanks. From kim.barrett at oracle.com Fri Mar 6 01:38:52 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 5 Mar 2020 20:38:52 -0500 Subject: RFR(XS): 8240592: HeapRegionManager::rebuild_free_list logs 0s for the estimated free regions before In-Reply-To: References: Message-ID: > On Mar 5, 2020, at 5:37 AM, Ivan Walulya wrote: > > Hi all, > > Please review a small modification to fix logging during HeapRegionManager::rebuild_free_list. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8240592 > Webrev: http://cr.openjdk.java.net/~iwalulya/8240592/00/ I think I'd prefer the old ordering, but capture num_free_regions() into a variable before the abandon, and use that variable in the logging. But there's also the question of why the log message mentions the number of free regions at all, since the number of pre-existing free regions isn't important because of the abandonment. 
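Kim's suggestion for 8240592 is to capture num_free_regions() into a variable before the abandon and log that variable. The resulting ordering is shown below. This is a hypothetical sketch only: RegionFreeList and rebuild_free_list() are simplified stand-ins, not HotSpot's HeapRegionManager code, and the log text is invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a free region list; not HotSpot's real class.
class RegionFreeList {
  std::vector<unsigned> _regions;
public:
  std::size_t length() const { return _regions.size(); }
  void abandon() { _regions.clear(); }  // forget all entries at once
  void add(unsigned region) { _regions.push_back(region); }
};

// Read the count *before* abandon(); afterwards length() is always 0,
// which is exactly what the buggy log line reported.
std::string rebuild_free_list(RegionFreeList& list,
                              const std::vector<unsigned>& free_regions) {
  const std::size_t num_before = list.length();
  list.abandon();
  assert(list.length() == 0);
  for (unsigned region : free_regions) {
    list.add(region);
  }
  return "Rebuild free list: before " + std::to_string(num_before) +
         " after " + std::to_string(list.length());
}
```

Later in this thread, Ivan instead decides to remove the number of free regions from the log entry and restore the previous ordering. The pre-existing count does not matter once the list has been abandoned.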
From sangheon.kim at oracle.com Fri Mar 6 07:40:09 2020 From: sangheon.kim at oracle.com (sangheon.kim at oracle.com) Date: Thu, 5 Mar 2020 23:40:09 -0800 Subject: RFR: 8240239: Replace ConcurrentGCPhaseManager In-Reply-To: <4C14B89F-1550-44DE-B738-0DBEE7A2E167@oracle.com> References: <4C14B89F-1550-44DE-B738-0DBEE7A2E167@oracle.com> Message-ID: Hi Kim, On 2/28/20 1:48 PM, Kim Barrett wrote: > Please review this change which removes the ConcurrentGCPhaseManager > class and replaces it with ConcurrentGCBreakpoints. > > This is joint work with Per Liden. > > This change provides a client API, used by WhiteBox. The usage model > for a client is > > (1) Acquire control of concurrent collection cycles. > > (2) Do work that must be performed while the collection cycle is in a > known state. > > (3) Request the concurrent collector run to a named "breakpoint", or > run to completion, and then hold there, waiting for further commands. > > (4) Optionally goto (2). > > (5) Release control of concurrent collection cycles. > > Tests have been updated to use the new WhiteBox API. > > This change provides implementations of the new mechanism for G1 and > ZGC. A Shenandoah implementation is being left to others, but we > don't see any obvious reason for it to be difficult. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8240239 > > Webrev: > https://cr.openjdk.java.net/~kbarrett/8240239/open.03/ Looks good in general. But I have several minor nits. ------------------ src/hotspot/share/gc/g1/g1ConcurrentMarkThread.cpp 215 // Pause Remark. - Pre-existing: this comment should be moved to before line 221. 221 CMRemark cl(_cm); 216 ConcurrentGCBreakpoints::at("BEFORE MARKING COMPLETED"); 217 log_info(gc, marking)("Concurrent Mark (%.3fs, %.3fs) %.3fms", - Do we need to add time spent by 'at'? If we need time spent on 'at', it would be better to separate the log. ------------------ src/hotspot/share/gc/shared/concurrentGCBreakpoints.hpp 118
static void at(const char* breakpoint); - Don't we need more explanatory name? Something like reached_at? To me 'at' make me feel like the function would return none-void type. But this is my preference, so okay as is. ------------------ test/hotspot/jtreg/gc/TestConcurrentGCBreakpoints.java 138 throw new RuntimeException("Expected support"); - Better explanation please as it is a bit confusing (at least) to me. I feel affirmative sentence seems not good for the message here. Maybe because I compared the message with the other case. ------------------ For the record. I asked Kim about better alternative for 'const char*' at ConcurrentGCBreakpoints::run_to(const char* breakpoint) and ConcurrentGCBreakpoints::at(const char* breakpoint) something like static member or enum type. The reason is that such string will locate several places and there is already static member in WhiteBox.java. However, the breakpoint may vary among collectors and it is open set. And currently there are only 2 breakpoints, so Kim(and maybe Per) decided just not think hard about it. I am fine with it too. Thanks, Sangheon > > To possibly simplify the review, the open patch is also provided as a > pair of patches, one for removing the old mechanism and a second to > add the new mechanism. > > https://cr.openjdk.java.net/~kbarrett/8240239/remove_phase_control.03/ > Removes ConcurrentGCPhaseManager and its G1 implementation, except > that tests are not modifed. > > https://cr.openjdk.java.net/~kbarrett/8240239/control.03/ > Adds ConcurrenGCBreakpoints, with G1 and ZGC implementations, and > updates tests to use it. > > Testing: > mach5 tier1-5, which includes all the updated and new tests. > From ivan.walulya at oracle.com Fri Mar 6 08:38:04 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Fri, 6 Mar 2020 09:38:04 +0100 Subject: RFR(XS): 8240592: HeapRegionManager::rebuild_free_list logs 0s for the estimated free regions before In-Reply-To: References:

Message-ID: Thanks kim! > > But there's also the question of why the log message mentions the > number of free regions at all, since the number of pre-existing free > regions isn't important because of the abandonment. I will remove the number of free regions from the log entry and then set back the previous ordering. > On 6 Mar 2020, at 02:38, Kim Barrett wrote: > >> On Mar 5, 2020, at 5:37 AM, Ivan Walulya wrote: >> >> Hi all, >> >> Please review a small modification to fix logging during HeapRegionManager::rebuild_free_list. >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8240592 >> Webrev: http://cr.openjdk.java.net/~iwalulya/8240592/00/ > > I think I'd prefer the old ordering, but capture num_free_regions() > into a variable before the abandon, and use that variable in the logging. > > But there's also the question of why the log message mentions the > number of free regions at all, since the number of pre-existing free > regions isn't important because of the abandonment. > From ivan.walulya at oracle.com Fri Mar 6 08:38:29 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Fri, 6 Mar 2020 09:38:29 +0100 Subject: RFR(XS): 8240589: OtherRegionsTable::_num_occupied not updated correctly during coarsening In-Reply-To: <0c870725-6dd8-fecc-d0fb-ce5bf3f2cbbc@oracle.com> References: <0c870725-6dd8-fecc-d0fb-ce5bf3f2cbbc@oracle.com> Message-ID: <80098CA4-0B79-4A15-A4F8-1F31B6DBF5D6@oracle.com> Thanks Thomas! > On 5 Mar 2020, at 12:11, Thomas Schatzl wrote: > > Hi, > > On 05.03.20 11:32, Ivan Walulya wrote: >> Hi all, >> Please review this small change which fixes that OtherRegionsTable::_num_occupied are updated correctly during coarsening. > > Please remove the "during coarsening" part from the CR title; the issue affects regular addition of remembered set entries to the fine prt too. > >> JBS: https://bugs.openjdk.java.net/browse/JDK-8240589 >> Webrev: http://cr.openjdk.java.net/~iwalulya/8240589/00/ >> //Ivan > > looks good. Please backport to 14u too. 
> > Thanks, > Thomas From ivan.walulya at oracle.com Fri Mar 6 08:39:08 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Fri, 6 Mar 2020 09:39:08 +0100 Subject: RFR(XS): 8240589: OtherRegionsTable::_num_occupied not updated correctly during coarsening In-Reply-To: <5BE42325-52E6-4B81-BD62-A9939E5D1131@oracle.com> References: <0c870725-6dd8-fecc-d0fb-ce5bf3f2cbbc@oracle.com> <5BE42325-52E6-4B81-BD62-A9939E5D1131@oracle.com> Message-ID: <46606E7A-7A60-4C8F-A7F8-0FE1083E40BF@oracle.com> Thanks Stefan! > On 5 Mar 2020, at 21:15, Stefan Johansson wrote: > > > >> 5 mars 2020 kl. 12:11 skrev Thomas Schatzl : >> >> Hi, >> >> On 05.03.20 11:32, Ivan Walulya wrote: >>> Hi all, >>> Please review this small change which fixes that OtherRegionsTable::_num_occupied are updated correctly during coarsening. >> >> Please remove the "during coarsening" part from the CR title; the issue affects regular addition of remembered set entries to the fine prt too. >> >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8240589 >>> Webrev: http://cr.openjdk.java.net/~iwalulya/8240589/00/ >>> //Ivan >> >> looks good. Please backport to 14u too. > Looks good to me too, > Stefan > >> >> Thanks, >> Thomas From stefan.johansson at oracle.com Fri Mar 6 08:51:06 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Fri, 6 Mar 2020 09:51:06 +0100 Subject: RFR(XS): 8240592: HeapRegionManager::rebuild_free_list logs 0s for the estimated free regions before In-Reply-To: References:

Message-ID: Hi, On 2020-03-06 09:38, Ivan Walulya wrote: > Thanks kim! >> >> But there's also the question of why the log message mentions the >> number of free regions at all, since the number of pre-existing free >> regions isn't important because of the abandonment. > > I will remove the number of free regions from the log entry and then set back the previous ordering. Sounds good to me as well, reviewed, Stefan > >> On 6 Mar 2020, at 02:38, Kim Barrett wrote: >> >>> On Mar 5, 2020, at 5:37 AM, Ivan Walulya wrote: >>> >>> Hi all, >>> >>> Please review a small modification to fix logging during HeapRegionManager::rebuild_free_list. >>> >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8240592 >>> Webrev: http://cr.openjdk.java.net/~iwalulya/8240592/00/ >> >> I think I'd prefer the old ordering, but capture num_free_regions() >> into a variable before the abandon, and use that variable in the logging. >> >> But there's also the question of why the log message mentions the >> number of free regions at all, since the number of pre-existing free >> regions isn't important because of the abandonment. >> > From stefan.johansson at oracle.com Fri Mar 6 09:10:18 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Fri, 6 Mar 2020 10:10:18 +0100 Subject: RFR[T]: 8240133: G1DirtyCardQueue destructor has useless flush In-Reply-To: References: Message-ID: Hi Kim, On 2020-03-04 03:17, Kim Barrett wrote: > Please review this trivial change to remove the useless call to flush() from > the G1DirtyCardQueue destructor. See the CR for more details. This removes > the need for a non-trivial destructor for that class. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8240133 > > Webrev: > https://cr.openjdk.java.net/~kbarrett/8240133/open.00/ Would it make sense to add an assert into the destructor to ensure no entries were added? Or is that problematic for some reason. If you prefer not to, I'm good with this change and you can consider it reviewed. 
Cheers, Stefan > > Testing: > mach5 tier1-5 along with changes for JDK-8239825 and JDK-8139652. > Local (linux-x64) hotspot:tier1 with this and the proposed JDK-8239825 change. > From stefan.johansson at oracle.com Fri Mar 6 10:59:16 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Fri, 6 Mar 2020 11:59:16 +0100 Subject: RFR (S): 8240556: Abort concurrent mark after effective eager reclamation of humongous objects In-Reply-To: References: Message-ID: Hi Liang, Thanks for picking this up, really nice to see it progressing. It would be nice if we could make the clearing concurrently to avoid prolonging the pause. An alternative to abort like you do now, would be to let the concurrent cycle start, but have it abort it self directly. This should be done by calling: G1ConcurrentMark::concurrent_cycle_abort() This would also reuse the abort mechanism already in place and if aborting needs updating in the future there is only one place to change. There might be some things that have to be altered to get this to work and I haven't explored this more than in theory. Would you consider trying this out? I'm thinking this should look something like this in the log: GC(1) Pause Young (Concurrent Start) (G1 Evacuation Pause) 261M->262M(502M) 50.153ms GC(2) Concurrent Cycle GC(2) Concurrent Mark Abort GC(2) Concurrent Cycle 12.345ms We might want to call it something other than "Abort" in the logs to differ it from an abort by a Full GC, but we can discuss the details later on. Thanks, Stefan On 2020-03-05 08:13, Liang Mao wrote: > Hi All, > > Now we have the bug id. I did more test to the patch. There's > a little concern in the patch that when we decide to cancle > the concurrent cycle in initial mark pause we need to clear > the next bitmap which supposes to be cleared concurrently. > In my test with -Xmx20g -Xms20g -XX:ParallelGCThreads=10, > the time spent on clearing next bitmap was consistently less > than 10ms. So I guess it could be acceptable. 
> > Bug: > https://bugs.openjdk.java.net/browse/JDK-8240556 > Webrev: > http://cr.openjdk.java.net/~luchsh/g1hum/humongous.webrev/ > > Thanks, > Liang > > > > > > ------------------------------------------------------------------ > From:MAO, Liang > Send Time:2020 Mar. 3 (Tue.) 19:14 > To:Thomas Schatzl ; Man Cao ; hotspot-gc-dev > Subject:G1: Abort concurrent at initial mark pause > > Hi All, > > As previous discusion, there're several ideas to improve the humongous > objects handling. We've made some experiments that canceling concurrent > mark at initial mark pause is proved to be effective in the senario that > frequent temporary humongous objects allocation leads to frequent concurrent > mark and high CPU usage. The sub-test: scimark.fft.large in specjvm2008 is > also the exact case but not GC sensative so there's little difference > in score. > > The patch is small and shall we have a bug id for it? > http://cr.openjdk.java.net/~luchsh/g1hum/humongous.webrev/ > > Thanks, > Liang > > > > > From maoliang.ml at alibaba-inc.com Fri Mar 6 11:35:17 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Fri, 06 Mar 2020 19:35:17 +0800 Subject: RFR (S): 8240556: Abort concurrent mark after effective eager reclamation of humongous objects Message-ID: Hi, Thanks for Man's accurate comments and I made the change http://cr.openjdk.java.net/~luchsh/g1hum/humongous.webrev.1/ Stefan's concern is fairly reasonable since I have noticed if GC workers are not enough, the addition pause time caused by clearing could be considerable. concurrent_cycle_abort might not be easily to reuse because it still clears the bitmap in pause. I was thinking to let the concurrent mark thread continue and finish the last step of "_cm->cleanup_for_next_mark()" although it has chance to delay the next initial mark. Anyway I'm glad to make a try and you guys can compare two approaches and provide comments.
Thanks, Liang ------------------------------------------------------------------ From:Stefan Johansson Send Time:2020 Mar. 6 (Fri.) 18:59 To:"MAO, Liang" ; Thomas Schatzl ; Man Cao ; hotspot-gc-dev Subject:Re: RFR (S): 8240556: Abort concurrent mark after effective eager reclamation of humongous objects Hi Liang, Thanks for picking this up, really nice to see it progressing. It would be nice if we could make the clearing concurrently to avoid prolonging the pause. An alternative to abort like you do now, would be to let the concurrent cycle start, but have it abort it self directly. This should be done by calling: G1ConcurrentMark::concurrent_cycle_abort() This would also reuse the abort mechanism already in place and if aborting needs updating in the future there is only one place to change. There might be some things that have to be altered to get this to work and I haven't explored this more than in theory. Would you consider trying this out? I'm thinking this should look something like this in the log: GC(1) Pause Young (Concurrent Start) (G1 Evacuation Pause) 261M->262M(502M) 50.153ms GC(2) Concurrent Cycle GC(2) Concurrent Mark Abort GC(2) Concurrent Cycle 12.345ms We might want to call it something other than "Abort" in the logs to differ it from an abort by a Full GC, but we can discuss the details later on. Thanks, Stefan On 2020-03-05 08:13, Liang Mao wrote: > Hi All, > > Now we have the bug id. I did more test to the patch. There's > a little concern in the patch that when we decide to cancle > the concurrent cycle in initial mark pause we need to clear > the next bitmap which supposes to be cleared concurrently. > In my test with -Xmx20g -Xms20g -XX:ParallelGCThreads=10, > the time spent on clearing next bitmap was consistently less > than 10ms. So I guess it could be acceptable. 
> > Bug: > https://bugs.openjdk.java.net/browse/JDK-8240556 > Webrev: > http://cr.openjdk.java.net/~luchsh/g1hum/humongous.webrev/ > > Thanks, > Liang > > > > > > ------------------------------------------------------------------ > From:MAO, Liang > Send Time:2020 Mar. 3 (Tue.) 19:14 > To:Thomas Schatzl ; Man Cao ; hotspot-gc-dev > Subject:G1: Abort concurrent at initial mark pause > > Hi All, > > As previous discusion, there're several ideas to improve the humongous > objects handling. We've made some experiments that canceling concurrent > mark at initial mark pause is proved to be effective in the senario that > frequent temporary humongous objects allocation leads to frequent concurrent > mark and high CPU usage. The sub-test: scimark.fft.large in specjvm2008 is > also the exact case but not GC sensative so there's little difference > in score. > > The patch is small and shall we have a bug id for it? > http://cr.openjdk.java.net/~luchsh/g1hum/humongous.webrev/ > > Thanks, > Liang > > > > > From shade at redhat.com Fri Mar 6 12:01:59 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 6 Mar 2020 13:01:59 +0100 Subject: RFR (M) 8240671: Shenandoah: refactor ShenandoahPhaseTimings Message-ID: RFE: https://bugs.openjdk.java.net/browse/JDK-8240671 Webrev: https://cr.openjdk.java.net/~shade/8240671/webrev.01/ Tour of changes: *) SHENANDOAH_GC_PHASE_DO macro now uses the sub-macro root definition block that now guarantees that we list the roots in the same order! Also makes the macro itself much shorter. *) ShenandoahWorkerTimings middle-man is eliminated by inlining straight into ShenandoahPhaseTimings. This removes some redundant jumping around. Plus, eliminates it at every use of ShenandoahWorkerTimingsTracker! *) ShenandoahGCPhase is now responsible for measuring the time, which simplifies _timing_data and ShenandoahPhaseTimings interface. 
*) shenandoahTimingTracker.* are gone, ShenandoahWorkerTimingsTracker implementation moved to shenandoahPhaseTimings.*, as it does not carry its own weight at this point. shenandoahPhaseTimings.* would be renamed at some point in the future. Testing: hotspot_gc_shenandoah {fastdebug,release} -- Thanks, -Aleksey From thomas.schatzl at oracle.com Fri Mar 6 12:18:29 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 6 Mar 2020 13:18:29 +0100 Subject: RFR (S) 8240440: Implement get_safepoint_workers() for parallel GC In-Reply-To: References:

Message-ID: <3eca0545-f206-110a-0aa8-7f95669cd49a@oracle.com> Hi, On 05.03.20 15:40, Thomas Schatzl wrote: > Hi Ralf, > > On 05.03.20 14:29, Schmelter, Ralf wrote: >> Hi, >> >> could you review the small change. It implements >> get_safepoint_workers() for the ParallelScavengeHeap, so that the >> worker threads could be used for other tasks. This is already >> implemented for G1, Z and Shenandoah. Since the parallel GC does used >> the worker threads only in the collection VM operation it can safely >> share them. >> >> bugreport: https://bugs.openjdk.java.net/browse/JDK-8240440 >> webrev: http://cr.openjdk.java.net/~rschmelter/webrevs/8240440/webrev.0/ >> > > looks good to me. Let me run it through testing. > hs-tier1-5 look good. Ship it. Thanks, Thomas From ivan.walulya at oracle.com Fri Mar 6 12:35:16 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Fri, 6 Mar 2020 13:35:16 +0100 Subject: RFR: 8240668 : G1 list of all PerRegionTable does not have to be a double linkedlist any more Message-ID: <0650B450-17DB-46D4-ABD6-A1CD1877C53A@oracle.com> Hi all, Please review this modification to change the list of all PerRegionTables from a double linkedlist to a linkedlist. JBS: https://bugs.openjdk.java.net/browse/JDK-8240668 webrev: http://cr.openjdk.java.net/~iwalulya/8240668/00/ Testing: Tier 1 - 3 //Ivan From rkennke at redhat.com Fri Mar 6 12:48:02 2020 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 6 Mar 2020 13:48:02 +0100 Subject: RFR (M) 8240671: Shenandoah: refactor ShenandoahPhaseTimings In-Reply-To: References: Message-ID: Very good! That should make that code less error-prone and more consistent. Change looks good!
Thank you, Roman On 3/6/20 1:01 PM, Aleksey Shipilev wrote: > RFE: > https://bugs.openjdk.java.net/browse/JDK-8240671 > > Webrev: > https://cr.openjdk.java.net/~shade/8240671/webrev.01/ > > Tour of changes: > > *) SHENANDOAH_GC_PHASE_DO macro now uses the sub-macro root definition block that now guarantees > that we list the roots in the same order! Also makes the macro itself much shorter. > > *) ShenandoahWorkerTimings middle-man is eliminated by inlining straight into > ShenandoahPhaseTimings. This removes some redundant jumping around. Plus, eliminates it at every use > of ShenandoahWorkerTimingsTracker! > > *) ShenandoahGCPhase is now responsible for measuring the time, which simplifies _timing_data and > ShenandoahPhaseTimings interface. > > *) shenandoahTimingTracker.* are gone, ShenandoahWorkerTimingsTracker implementation moved to > shenandoahPhaseTimings.*, as it does not carry its own weight at this point. > shenandoahPhaseTimings.* would be renamed at some point in the future. > > Testing: hotspot_gc_shenandoah {fastdebug,release} > From zgu at redhat.com Fri Mar 6 15:17:50 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Fri, 6 Mar 2020 10:17:50 -0500 Subject: RFR (M) 8240671: Shenandoah: refactor ShenandoahPhaseTimings In-Reply-To: References: Message-ID: <99567ddf-ff3d-d7fb-89a9-ff81ccefebf9@redhat.com> Nice cleanup. Looks good to me. Thanks, -Zhengyu On 3/6/20 7:01 AM, Aleksey Shipilev wrote: > RFE: > https://bugs.openjdk.java.net/browse/JDK-8240671 > > Webrev: > https://cr.openjdk.java.net/~shade/8240671/webrev.01/ > > Tour of changes: > > *) SHENANDOAH_GC_PHASE_DO macro now uses the sub-macro root definition block that now guarantees > that we list the roots in the same order! Also makes the macro itself much shorter. > > *) ShenandoahWorkerTimings middle-man is eliminated by inlining straight into > ShenandoahPhaseTimings. This removes some redundant jumping around. Plus, eliminates it at every use > of ShenandoahWorkerTimingsTracker! 
> > *) ShenandoahGCPhase is now responsible for measuring the time, which simplifies _timing_data and > ShenandoahPhaseTimings interface. > > *) shenandoahTimingTracker.* are gone, ShenandoahWorkerTimingsTracker implementation moved to > shenandoahPhaseTimings.*, as it does not carry its own weight at this point. > shenandoahPhaseTimings.* would be renamed at some point in the future. > > Testing: hotspot_gc_shenandoah {fastdebug,release} > From kim.barrett at oracle.com Fri Mar 6 17:50:06 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Fri, 6 Mar 2020 12:50:06 -0500 Subject: RFR[T]: 8240133: G1DirtyCardQueue destructor has useless flush In-Reply-To: References:

Message-ID: > On Mar 6, 2020, at 4:10 AM, Stefan Johansson wrote: > > Hi Kim, > > On 2020-03-04 03:17, Kim Barrett wrote: >> Please review this trivial change to remove the useless call to flush() from >> the G1DirtyCardQueue destructor. See the CR for more details. This removes >> the need for a non-trivial destructor for that class. >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8240133 >> Webrev: >> https://cr.openjdk.java.net/~kbarrett/8240133/open.00/ > Would it make sense to add an assert into the destructor to ensure no entries were added? Or is that problematic for some reason. ~PtrQueue() already asserts _buf == NULL. > If you prefer not to, I'm good with this change and you can consider it reviewed. Thanks. From kim.barrett at oracle.com Fri Mar 6 17:52:34 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Fri, 6 Mar 2020 12:52:34 -0500 Subject: RFR(XS): 8240592: HeapRegionManager::rebuild_free_list logs 0s for the estimated free regions before In-Reply-To: References:

Message-ID: <40722D8C-4F2F-4E9F-9A27-889A19CF0C79@oracle.com> > On Mar 6, 2020, at 3:38 AM, Ivan Walulya wrote: > > Thanks kim! >> >> But there's also the question of why the log message mentions the >> number of free regions at all, since the number of pre-existing free >> regions isn't important because of the abandonment. > > I will remove the number of free regions from the log entry and then set back the previous ordering. Sounds good. From kim.barrett at oracle.com Fri Mar 6 22:30:38 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Fri, 6 Mar 2020 17:30:38 -0500 Subject: RFR: 8240239: Replace ConcurrentGCPhaseManager In-Reply-To: References: <4C14B89F-1550-44DE-B738-0DBEE7A2E167@oracle.com> Message-ID: <96E048D0-72A0-4B2E-B785-2CCAD28EFF30@oracle.com> > On Mar 6, 2020, at 2:40 AM, sangheon.kim at oracle.com wrote: > > Hi Kim, > > On 2/28/20 1:48 PM, Kim Barrett wrote: >> Please review this change which removes the ConcurrentGCPhaseManager >> class and replaces it with ConcurrentGCBreakpoints. >> >> This is joint work with Per Liden. >> >> This change provides a client API, used by WhiteBox. The usage model >> for a client is >> >> (1) Acquire control of concurrent collection cycles. >> >> (2) Do work that must be performed while the collection cycle is in a >> known state. >> >> (3) Request the concurrent collector run to a named "breakpoint", or >> run to completion, and then hold there, waiting for further commands. >> >> (4) Optionally goto (2). >> >> (5) Release control of concurrent collection cycles. >> >> Tests have been updated to use the new WhiteBox API. >> >> This change provides implementations of the new mechanism for G1 and >> ZGC. A Shenandoah implementation is being left to others, but we >> don't see any obvious reason for it to be difficult. >> >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8240239 >> >> Webrev: >> https://cr.openjdk.java.net/~kbarrett/8240239/open.03/ > Looks good in general. Thanks. 
> But I have several minor nits. > > ------------------ > src/hotspot/share/gc/g1/g1ConcurrentMarkThread.cpp > 215 // Pause Remark. > - Pre-existing: this comment should be moved to before line 221. I don't think the comment should be moved. The intervening stuff is all related to the remark pause, and in particular that it demarcates the completion of concurrent marking. > 221 CMRemark cl(_cm); > > 216 ConcurrentGCBreakpoints::at("BEFORE MARKING COMPLETED"); > 217 log_info(gc, marking)("Concurrent Mark (%.3fs, %.3fs) %.3fms", > - Do we need to add time spent by 'at'? If we need time spent on 'at', it would be better to separate the log. I don't think it matters much. Note that we're also including the time spent waiting on MMU. The delay caused by being stopped at a breakpoint obviously affects timing, one way or another. > src/hotspot/share/gc/shared/concurrentGCBreakpoints.hpp > 118 static void at(const char* breakpoint); > - Don't we need more explanatory name? Something like reached_at? To me 'at' make me feel like the function would return none-void type. But this is my preference, so okay as is. I think simply "at" has the desired meaning; paraphrasing from a dictionary "expressing arrival in a particular place or position". "reached_at" would be redundant. The meaning I think you are referring to is from the idiom container.at(element_designator), where "at" is short for something like "reference/value at designated location". > test/hotspot/jtreg/gc/TestConcurrentGCBreakpoints.java > 138 throw new RuntimeException("Expected support"); > - Better explanation please as it is a bit confusing (at least) to me. I feel affirmative sentence seems not good for the message here. Maybe because I compared the message with the other case. Sorry, but I'm not understanding the issue? In this case we expected the current collector to support concurrent GC breakpoints, but it doesn't, so we report a test failure that we expected support. 
In the other case we expected the current collector to not support breakpoints, but found that it claims that it does, so we report a test failure that we have unexpected support. Both of these indicate a mismatch between the expectations of the test and the capabilities of the collector. From sangheon.kim at oracle.com Fri Mar 6 23:05:18 2020 From: sangheon.kim at oracle.com (sangheon.kim at oracle.com) Date: Fri, 6 Mar 2020 15:05:18 -0800 Subject: RFR: 8240239: Replace ConcurrentGCPhaseManager In-Reply-To: <96E048D0-72A0-4B2E-B785-2CCAD28EFF30@oracle.com> References: <4C14B89F-1550-44DE-B738-0DBEE7A2E167@oracle.com> <96E048D0-72A0-4B2E-B785-2CCAD28EFF30@oracle.com> Message-ID: <9aa9dd48-76fa-15da-931a-320868c4c629@oracle.com> On 3/6/20 2:30 PM, Kim Barrett wrote: >> On Mar 6, 2020, at 2:40 AM, sangheon.kim at oracle.com wrote: >> >> Hi Kim, >> >> On 2/28/20 1:48 PM, Kim Barrett wrote: >>> Please review this change which removes the ConcurrentGCPhaseManager >>> class and replaces it with ConcurrentGCBreakpoints. >>> >>> This is joint work with Per Liden. >>> >>> This change provides a client API, used by WhiteBox. The usage model >>> for a client is >>> >>> (1) Acquire control of concurrent collection cycles. >>> >>> (2) Do work that must be performed while the collection cycle is in a >>> known state. >>> >>> (3) Request the concurrent collector run to a named "breakpoint", or >>> run to completion, and then hold there, waiting for further commands. >>> >>> (4) Optionally goto (2). >>> >>> (5) Release control of concurrent collection cycles. >>> >>> Tests have been updated to use the new WhiteBox API. >>> >>> This change provides implementations of the new mechanism for G1 and >>> ZGC. A Shenandoah implementation is being left to others, but we >>> don't see any obvious reason for it to be difficult. >>> >>> CR: >>> https://bugs.openjdk.java.net/browse/JDK-8240239 >>> >>> Webrev: >>> https://cr.openjdk.java.net/~kbarrett/8240239/open.03/ >> Looks good in general. 
> Thanks. > >> But I have several minor nits. >> >> ------------------ >> src/hotspot/share/gc/g1/g1ConcurrentMarkThread.cpp >> 215 // Pause Remark. >> - Pre-existing: this comment should be moved to before line 221. > I don't think the comment should be moved. The intervening stuff is > all related to the remark pause, and in particular that it demarcates > the completion of concurrent marking. OK > > >> 221 CMRemark cl(_cm); >> >> 216 ConcurrentGCBreakpoints::at("BEFORE MARKING COMPLETED"); >> 217 log_info(gc, marking)("Concurrent Mark (%.3fs, %.3fs) %.3fms", >> - Do we need to add time spent by 'at'? If we need time spent on 'at', it would be better to separate the log. > I don't think it matters much. Note that we're also including the > time spent waiting on MMU. The delay caused by being stopped at a > breakpoint obviously affects timing, one way or another. OK > >> src/hotspot/share/gc/shared/concurrentGCBreakpoints.hpp >> 118 static void at(const char* breakpoint); >> - Don't we need more explanatory name? Something like reached_at? To me 'at' make me feel like the function would return none-void type. But this is my preference, so okay as is. > I think simply "at" has the desired meaning; paraphrasing from a > dictionary "expressing arrival in a particular place or position". > "reached_at" would be redundant. The meaning I think you are > referring to is from the idiom container.at(element_designator), > where "at" is short for something like "reference/value at designated > location". OK > >> test/hotspot/jtreg/gc/TestConcurrentGCBreakpoints.java >> 138 throw new RuntimeException("Expected support"); >> - Better explanation please as it is a bit confusing (at least) to me. I feel affirmative sentence seems not good for the message here. Maybe because I compared the message with the other case. > Sorry, but I'm not understanding the issue? 
In this case we expected > the current collector to support concurrent GC breakpoints, but it > doesn't, so we report a test failure that we expected support. In the > other case we expected the current collector to not support > breakpoints, but found that it claims that it does, so we report a > test failure that we have unexpected support. Both of these indicate a > mismatch between the expectations of the test and the capabilities of > the collector. > > I think I understand the exception, but I was feeling 'Unexpected un-support blah blah' or more detail saying 'current GC supports BP but WhiteBox check returned un-support blah blah' kind of message seem better explanation. But at least double negative (first example) would make more confused. I had a chat with Kim and now I agree with all of his comments. Sorry for noisy comments. Looks good to me as is. Thanks, Sangheon From kim.barrett at oracle.com Fri Mar 6 23:10:46 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Fri, 6 Mar 2020 18:10:46 -0500 Subject: RFR: 8240239: Replace ConcurrentGCPhaseManager In-Reply-To: <9aa9dd48-76fa-15da-931a-320868c4c629@oracle.com> References: <4C14B89F-1550-44DE-B738-0DBEE7A2E167@oracle.com> <96E048D0-72A0-4B2E-B785-2CCAD28EFF30@oracle.com> <9aa9dd48-76fa-15da-931a-320868c4c629@oracle.com> Message-ID: <3C6C537F-76D8-4900-ADA1-B58DCACFECCB@oracle.com> > On Mar 6, 2020, at 6:05 PM, sangheon.kim at oracle.com wrote: > Looks good to me as is. > > Thanks, > Sangheon Thanks. From kim.barrett at oracle.com Mon Mar 9 07:38:00 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 9 Mar 2020 03:38:00 -0400 Subject: RFR: 8240722: [BACKOUT] G1DirtyCardQueue destructor has useless flush Message-ID: Please review this backout of JDK-8240133, which turns out to have problems and needs a bit of a rethink; see JDK-8240722 for details. 
CR: [backout] https://bugs.openjdk.java.net/browse/JDK-8240722 [original] https://bugs.openjdk.java.net/browse/JDK-8240133 Webrev: https://cr.openjdk.java.net/~kbarrett/8240722/open.00/ Testing: Local build. From stefan.johansson at oracle.com Mon Mar 9 07:59:17 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Mon, 9 Mar 2020 08:59:17 +0100 Subject: RFR: 8240722: [BACKOUT] G1DirtyCardQueue destructor has useless flush In-Reply-To: References: Message-ID: <38795206-9801-29ab-f328-686b7086b800@oracle.com> Looks good, StefanJ On 2020-03-09 08:38, Kim Barrett wrote: > Please review this backout of JDK-8240133, which turns out to have > problems and needs a bit of a rethink; see JDK-8240722 for details. > > CR: > [backout] https://bugs.openjdk.java.net/browse/JDK-8240722 > [original] https://bugs.openjdk.java.net/browse/JDK-8240133 > > Webrev: > https://cr.openjdk.java.net/~kbarrett/8240722/open.00/ > > Testing: > Local build. > > From kim.barrett at oracle.com Mon Mar 9 08:04:21 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 9 Mar 2020 04:04:21 -0400 Subject: RFR: 8240722: [BACKOUT] G1DirtyCardQueue destructor has useless flush In-Reply-To: <38795206-9801-29ab-f328-686b7086b800@oracle.com> References: <38795206-9801-29ab-f328-686b7086b800@oracle.com> Message-ID: > On Mar 9, 2020, at 3:59 AM, Stefan Johansson wrote: > > Looks good, > StefanJ Thanks. > On 2020-03-09 08:38, Kim Barrett wrote: >> Please review this backout of JDK-8240133, which turns out to have >> problems and needs a bit of a rethink; see JDK-8240722 for details. >> CR: >> [backout] https://bugs.openjdk.java.net/browse/JDK-8240722 >> [original] https://bugs.openjdk.java.net/browse/JDK-8240133 >> Webrev: >> https://cr.openjdk.java.net/~kbarrett/8240722/open.00/ >> Testing: >> Local build. 
From magnus.ihse.bursie at oracle.com Mon Mar 9 08:33:20 2020 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Mon, 9 Mar 2020 09:33:20 +0100 Subject: RFR: JDK-8240224 Allow building hotspot without the serial gc Message-ID: <698c6117-f8c0-191e-9efb-41b5fd447961@oracle.com> When reworking the JVM feature handling, I wanted to try to compile Hotspot with various features enabled/disabled. I quickly found out that it's not really possible to build hotspot without the serial gc. While this is not a terribly important use case, I think it's good to be able to select serial freely, just as with the other collectors. With this patch it is possible to build a truly minimal JVM using 'configure --with-jvm-variants=custom --with-jvm-features=g1gc'. Bug: https://bugs.openjdk.java.net/browse/JDK-8240224 WebRev: http://cr.openjdk.java.net/~ihse/JDK-8240224-building-without-serial-gc/webrev.01 /Magnus From david.holmes at oracle.com Mon Mar 9 09:10:57 2020 From: david.holmes at oracle.com (David Holmes) Date: Mon, 9 Mar 2020 19:10:57 +1000 Subject: RFR: JDK-8240224 Allow building hotspot without the serial gc In-Reply-To: <663301c4-aa45-4539-c9b7-d6fe68c531de@ihse.net> References: <663301c4-aa45-4539-c9b7-d6fe68c531de@ihse.net> Message-ID: <484dcf75-4891-2b52-1367-0b86281b7671@oracle.com> Hi Magnus, On 9/03/2020 6:30 pm, Magnus Ihse Bursie wrote: > When reworking the JVM feature handling, I wanted to try to compile > Hotspot with various features enabled/disabled. I quickly found out that > it's not really possible to build hotspot without the serial gc. While > this is not a terribly important use case, I think it's good to be able > to select serial freely, just as with the other collectors. Really not sure this is a worthwhile exercise. > With this patch it is possible to build a truly minimal JVM using > 'configure --with-jvm-variants=custom --with-jvm-features=g1gc'. 
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8240224 > WebRev: > http://cr.openjdk.java.net/~ihse/JDK-8240224-building-without-serial-gc/webrev.01 make/ModuleTools.gmk ! TOOL_ADD_PACKAGES_ATTRIBUTE := $(BUILD_JAVA) $(JAVA_FLAGS_SMALL_BUILDJDK) \ that should be BUILDJDK_JAVA_FLAGS_SMALL. make/RunTestsPrebuiltSpec.gmk make/autoconf/boot-jdk.m4 ! BUILDJDK_JAVA_FLAGS_SMALL := -Xms32M -Xmx512M -XX:TieredStopAtLevel=1 Depending on the default GC those -Xms and -Xmx settings may not be valid/possible. Other changes seem okay but I'll leave it for GC folk to comment on that. Cheers, David > > /Magnus From shade at redhat.com Mon Mar 9 09:20:43 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 9 Mar 2020 10:20:43 +0100 Subject: RFR: JDK-8240224 Allow building hotspot without the serial gc In-Reply-To: <484dcf75-4891-2b52-1367-0b86281b7671@oracle.com> References: <663301c4-aa45-4539-c9b7-d6fe68c531de@ihse.net> <484dcf75-4891-2b52-1367-0b86281b7671@oracle.com> Message-ID: On 3/9/20 10:10 AM, David Holmes wrote: > On 9/03/2020 6:30 pm, Magnus Ihse Bursie wrote: >> When reworking the JVM feature handling, I wanted to try to compile >> Hotspot with various features enabled/disabled. I quickly found out that >> it's not really possible to build hotspot without the serial gc. While >> this is not a terribly important use case, I think it's good to be able >> to select serial freely, just as with the other collectors. > > Really not sure this is a worthwhile exercise. Me neither. I think Serial GC always-present is a good compromise for the rest of the code: it is the very basic GC you can always count on. Nits: *) src/hotspot/share/gc/shared/gcConfig.cpp changes are a bit strange: - Epsilon should not ever be selected by ergonomics - Why ZGC is selected before Shenandoah? 
[Oh, what a can of worms that one is ;)] *) hotspot/gtest/gc/shared/test_collectorPolicy.cpp - I don't think we indent nested #include, #define lines -- Thanks, -Aleksey From thomas.schatzl at oracle.com Mon Mar 9 11:00:24 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 9 Mar 2020 12:00:24 +0100 Subject: RFR: 8240668 : G1 list of all PerRegionTable does not have to be a double linkedlist any more In-Reply-To: <0650B450-17DB-46D4-ABD6-A1CD1877C53A@oracle.com> References: <0650B450-17DB-46D4-ABD6-A1CD1877C53A@oracle.com> Message-ID: <9d4f6c6d-7255-a0b6-3d34-a53fec932ab6@oracle.com> Hi, On 06.03.20 13:35, Ivan Walulya wrote: > Hi all, > > Please review this modification to change the list of all PerRegionTables from a double linkedlist to a linkedlist. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8240668 > webrev: http://cr.openjdk.java.net/~iwalulya/8240668/00/ > > Testing: Tier 1 - 3 > > //Ivan > looks good. Please also remove the paragraph in the comment in heapRegionRemSet.hpp:253. We do not scrub/delete since jdk11 any more. Thanks, Thomas From ivan.walulya at oracle.com Mon Mar 9 11:38:52 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Mon, 9 Mar 2020 04:38:52 -0700 (PDT) Subject: RFR: 8240668 : G1 list of all PerRegionTable does not have to be a double linkedlist any more Message-ID: <28446e64-870f-449d-a7f2-f9f6bce6956c@default> Thanks Thomas! > Please also remove the paragraph in the comment in > heapRegionRemSet.hpp:253. We do not scrub/delete since jdk11 any more. Noted. 
//Ivan ----- Original Message ----- From: thomas.schatzl at oracle.com To: hotspot-gc-dev at openjdk.java.net Sent: Monday, 9 March, 2020 12:00:44 PM GMT +01:00 Amsterdam / Berlin / Bern / Rome / Stockholm / Vienna Subject: Re: RFR: 8240668 : G1 list of all PerRegionTable does not have to be a double linkedlist any more Hi, On 06.03.20 13:35, Ivan Walulya wrote: > Hi all, > > Please review this modification to change the list of all PerRegionTables from a double linkedlist to a linkedlist. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8240668 > webrev: http://cr.openjdk.java.net/~iwalulya/8240668/00/ > > Testing: Tier 1 - 3 > > //Ivan > looks good. Please also remove the paragraph in the comment in heapRegionRemSet.hpp:253. We do not scrub/delete since jdk11 any more. Thanks, Thomas From shade at redhat.com Mon Mar 9 13:18:58 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 9 Mar 2020 14:18:58 +0100 Subject: RFR (S) 8240749: Shenandoah: refactor ShenandoahUtils Message-ID: RFE: https://bugs.openjdk.java.net/browse/JDK-8240749 Webrev: https://cr.openjdk.java.net/~shade/8240749/webrev.01/ It mostly hides naked phase_timings()->record... calls with ShenandoahGCWorkerPhase wrapper. But also cleans up the code a bit. Testing: hotspot_gc_shenandoah -- Thanks, -Aleksey From shade at redhat.com Mon Mar 9 13:20:11 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 9 Mar 2020 14:20:11 +0100 Subject: RFR (S) 8240750: Shenandoah: remove leftover files and mentions of ShenandoahAllocTracker Message-ID: <04a65c11-8550-fc3c-a613-a324cddb5a17@redhat.com> RFE: https://bugs.openjdk.java.net/browse/JDK-8240750 While working on JDK-8240215, I totally forgot to remove these leftovers. 
Webrev: https://cr.openjdk.java.net/~shade/8240750/webrev.01/ Testing: hotspot_gc_shenandoah -- Thanks, -Aleksey From rkennke at redhat.com Mon Mar 9 14:03:08 2020 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 9 Mar 2020 15:03:08 +0100 Subject: RFR (S) 8240749: Shenandoah: refactor ShenandoahUtils In-Reply-To: References: Message-ID: <299fe138-7c85-2443-b485-5d19a4d3b375@redhat.com> Looks good to me. Thanks! Roman On 3/9/20 2:18 PM, Aleksey Shipilev wrote: > RFE: > https://bugs.openjdk.java.net/browse/JDK-8240749 > > Webrev: > https://cr.openjdk.java.net/~shade/8240749/webrev.01/ > > It mostly hides naked phase_timings()->record... calls with ShenandoahGCWorkerPhase wrapper. But > also cleans up the code a bit. > > Testing: hotspot_gc_shenandoah > From rkennke at redhat.com Mon Mar 9 14:03:38 2020 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 9 Mar 2020 15:03:38 +0100 Subject: RFR (S) 8240750: Shenandoah: remove leftover files and mentions of ShenandoahAllocTracker In-Reply-To: <04a65c11-8550-fc3c-a613-a324cddb5a17@redhat.com> References: <04a65c11-8550-fc3c-a613-a324cddb5a17@redhat.com> Message-ID: Yep, looks good! Thanks, Roman > RFE: > https://bugs.openjdk.java.net/browse/JDK-8240750 > > While working on JDK-8240215, I totally forgot to remove these leftovers. > > Webrev: > https://cr.openjdk.java.net/~shade/8240750/webrev.01/ > > Testing: hotspot_gc_shenandoah > From per.liden at oracle.com Mon Mar 9 15:03:33 2020 From: per.liden at oracle.com (Per Liden) Date: Mon, 9 Mar 2020 16:03:33 +0100 Subject: RFR: 8003216: Add JFR event indicating explicit System.gc() cal In-Reply-To: <73577b98-c59e-9e80-b966-a11d501952d8@oracle.com> References: <07178c56-dde3-25eb-c95c-32fff443cb55@oracle.com>

<73577b98-c59e-9e80-b966-a11d501952d8@oracle.com> Message-ID: <52166bda-a3c7-5c61-92ca-4ba8e05fa4ed@oracle.com> Hi, On 2/27/20 10:13 AM, Stefan Johansson wrote: > Hi Erik, > > On 2020-02-26 18:28, Erik Gahlin wrote: >> Hi Stefan, >> >> GC-id would be nice, but perhaps not possible in all scenarios, e.g. >> -XX:+ExplicitGCInvokesConcurrent and Epsilon GC? > For ExplicitGCInvokesConcurrent it would not be a big problem, that > would start a concurrent cycle and we could use the id for that GC. I > also realized that we can get the GC-id without any problem. For other > events sent before the GC-id is properly set up, we use GCId::peek() > which returns the id that will be used for the next collection. I have to say that I don't think the GC-id is all that important/interesting here. Especially, since that ID can be a bit sketchy depending on the GC and/or configuration. cheers, Per > > For Epsilon, I'm not sure an event should be sent at all since they are > blocked, see: EpsilonHeap::collect(...) > > Thanks, > Stefan > >> >> Thanks >> Erik >> >> On 2020-02-26 14:21, Stefan Johansson wrote: >>> Hi Erik, >>> >>>> On 26 Feb 2020, at 13:56, Per Liden wrote: >>>> >>>> Hi Erik, >>>> >>>> On 2020-02-26 13:50, Erik Gahlin wrote: >>>>> Hi, >>>>> Could I have a review of a JFR event that is emitted when >>>>> System.gc() is called. >>>>> Purpose is to collect the stack trace. It is not sufficient with >>>>> the cause field that the GarbageCollection event has today. >>>>> Bug: >>>>> https://bugs.openjdk.java.net/browse/JDK-8003216 >>>>> Webrev: >>>>> http://cr.openjdk.java.net/~egahlin/8003216/ >>>> 489     EventSystemGC event; >>>> 490     event.commit(); >>>> 491     Universe::heap()->collect(GCCause::_java_lang_system_gc); >>>> >>>> Don't you want the commit() call after the call to collect(), to get >>>> the timing right?
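A minimal standalone sketch of the ordering question Per raises, assuming (as a simplification, not a description of the JFR implementation) that a timed event records its start time when it is constructed and its end time when commit() is called. TimedEvent and collect() are hypothetical stand-ins, not HotSpot code; collect() just sleeps to simulate a blocking collection.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical timed event: start time taken at construction, end time at
// commit(). A simplified model only, not the real JFR event classes.
class TimedEvent {
 public:
  using Clock = std::chrono::steady_clock;

  TimedEvent() : start_(Clock::now()), end_(start_) {}

  void commit() { end_ = Clock::now(); }

  Clock::duration duration() const { return end_ - start_; }

 private:
  Clock::time_point start_;
  Clock::time_point end_;
};

// Hypothetical stand-in for a blocking System.gc() collection.
inline void collect() {
  std::this_thread::sleep_for(std::chrono::milliseconds(20));
}

// Order in the webrev excerpt: commit first, then collect.
inline TimedEvent commit_then_collect() {
  TimedEvent event;
  event.commit();  // span ends before the collection even starts
  collect();
  return event;
}

// Order Per suggests: collect first, then commit.
inline TimedEvent collect_then_commit() {
  TimedEvent event;
  collect();
  event.commit();  // span now covers the whole collection
  return event;
}
```

Under this model, committing before collect() produces an event of near-zero duration, while committing afterwards gives an event whose span covers the collection, which is what makes it possible to match it to other GC events by timestamp even without a GC-id.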
>>> I was thinking the same thing, could also be nice to have the GC-id >>> associated with the event to make it easy to match it to GC-logs and >>> other GC-events. Not sure how to easily get the GC-id though, since >>> it's not set at the time we commit the event. >>> >>> I guess if the event has the correct span with timestamps it will be >>> easy to figure out which other events are associated with it, even >>> without the GC-id. >>> >>> Cheers, >>> Stefan >>> >>>> cheers, >>>> Per >>>> >>>>> Testing: >>>>> tier1+tier2+jdk/jdk/jfr >>>>> Thanks >>>>> Erik From erik.joelsson at oracle.com Mon Mar 9 15:28:00 2020 From: erik.joelsson at oracle.com (Erik Joelsson) Date: Mon, 9 Mar 2020 08:28:00 -0700 Subject: RFR: JDK-8240224 Allow building hotspot without the serial gc In-Reply-To: References: <663301c4-aa45-4539-c9b7-d6fe68c531de@ihse.net> <484dcf75-4891-2b52-1367-0b86281b7671@oracle.com> Message-ID: <8794fd61-c5b9-b7d2-ae49-88c27979abb6@oracle.com> On 2020-03-09 02:20, Aleksey Shipilev wrote: > On 3/9/20 10:10 AM, David Holmes wrote: >> On 9/03/2020 6:30 pm, Magnus Ihse Bursie wrote: >>> When reworking the JVM feature handling, I wanted to try to compile >>> Hotspot with various features enabled/disabled. I quickly found out that >>> it's not really possible to build hotspot without the serial gc. While >>> this is not a terribly important use case, I think it's good to be able >>> to select serial freely, just as with the other collectors. >> Really not sure this is a worthwhile exercise. > Me neither. I think Serial GC always-present is a good compromise for the rest of the code: it is > the very basic GC you can always count on. I'm not a GC developer, but from a build point of view, it makes sense to allow for as free modularity of JVM features as possible. Certainly
Certainly not all combinations are a good idea, and we are most definitely not going to test all combinations, but I also don't think the build should actively prevent anyone from experimentally exclude certain "features". I would imagine this kind of freedom being useful in certain development scenarios. > Nits: > > *) src/hotspot/share/gc/shared/gcConfig.cpp changes are a bit strange: > - Epsilon should not ever be selected by ergonomics > - Why ZGC is selected before Shenandoah? [Oh, what a can of worms that one is ;)] This fallback list is clearly just meant to allow for any combination of GCs being compiled into the JVM. If the only one you picked was epsilon, then what other default would you expect? It's last in the list so any other GC will still be prioritized before it if present. For the same reason, the order of ZGC and Shenandoah is irrelevant and could just as well be the other way. It will never have any real consequence. This code is only there to keep things from falling apart when a non standard combination of jvm features is picked. /Erik > *) hotspot/gtest/gc/shared/test_collectorPolicy.cpp > - I don't think we indent nested #include, #define lines > From kim.barrett at oracle.com Mon Mar 9 20:37:27 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 9 Mar 2020 16:37:27 -0400 Subject: RFR: JDK-8240224 Allow building hotspot without the serial gc In-Reply-To: <663301c4-aa45-4539-c9b7-d6fe68c531de@ihse.net> References: <663301c4-aa45-4539-c9b7-d6fe68c531de@ihse.net> Message-ID: <46E744D7-AB29-43BA-808A-2A79248EAAAD@oracle.com> > On Mar 9, 2020, at 4:30 AM, Magnus Ihse Bursie wrote: > > When reworking the JVM feature handling, I wanted to try to compile Hotspot with various features enabled/disabled. I quickly found out that it's not really possible to build hotspot without the serial gc. While this is not a terribly important use case, I think it's good to be able to select serial freely, just as with the other collectors. 
> > With this patch it is possible to build a truly minimal JVM using 'configure --with-jvm-variants=custom --with-jvm-features=g1gc'. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8240224 > WebRev: http://cr.openjdk.java.net/~ihse/JDK-8240224-building-without-serial-gc/webrev.01 > > /Magnus I'm inclined to agree with David and Aleksey that this isn't really a worthwhile exercise. Especially not if it involves making some otherwise questionable or controversial changes. In addition to the issues mentioned by David and Aleksey: ------------------------------------------------------------------------------ src/hotspot/share/gc/shared/gcConfig.cpp I would instead suggest there should not be a default at all instead of adding these cases, and the user must explicitly select the GC to be used. Since we're talking about an atypical custom build anyway, the user presumably knows what they are doing. And yeah, that makes the buildjdk stuff elsewhere in this patch harder. Really, I think this ought to just be left alone, along with most of the other build-specific changes. [This also responds to / agrees with Aleksey's comment about this part.] ------------------------------------------------------------------------------ src/hotspot/share/gc/shared/genCollectedHeap.cpp 197 #if INCLUDE_SERIALGC 198 MarkSweep::initialize(); 199 #endif This whole file, and several associated files, are *only* used by SerialGC now that CMS has been removed: JDK-8234502. ------------------------------------------------------------------------------ make/hotspot/lib/JvmFeatures.gmk 58 ifeq ($(JVM_VARIANT), custom) 59 JVM_CFLAGS_FEATURES += -DVMTYPE=\"Custom\" 60 endif This change looks unrelated to whether serialgc is present or absent. If so, it doesn't belong in this changeset at all. 
------------------------------------------------------------------------------ make/hotspot/lib/JvmFeatures.gmk [removed] 154 # If serial is disabled, we cannot use serial as OldGC in parallel 155 JVM_EXCLUDE_FILES += psMarkSweep.cpp psMarkSweepDecorator.cpp This was missed by JDK-8235860, which removed those files. Good find. ------------------------------------------------------------------------------ test/hotspot/gtest/gc/shared/test_collectorPolicy.cpp As originally written, this test was *only* testing SerialGC. It's not obvious that it is actually GC-agnostic and can use the default GC if that isn't SerialGC. Certainly some of the naming suggests otherwise. Was this tested with all the other configurations? ------------------------------------------------------------------------------ From kim.barrett at oracle.com Tue Mar 10 01:16:40 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 9 Mar 2020 21:16:40 -0400 Subject: RFR: 8240668 : G1 list of all PerRegionTable does not have to be a double linkedlist any more In-Reply-To: <0650B450-17DB-46D4-ABD6-A1CD1877C53A@oracle.com> References: <0650B450-17DB-46D4-ABD6-A1CD1877C53A@oracle.com> Message-ID: <88D38C7C-105F-4CA0-A8FE-789B4A847234@oracle.com> > On Mar 6, 2020, at 7:35 AM, Ivan Walulya wrote: > > Hi all, > > Please review this modification to change the list of all PerRegionTables from a double linkedlist to a linkedlist. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8240668 > webrev: http://cr.openjdk.java.net/~iwalulya/8240668/00/ > > Testing: Tier 1 - 3 > > //Ivan Looks good. 
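Why the back pointer can be dropped can be sketched as follows, assuming the tables are only ever prepended, walked, or dropped as a whole (per Thomas's note that individual tables have not been scrubbed/deleted since JDK 11). A prev link only pays for itself when an arbitrary node must be unlinked in O(1). Table and TableList are hypothetical names, not the HotSpot PerRegionTable code, and the sketch ignores concurrency entirely.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical node standing in for a per-region table; only the link
// field matters for this illustration.
struct Table {
  int region_index;
  Table* next_in_list = nullptr;  // single forward link, no prev pointer
};

// Intrusive singly linked list supporting only the operations that remain
// once individual tables are never unlinked: prepend, iterate, clear.
class TableList {
 public:
  void prepend(Table* t) {
    t->next_in_list = head_;
    head_ = t;
    ++length_;
  }

  template <typename F>
  void for_each(F f) const {
    for (Table* cur = head_; cur != nullptr; cur = cur->next_in_list) {
      f(*cur);
    }
  }

  // Drop the whole list at once; the nodes themselves are owned elsewhere.
  void clear() {
    head_ = nullptr;
    length_ = 0;
  }

  std::size_t length() const { return length_; }
  Table* head() const { return head_; }

 private:
  Table* head_ = nullptr;
  std::size_t length_ = 0;
};
```

Every operation here touches only the head or walks forward, so no operation ever needs to find a node's predecessor.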
From ivan.walulya at oracle.com Tue Mar 10 08:00:56 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Tue, 10 Mar 2020 09:00:56 +0100 Subject: RFR: 8240668 : G1 list of all PerRegionTable does not have to be a double linkedlist any more In-Reply-To: <88D38C7C-105F-4CA0-A8FE-789B4A847234@oracle.com> References: <0650B450-17DB-46D4-ABD6-A1CD1877C53A@oracle.com> <88D38C7C-105F-4CA0-A8FE-789B4A847234@oracle.com> Message-ID: <7096FE73-1D6A-48F5-B8C7-F83229C01430@oracle.com> Thanks Kim > On 10 Mar 2020, at 02:16, Kim Barrett wrote: > >> On Mar 6, 2020, at 7:35 AM, Ivan Walulya wrote: >> >> Hi all, >> >> Please review this modification to change the list of all PerRegionTables from a double linkedlist to a linkedlist. >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8240668 >> webrev: http://cr.openjdk.java.net/~iwalulya/8240668/00/ >> >> Testing: Tier 1 - 3 >> >> //Ivan > > Looks good. > From thomas.schatzl at oracle.com Tue Mar 10 09:16:06 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 10 Mar 2020 02:16:06 -0700 (PDT) Subject: RFR(XS): 8240591: G1HeapSizingPolicy attempts to compute expansion_amount even when at full capacity In-Reply-To: <197FCCDD-2E5D-4A63-B796-28908B18DB0E@oracle.com> References: <197FCCDD-2E5D-4A63-B796-28908B18DB0E@oracle.com> Message-ID: <2c22164c-b459-af44-c7c1-15f1eacb4d52@oracle.com> Hi, On 05.03.20 11:33, Ivan Walulya wrote: > Hi all, > > Please review a small modification for G1HeapSizingPolicy to return without computing expansion_amount when heap is already at full capacity. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8240591 > Webrev: http://cr.openjdk.java.net/~iwalulya/8240591/00/ > > > //Ivan > some minor (imo) comments to start a discussion: - I (weakly) suggest to remove the unnecessary assert(GCTimeRatio > 0) or put it at the beginning of the method. The initialization already guarantees that it is > 0. Probably best to move to the very top of the method. 
- I think I understand why the code clears the ratio check data, presumably to avoid the "windup" of the ratio check data being stuck into the maximum, and taking some time to wind down. I think this is a good idea, but I would prefer to just unconditionally clear the data - it does not seem time consuming, and makes the code a bit smaller. - undecided on the log message: while it is informative, now you get a log message even if nothing changed. Maybe make it trace level? Others might chime in here with their opinions. Also given that often people set -Xms==-Xmx, this one seems to be a bit chatty at this level. I can see the point of it though. So overall, I am good with the change but asking for opinions :) Thanks, Thomas From ivan.walulya at oracle.com Tue Mar 10 09:26:34 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Tue, 10 Mar 2020 10:26:34 +0100 Subject: RFR(XS): 8240591: G1HeapSizingPolicy attempts to compute expansion_amount even when at full capacity In-Reply-To: <2c22164c-b459-af44-c7c1-15f1eacb4d52@oracle.com> References: <197FCCDD-2E5D-4A63-B796-28908B18DB0E@oracle.com> <2c22164c-b459-af44-c7c1-15f1eacb4d52@oracle.com> Message-ID: > > some minor (imo) comments to start a discussion: > > - I (weakly) suggest to remove the unnecessary assert(GCTimeRatio > 0) or put it at the beginning of the method. The initialization already guarantees that it is > 0. Probably best to move to the very top of the method. > > - I think I understand why the code clears the ratio check data, presumably to avoid the "windup" of the ratio check data being stuck into the maximum, and taking some time to wind down. I think this is a good idea, but I would prefer to just unconditionally clear the data - it does not seem time consuming, and makes the code a bit smaller. Unconditionally clearing the data is better, and yes clearing the ratio data is to avoid expansion based on old data immediately after shrinking. The early exit does not update the windowing data.
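A heavily simplified sketch of the early exit under discussion. All names and numbers here are hypothetical: SizingPolicy, expansion_amount(), ratio_over_threshold_count (standing in for the windowing data), the three-sample threshold, and the quarter-of-capacity step are placeholders. They do not describe the actual G1HeapSizingPolicy heuristics. The sketch returns early when the heap is already at maximum capacity, records no new sample on that path, and clears the sample count. That last choice is one reading of Thomas's suggestion, made so that stale samples cannot trigger an expansion right after a later shrink.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model only; names and thresholds are illustrative.
struct SizingPolicy {
  std::size_t max_capacity;
  std::size_t ratio_over_threshold_count = 0;  // stands in for the windowing data

  // Returns how many bytes to expand by; 0 means "do not expand".
  std::size_t expansion_amount(std::size_t committed, bool pause_ratio_high) {
    if (committed >= max_capacity) {
      // Already at full capacity: nothing to compute. Record no sample and
      // clear the window so old data cannot drive an expansion later.
      ratio_over_threshold_count = 0;
      return 0;
    }
    if (pause_ratio_high) {
      ++ratio_over_threshold_count;
    }
    // Placeholder heuristic: after 3 high samples, expand by the remaining
    // headroom, capped at a quarter of the maximum capacity.
    if (ratio_over_threshold_count >= 3) {
      ratio_over_threshold_count = 0;
      std::size_t headroom = max_capacity - committed;
      std::size_t step = max_capacity / 4;
      return headroom < step ? headroom : step;
    }
    return 0;
  }
};
```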
> > - undecided on the log message: while it is informative, now you get a log message even if nothing changed. Maybe make it trace level? Others might chime in here with their opinions. Also given that often people set -Xms==-Xmx, this one seems to be a bit chatty at this level. I can see the point of it though. > Noted, this will be changed to trace level. > So overall, I am good with the change but asking for opinions :) > > Thanks, > Thomas Thanks Thomas, //Ivan From magnus.ihse.bursie at oracle.com Tue Mar 10 14:51:39 2020 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Tue, 10 Mar 2020 15:51:39 +0100 Subject: RFR: JDK-8240224 Allow building hotspot without the serial gc In-Reply-To: <8794fd61-c5b9-b7d2-ae49-88c27979abb6@oracle.com> References: <663301c4-aa45-4539-c9b7-d6fe68c531de@ihse.net> <484dcf75-4891-2b52-1367-0b86281b7671@oracle.com> <8794fd61-c5b9-b7d2-ae49-88c27979abb6@oracle.com> Message-ID: <4d83a54b-1960-2459-6c55-03e261769a09@oracle.com> On 2020-03-09 16:28, Erik Joelsson wrote: > > On 2020-03-09 02:20, Aleksey Shipilev wrote: >> On 3/9/20 10:10 AM, David Holmes wrote: >>> On 9/03/2020 6:30 pm, Magnus Ihse Bursie wrote: >>>> When reworking the JVM feature handling, I wanted to try to compile >>>> Hotspot with various features enabled/disabled. I quickly found out >>>> that >>>> it's not really possible to build hotspot without the serial gc. While >>>> this is not a terribly important use case, I think it's good to be >>>> able >>>> to select serial freely, just as with the other collectors. >>> Really not sure this is a worthwhile exercise. >> Me neither. I think Serial GC always-present is a good compromise for >> the rest of the code: it is >> the very basic GC you can always count on. > I'm not a GC developer, but from a build point of view, it makes sense > to allow for as free modularity of JVM features as possible. 
Certainly > not all combinations are a good idea, and we are most definitely not > going to test all combinations, but I also don't think the build > should actively prevent anyone from experimentally exclude certain > "features". I would imagine this kind of freedom being useful in > certain development scenarios. Yes, that's exactly the intention. And on the contrary, if the discussion on this patch ends up in the verdict from the hotspot developers that it is not possible to disable serialgc, then the configure script should reflect that, and disallow deselecting it. In fact, it should not really even be a "JVM feature" then, just an always-on GC. And the check that Stefan Karlsson added, that at least one GC is selected, is unnecessary. >> Nits: >> >> *) src/hotspot/share/gc/shared/gcConfig.cpp changes are a bit strange: >> ? - Epsilon should not ever be selected by ergonomics >> ? - Why ZGC is selected before Shenandoah? [Oh, what a can of worms >> that one is ;)] > > This fallback list is clearly just meant to allow for any combination > of GCs being compiled into the JVM. If the only one you picked was > epsilon, then what other default would you expect? It's last in the > list so any other GC will still be prioritized before it if present. > For the same reason, the order of ZGC and Shenandoah is irrelevant and > could just as well be the other way. It will never have any real > consequence. This code is only there to keep things from falling apart > when a non standard combination of jvm features is picked. Exactly. For good measure, I can surely put Shenandoah before ZGC. :) The idea behind the added defaults as fallback is just to allow the JVM to even start if Serial GC is not present. If you configure with SerialGC (which, as you note, is the typical case), this change will not affect anything. Without this, it is not even possible to complete the build without SerialGC, since jlink cannot run. 
/Magnus > > /Erik > >> *) hotspot/gtest/gc/shared/test_collectorPolicy.cpp >> - I don't think we indent nested #include, #define lines Ok, sorry about that. That was the style of choice last time I programmed something seriously in C, so I just defaulted to it. I'll fix it. /Magnus From magnus.ihse.bursie at oracle.com Tue Mar 10 14:51:54 2020 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Tue, 10 Mar 2020 15:51:54 +0100 Subject: RFR: JDK-8240224 Allow building hotspot without the serial gc In-Reply-To: <484dcf75-4891-2b52-1367-0b86281b7671@oracle.com> References: <663301c4-aa45-4539-c9b7-d6fe68c531de@ihse.net> <484dcf75-4891-2b52-1367-0b86281b7671@oracle.com> Message-ID: On 2020-03-09 10:10, David Holmes wrote: > Hi Magnus, > > On 9/03/2020 6:30 pm, Magnus Ihse Bursie wrote: >> When reworking the JVM feature handling, I wanted to try to compile >> Hotspot with various features enabled/disabled. I quickly found out >> that it's not really possible to build hotspot without the serial gc. >> While this is not a terribly important use case, I think it's good to >> be able to select serial freely, just as with the other collectors. > > Really not sure this is a worthwhile exercise. While I agree that it is not very important per se to build Hotspot without the Serial GC, I do want to make sure that we uphold the promise that configure fails early if you try to build with invalid options. So it's either not allowing configure to let you build without the Serial GC, or it's fixing Hotspot so that it can build without it. My judgement was that the fixes required to make this work were minimal, without any impact to scenarios that *do* include Serial GC, and thus it was "worthwhile" to fix this in Hotspot, rather than to make a limitation in the configure script. >> With this patch it is possible to build a truly minimal JVM using
>> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8240224 >> WebRev: >> http://cr.openjdk.java.net/~ihse/JDK-8240224-building-without-serial-gc/webrev.01 > > > make/ModuleTools.gmk > > ! TOOL_ADD_PACKAGES_ATTRIBUTE := $(BUILD_JAVA) > $(JAVA_FLAGS_SMALL_BUILDJDK) \ > > that should be BUILDJDK_JAVA_FLAGS_SMALL. Good catch! I renamed this at the very last moment, but missed this. :-( > > > make/RunTestsPrebuiltSpec.gmk > make/autoconf/boot-jdk.m4 > > ! BUILDJDK_JAVA_FLAGS_SMALL := -Xms32M -Xmx512M -XX:TieredStopAtLevel=1 > > Depending on the default GC those -Xms and -Xmx settings may not be > valid/possible. Eh, okaaaay, this is not really something new, we're already setting this for the buildjdk. The only difference is that we mis-used the JAVA_FLAGS_SMALL variable, that was technically only valid for the bootjdk. So we have not seen any issues with this in practice. I'm still a bit worried though that you say that this might not work. How can the -Xms/mx values be invalid? > > Other changes seem okay but I'll leave it for GC folk to comment on that. Thanks for the review! /Magnus > > Cheers, > David > > >> >> /Magnus From magnus.ihse.bursie at oracle.com Tue Mar 10 14:53:31 2020 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Tue, 10 Mar 2020 15:53:31 +0100 Subject: RFR: JDK-8240224 Allow building hotspot without the serial gc In-Reply-To: <46E744D7-AB29-43BA-808A-2A79248EAAAD@oracle.com> References: <663301c4-aa45-4539-c9b7-d6fe68c531de@ihse.net> <46E744D7-AB29-43BA-808A-2A79248EAAAD@oracle.com> Message-ID: On 2020-03-09 21:37, Kim Barrett wrote: >> On Mar 9, 2020, at 4:30 AM, Magnus Ihse Bursie wrote: >> >> When reworking the JVM feature handling, I wanted to try to compile Hotspot with various features enabled/disabled. I quickly found out that it's not really possible to build hotspot without the serial gc. 
While this is not a terribly important use case, I think it's good to be able to select serial freely, just as with the other collectors. >> >> With this patch it is possible to build a truly minimal JVM using 'configure --with-jvm-variants=custom --with-jvm-features=g1gc'. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8240224 >> WebRev: http://cr.openjdk.java.net/~ihse/JDK-8240224-building-without-serial-gc/webrev.01 >> >> /Magnus > I'm inclined to agree with David and Aleksey that this isn't really a > worthwhile exercise. Especially not if it involves making some > otherwise questionable or controversial changes. As I've said in the previous comments, it's not so much about making Hotspot run without Serial GC as making configure live up to its promise not to create an un-buildable configuration. I apologize if my changes are questionable or controversial -- my assessment was on the contrary that they were simple and non-obtrusive, to the point of triviality. > > In addition to the issues mentioned by David and Aleksey: > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/shared/gcConfig.cpp > > I would instead suggest there should not be a default at all instead > of adding these cases, and the user must explicitly select the GC to > be used. Since we're talking about an atypical custom build anyway, > the user presumably knows what they are doing. And yeah, that makes > the buildjdk stuff elsewhere in this patch harder. If you build without the Serial GC, it is not even possible to start the JVM without a flag selecting GC. Instead you get a somewhat cryptic (and incorrect) message about missing garbage collectors. Even if the end user would be able to know that you need to pass an additional option just to be able to start java, the build system knows no such thing, so we cannot even finish the build -- as soon as we try to use the newly built JVM (e.g. for running jlink), we will crash and burn.
> Really, I think this ought to just be left alone, along with most of > the other build-specific changes. > > [This also responds to / agrees with Aleksey's comment about this part.] > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/shared/genCollectedHeap.cpp > 197 #if INCLUDE_SERIALGC > 198 MarkSweep::initialize(); > 199 #endif > > This whole file, and several associated files, are *only* used by > SerialGC now that CMS has been removed: JDK-8234502. Then maybe they should be excluded when serial is not included? Or, if it is determined that Serial GC is essential to hotspot, we should remove the INCLUDE_SERIALGC define and associated framework, since it's just a fake abstraction if it is not actually possible to build without serial GC. > > ------------------------------------------------------------------------------ > make/hotspot/lib/JvmFeatures.gmk > 58 ifeq ($(JVM_VARIANT), custom) > 59 JVM_CFLAGS_FEATURES += -DVMTYPE=\"Custom\" > 60 endif > > This change looks unrelated to whether serialgc is present or absent. > If so, it doesn't belong in this changeset at all. You are correct that this is not strictly about serialgc. When I tested my custom build with only epsilongc, I discovered that jtreg barfed on the version string produced by the custom JVM build. This is a fix that makes sure the VMTYPE always has a value. If you object to me pushing it as part of this fix, I can remove it from here and submit it as a separate issue. (I just didn't think it was worth the hassle.) > > ------------------------------------------------------------------------------ > make/hotspot/lib/JvmFeatures.gmk > [removed] > 154 # If serial is disabled, we cannot use serial as OldGC in parallel > 155 JVM_EXCLUDE_FILES += psMarkSweep.cpp psMarkSweepDecorator.cpp > > This was missed by JDK-8235860, which removed those files. Good find. ... 
but according to your comment above, that fix also missed adding a bunch of other files that should be excluded...? (If we should keep the ability to disable serial gc, that is...) > > ------------------------------------------------------------------------------ > test/hotspot/gtest/gc/shared/test_collectorPolicy.cpp > > As originally written, this test was *only* testing SerialGC. It's not > obvious that it is actually GC-agnostic and can use the default GC if > that isn't SerialGC. Certainly some of the naming suggests otherwise. > Was this tested with all the other configurations? No, I have not tested all other configurations. I verified that I could build with only g1, only zgc and only epsilongc. I also tried running tier1 testing, and it "mostly" succeeded, but it still failed on several tests. My quick eyeballing of the situation indicated that the absolute majority (and perhaps all) of these failures were related to jtreg tests not properly declaring their dependencies on compiler1 or compiler2. (Remember, on this bare-bones JVM I only had the interpreter, and neither c1 nor c2). I *could* of course run a suitable set of testing with say c1 and c2 enabled, and just a single gc enabled, for the set of all gcs != serial gc, but then we're *really* getting into the "not worth it" land. It is not clear to me that the test is only run with Serial GC. As far as I can interpret the test framework, this is run with the default collector, which typically is *not* serialgc on our testing framework. If this is only valid for Serial GC, perhaps the test needs to be amended?
/Magnus > > ------------------------------------------------------------------------------ > From per.liden at oracle.com Tue Mar 10 17:19:48 2020 From: per.liden at oracle.com (Per Liden) Date: Tue, 10 Mar 2020 18:19:48 +0100 Subject: RFR: 8240714: ZGC: TestSmallHeap.java failed due to OutOfMemoryError Message-ID: <74b2b7a0-fa16-7476-809a-fc550b4827d0@oracle.com> The gc/z/TestSmallHeap.java test failed once due to OutOfMemoryError. When using an 8M heap, this test is fairly sensitive in the sense that the heap will be very crowded and the heap headroom is small. When running as "main/othervm" there are additional jtreg threads running in the VM. These threads can apparently (sometimes?) allocate enough memory to disturb the test itself, pushing it over the edge with OOME as a result. To avoid having these threads running in the same VM as the test itself I've adjusted the test to spawn a new test VM through ProcessTools. Webrev: http://cr.openjdk.java.net/~pliden/8240714/webrev.0 Bug: https://bugs.openjdk.java.net/browse/JDK-8240714 Testing: Manual cheers, Per From sangheon.kim at oracle.com Tue Mar 10 21:17:56 2020 From: sangheon.kim at oracle.com (sangheon.kim at oracle.com) Date: Tue, 10 Mar 2020 14:17:56 -0700 Subject: RFR: 8239825: G1: Simplify threshold test for mutator refinement In-Reply-To: <81A0AF23-EEA2-42A2-8208-AD36B7B336CC@oracle.com> References: <81A0AF23-EEA2-42A2-8208-AD36B7B336CC@oracle.com> Message-ID: <15e5d4d9-053f-fcab-9ea1-969604bfdc3f@oracle.com> Hi Kim, On 3/3/20 6:16 PM, Kim Barrett wrote: > Please review this change to the handling of "padding" for the threshold > used to decide whether a mutator thread should perform concurrent > refinement. Rather than doing a slightly tricky (because of potential > overflow) computation every time a mutator thread completes a buffer, > instead perform that computation once and record the result for repeated > use.
> > CR: > https://bugs.openjdk.java.net/browse/JDK-8239825 > > Webrev: > https://cr.openjdk.java.net/~kbarrett/8239825/open.00/ Looks good as is. ------------------- src/hotspot/share/gc/g1/g1DirtyCardQueue.hpp 326 // Artificially increase mutator refinement threshold. 327 void set_max_cards_padding(size_t padding); - This method is not used even with 8240133 and 8139652. I'm okay leaving as is if you think it may be used in the future. 330 void discard_max_cards_padding(); - You changed the member name from '_max_cards_padding' to '_padded_max_cards' but it is not reflected in the method name. Is this intended? I don't need a new webrev for this change if you would like to change it. Thanks, Sangheon > > Testing: > mach5 tier1-5 along with changes for JDK-8240133 and JDK-8139652. > Local (linux-x64) hotspot:tier1 with just this change. > From kim.barrett at oracle.com Tue Mar 10 21:27:50 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 10 Mar 2020 17:27:50 -0400 Subject: RFR: 8139652: Mutator refinement processing should take the oldest dirty card buffer In-Reply-To: References: Message-ID: > On Mar 3, 2020, at 9:32 PM, Kim Barrett wrote: > > Please review this change to the handling of completed buffers by mutator > threads. Previously it would conditionally process and potentially reuse the > buffer, rather than enqueuing it. Now, always enqueue the buffer and > allocate a new one, and conditionally process the next (oldest) dirty buffer > in the DCQS. The benefit of this is that the buffers being processed by the > mutator age for a while in the DCQS (just as is done for concurrent > refinement thread processing), so if the mutator is making repeated writes > to the same or nearby locations, the associated card marking has more > opportunity to be filtered out.
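The mutator-side policy from 8139652 (always enqueue the full buffer, then refine the oldest queued buffer when over the threshold) can be sketched as a small single-threaded model. This is a hypothetical illustration, not HotSpot's G1DirtyCardQueueSet: the class name `DirtyCardQueueSetModel`, the `CardBuffer` type and the plain card-count threshold are assumptions, and the real code is lock-free and far more involved.

```cpp
#include <cstddef>
#include <deque>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical representation: a completed buffer is a list of card indices.
using CardBuffer = std::vector<std::size_t>;

// Simplified, single-threaded model (not HotSpot code).
class DirtyCardQueueSetModel {
  std::deque<std::unique_ptr<CardBuffer>> _completed;  // oldest buffer at the front
  std::size_t _num_cards = 0;                          // cards in all queued buffers
  std::size_t _threshold;                              // mutator refinement threshold

public:
  explicit DirtyCardQueueSetModel(std::size_t threshold) : _threshold(threshold) {}

  // Called when a mutator's thread-local buffer fills up. The full buffer is
  // always enqueued and replaced with a fresh one. If the queued card count is
  // over the threshold, the mutator is handed the *oldest* queued buffer to
  // refine, so recently dirtied cards keep aging in the queue.
  std::unique_ptr<CardBuffer> handle_completed_buffer(std::unique_ptr<CardBuffer>& thread_buffer) {
    _num_cards += thread_buffer->size();
    _completed.push_back(std::move(thread_buffer));
    thread_buffer = std::make_unique<CardBuffer>();

    if (_num_cards <= _threshold) {
      return nullptr;  // below threshold: leave the work to refinement threads
    }
    std::unique_ptr<CardBuffer> oldest = std::move(_completed.front());
    _completed.pop_front();
    _num_cards -= oldest->size();
    return oldest;  // the caller refines these cards
  }

  std::size_t num_cards() const { return _num_cards; }
  std::size_t num_buffers() const { return _completed.size(); }
};
```

Handing back the oldest buffer rather than the one just filled is what gives repeated writes to nearby locations time to be filtered before refinement, which is the benefit the RFR describes.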
> > CR: > https://bugs.openjdk.java.net/browse/JDK-8139652 > > Webrev: > https://cr.openjdk.java.net/~kbarrett/8139652/open.00/ > > Testing > mach5 tier1-5 along with changes for JDK-8239825 and JDK-8139652. The original webrev was based on JDK-8239825 and JDK-8240133. The push and backout of JDK-8240133 has made that webrev no longer apply cleanly. So here's a new, up to date (as of this morning) webrev: https://cr.openjdk.java.net/~kbarrett/8139652/open.01/ Tested with mach5 tier1-5 along with change for JDK-8239825 (which hasn't been pushed yet). I forgot to mention previously that I've also done some performance testing, which didn't find anything interesting from this change. Compared each of before/after this change plus each of default -XX:G1ConcRefinementThreads and that option = 0. From david.holmes at oracle.com Tue Mar 10 23:22:41 2020 From: david.holmes at oracle.com (David Holmes) Date: Wed, 11 Mar 2020 09:22:41 +1000 Subject: RFR: JDK-8240224 Allow building hotspot without the serial gc In-Reply-To: References: <663301c4-aa45-4539-c9b7-d6fe68c531de@ihse.net> <484dcf75-4891-2b52-1367-0b86281b7671@oracle.com> Message-ID: <8410db0f-2ce2-5daa-5eb2-24c786df405e@oracle.com> On 11/03/2020 12:51 am, Magnus Ihse Bursie wrote: > On 2020-03-09 10:10, David Holmes wrote: >> Hi Magnus, >> >> On 9/03/2020 6:30 pm, Magnus Ihse Bursie wrote: >>> When reworking the JVM feature handling, I wanted to try to compile >>> Hotspot with various features enabled/disabled. I quickly found out >>> that it's not really possible to build hotspot without the serial gc. >>> While this is not a terribly important use case, I think it's good to >>> be able to select serial freely, just as with the other collectors. >> >> Really not sure this is a worthwhile exercise. > While I agree that it is not very much important per se to build Hotspot > without the Serial GC, I do want to make sure that we uphold the promise > that configure fails early if you try to build with invalid options. 
> > So it's either not allowing configure to let you build without the > Serial GC, or it's fixing Hotspot so that it can build without it. My > judgement was that the fixes required to make this work were minimal, > without any impact to scenarios that *do* include Serial GC, and thus it > was "worthwhile" to fix this in Hotspot, rather than to make a limitation > in the configure script. I'm more inclined to say that SerialGC is not a VM feature per se but rather an always-present built-in GC. But I'll leave that to the GC folk. >>> With this patch it is possible to build a truly minimal JVM using >>> 'configure --with-jvm-variants=custom --with-jvm-features=g1gc'. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8240224 >>> WebRev: >>> http://cr.openjdk.java.net/~ihse/JDK-8240224-building-without-serial-gc/webrev.01 >> >> >> make/ModuleTools.gmk >> >> ! TOOL_ADD_PACKAGES_ATTRIBUTE := $(BUILD_JAVA) >> $(JAVA_FLAGS_SMALL_BUILDJDK) \ >> >> that should be BUILDJDK_JAVA_FLAGS_SMALL. > Good catch! I renamed this at the very last moment, but missed this. :-( > >> >> >> make/RunTestsPrebuiltSpec.gmk >> make/autoconf/boot-jdk.m4 >> >> ! BUILDJDK_JAVA_FLAGS_SMALL := -Xms32M -Xmx512M -XX:TieredStopAtLevel=1 >> >> Depending on the default GC those -Xms and -Xmx settings may not be >> valid/possible. > Eh, okaaaay, this is not really something new, we're already setting > this for the buildjdk. The only difference is that we misused the > JAVA_FLAGS_SMALL variable, which was technically only valid for the > bootjdk. So we have not seen any issues with this in practice. I'm still > a bit worried though that you say that this might not work. How can the > -Xms/mx values be invalid? Previously these heap sizes were associated with use of SerialGC: ! JAVA_FLAGS_SMALL := -XX:+UseSerialGC -Xms32M -Xmx512M -XX:TieredStopAtLevel=1 ! BUILDJDK_JAVA_FLAGS_SMALL := -Xms32M -Xmx512M -XX:TieredStopAtLevel=1 now you are setting them independent of a particular GC.
It may be possible that with some GCs the specified heap size is not sufficient to allow the build task to complete without getting an OutOfMemoryError. As an extreme case consider if you only have EpsilonGC configured. These values would need to be tested with each GC to see if the build tasks can be done with these settings. Also I'm not at all clear what happens if the only GC configured is one of the experimental GCs for which we would normally have to set -XX:+UnlockExperimentalVMOptions ?? Cheers, David ----- >> >> Other changes seem okay but I'll leave it for GC folk to comment on that. > Thanks for the review! > > /Magnus >> >> Cheers, >> David >> >> >>> >>> /Magnus > From kim.barrett at oracle.com Tue Mar 10 23:35:58 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 10 Mar 2020 19:35:58 -0400 Subject: RFR: 8239825: G1: Simplify threshold test for mutator refinement In-Reply-To: <15e5d4d9-053f-fcab-9ea1-969604bfdc3f@oracle.com> References: <81A0AF23-EEA2-42A2-8208-AD36B7B336CC@oracle.com> <15e5d4d9-053f-fcab-9ea1-969604bfdc3f@oracle.com> Message-ID: <6E320DC7-54AB-4193-9505-238D0400309B@oracle.com> > On Mar 10, 2020, at 5:17 PM, sangheon.kim at oracle.com wrote: > > Hi Kim, > > On 3/3/20 6:16 PM, Kim Barrett wrote: >> Please review this change to the handling of "padding" for the threshold >> used to decide whether a mutator thread should perform concurrent >> refinement. Rather than doing a slightly tricky (because of potential >> overflow) computation every time a mutator thread completes a buffer, >> instead perform that computation once and record the result for repeated >> use. >> >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8239825 >> >> Webrev: >> https://cr.openjdk.java.net/~kbarrett/8239825/open.00/ > Looks good as is. Thanks, but see below. > src/hotspot/share/gc/g1/g1DirtyCardQueue.hpp > 326 // Artificially increase mutator refinement threshold.
> 327 void set_max_cards_padding(size_t padding); > - This method is not used even with 8240133 and 8139652. I'm okay leaving as is if you think it may be used in the future. It's called twice by G1ConcurrentRefine::adjust, at g1ConcurrentRefine.cpp:404/406. Those lines didn't need to be changed, because the functional behavior didn't change, just the underlying implementation; see below. > 330 void discard_max_cards_padding(); > - You changed the member name from '_max_cards_padding' to '_padded_max_cards' but it is not reflected in the method name. > Is this intended? I don't need a new webrev for this change if you would like to change it. I didn't change the name of a data member, I removed one and added the other; those two members have completely different semantics. The old _max_cards_padding was the current amount of padding. The effective threshold was _max_cards + _max_cards_padding, being careful to deal with overflow. The new _padded_max_cards is the effective threshold, recomputed when either _max_cards or the padding value is updated (being careful to deal with overflow at that update time, rather than every time the threshold is needed). It has no accessor functions; it is a private implementation detail, only used internally, where it is used directly as a data member. set_max_cards_padding(new_padding) still changes the current padding value. With this change that padding value is no longer directly recorded in a data member. Instead the padded threshold is computed and recorded in the new data member (_padded_max_cards). The ability to make these kinds of implementation changes without changing the external API is kind of the point of using a functional interface rather than exposing data members to clients. I think the name "set_max_cards_padding" doesn't (and shouldn't) imply anything about the existence (or not) of a _max_cards_padding member.
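The "compute once, compare cheaply" approach Kim describes for 8239825 can be modeled in a few lines. This is a hypothetical sketch, not the actual G1DirtyCardQueueSet code: the class name `RefinementThresholdModel` and the choice that `set_max_cards` drops any current padding are assumptions, and the atomics HotSpot needs are omitted. Only the member and method names taken from the thread are real.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative model: the padded threshold is computed (with overflow
// clipping) when the inputs change, so the hot check is a single comparison.
class RefinementThresholdModel {
  std::size_t _max_cards = 0;
  std::size_t _padded_max_cards = 0;  // effective threshold; the padding itself is not stored

public:
  // Assumption for this sketch: changing the base threshold drops any padding.
  void set_max_cards(std::size_t value) {
    _max_cards = value;
    _padded_max_cards = value;
  }

  // Artificially increase the mutator refinement threshold, clipping to
  // SIZE_MAX if the unsigned sum wraps around.
  void set_max_cards_padding(std::size_t padding) {
    std::size_t limit = _max_cards + padding;
    if (limit < padding) {  // overflow detected
      limit = SIZE_MAX;
    }
    _padded_max_cards = limit;
  }

  void discard_max_cards_padding() { _padded_max_cards = _max_cards; }

  // Hot path, evaluated each time a mutator completes a buffer: no arithmetic.
  bool mutator_should_refine(std::size_t num_cards) const {
    return num_cards > _padded_max_cards;
  }
};
```

The public API (`set_max_cards_padding`, `discard_max_cards_padding`) is unchanged from the callers' point of view; only the stored representation differs, which is the point Kim makes about a functional interface.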
I also don't think the public function name should be changed to "set_padded_max_cards" to reflect the new member name, whose very existence is an implementation detail. The name could perhaps be changed to update_max_cards_padding, but I don't think that's really an improvement. What do others think? From sangheon.kim at oracle.com Wed Mar 11 00:53:57 2020 From: sangheon.kim at oracle.com (sangheon.kim at oracle.com) Date: Tue, 10 Mar 2020 17:53:57 -0700 Subject: RFR: 8239825: G1: Simplify threshold test for mutator refinement In-Reply-To: <6E320DC7-54AB-4193-9505-238D0400309B@oracle.com> References: <81A0AF23-EEA2-42A2-8208-AD36B7B336CC@oracle.com> <15e5d4d9-053f-fcab-9ea1-969604bfdc3f@oracle.com> <6E320DC7-54AB-4193-9505-238D0400309B@oracle.com> Message-ID: <874f2180-9024-f9dc-748a-9b51a44fd298@oracle.com> On 3/10/20 4:35 PM, Kim Barrett wrote: >> On Mar 10, 2020, at 5:17 PM, sangheon.kim at oracle.com wrote: >> >> Hi Kim, >> >> On 3/3/20 6:16 PM, Kim Barrett wrote: >>> Please review this change to the handling of "padding" for the threshold >>> used to decide whether a mutator thread should perform concurrent >>> refinement. Rather than doing a slightly tricky (because of potential >>> overflow) computation every time a mutator thread completes a buffer, >>> instead perform that computation once and record the result for repeated >>> use. >>> >>> CR: >>> https://bugs.openjdk.java.net/browse/JDK-8239825 >>> >>> Webrev: >>> https://cr.openjdk.java.net/~kbarrett/8239825/open.00/ >> Looks good as is. > Thanks, but see below. > >> src/hotspot/share/gc/g1/g1DirtyCardQueue.hpp >> 326 // Artificially increase mutator refinement threshold. >> 327 void set_max_cards_padding(size_t padding); >> - This method is not used even with 8240133 and 8139652. I'm okay leaving as is if you think it may be used in the future. > It's called twice by G1ConcurrentRefine::adjust, at > g1ConcurrentRefine.cpp:404/406.
Those lines didn't need to be changed, > because the functional behavior didn't change, just the underlying > implementation; see below. Right, I thought that method had been modified, but it wasn't. > >> 330 void discard_max_cards_padding(); >> - You changed the member name from '_max_cards_padding' to '_padded_max_cards' but it is not reflected in the method name. >> Is this intended? I don't need a new webrev for this change if you would like to change it. > I didn't change the name of a data member, I removed one and added the > other; those two members have completely different semantics. I don't want to argue with this, because removing/adding vs. changing its name/semantics seem the same to me. :) > > The old _max_cards_padding was the current amount of padding. The > effective threshold was _max_cards + _max_cards_padding, being careful > to deal with overflow. > > The new _padded_max_cards is the effective threshold, recomputed when > either _max_cards or the padding value is updated (being careful to > deal with overflow at that update time, rather than every time the > threshold is needed). It has no accessor functions; it is a private > implementation detail, only used internally, where it is used directly > as a data member. > > set_max_cards_padding(new_padding) still changes the current padding > value. With this change that padding value is no longer directly > recorded in a data member. Instead the padded threshold is computed > and recorded in the new data member (_padded_max_cards). The ability > to make these kinds of implementation changes without changing the > external API is kind of the point of using a functional interface > rather than exposing data members to clients. > > I think the name "set_max_cards_padding" doesn't (and shouldn't) imply > anything about the existence (or not) of a _max_cards_padding member.
> I also don't think the public function name should be changed to > "set_padded_max_cards" to reflect the new member name, whose very > existence is an implementation detail. > > The name could perhaps be changed to update_max_cards_padding, but I > don't think that's really an improvement. What do others think? It is straightforward to me that the old method was just a setter, so its name reflected the member name. However if you intended to name/implement it with the concept of a functional interface, that is totally fine with me. Hopefully I'm the only person who has questions about the method name. :) Thanks, Sangheon > From kim.barrett at oracle.com Wed Mar 11 01:36:16 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 10 Mar 2020 21:36:16 -0400 Subject: RFR: JDK-8240224 Allow building hotspot without the serial gc In-Reply-To: References: <663301c4-aa45-4539-c9b7-d6fe68c531de@ihse.net> <46E744D7-AB29-43BA-808A-2A79248EAAAD@oracle.com> Message-ID: <96531866-D45A-4797-841D-3E6E26F403D5@oracle.com> > On Mar 10, 2020, at 10:53 AM, Magnus Ihse Bursie wrote: > > On 2020-03-09 21:37, Kim Barrett wrote: >>> On Mar 9, 2020, at 4:30 AM, Magnus Ihse Bursie >>> wrote: >>> >>> When reworking the JVM feature handling, I wanted to try to compile Hotspot with various features enabled/disabled. I quickly found out that it's not really possible to build hotspot without the serial gc. While this is not a terribly important use case, I think it's good to be able to select serial freely, just as with the other collectors. >>> >>> With this patch it is possible to build a truly minimal JVM using 'configure --with-jvm-variants=custom --with-jvm-features=g1gc'. >>> >>> Bug: >>> https://bugs.openjdk.java.net/browse/JDK-8240224 >>> >>> WebRev: >>> http://cr.openjdk.java.net/~ihse/JDK-8240224-building-without-serial-gc/webrev.01 >>> >>> >>> /Magnus >>> >> I'm inclined to agree with David and Aleksey that this isn't really a >> worthwhile exercise.
Especially not if it involves making some >> otherwise questionable or controversial changes. >> > > As I've said in the previous comments, it's not so much about making Hotspot run without Serial GC as making configure live up to its promise not to create an un-buildable configuration. The ability to configure which GCs are present was added for several reasons. Some packagers don't want to support some of the collectors that are available in the source tree, so want to completely exclude the (to them) unsupported collectors from their builds. Some packagers want to be able to reduce the VM footprint for certain application areas; the "minimal" variant is an example. In preparation for removal of CMS it was useful to first be able to build with it configured out. And CMS could have ended up in the category of collectors that are excluded as unsupported by some packagers. The implementation of this configurability tried to be reasonably complete. Doing so helped shake out problems and show the intent. But I don't know if it was ever demonstrated to work for all possibilities, and even if it did at one time, bit rot is pretty much inevitable since we don't test most of those possibilities. I don't think we should be spending effort on configurations for which there is no evidence anyone actually wants or needs them. But having the mechanism in the build system to try a configuration provides a starting point if someone finds a need for something oddball, even if it doesn't work out of the box. It would be better if broken configurations failed nicely, but even that can't be ensured for long without ongoing testing that I don't think anyone wants to do. > I apologize if my changes are questionable or controversial -- my assessment was on the contrary that they were simple and non-obtrusive, to the point of triviality. Some of the discussion in this thread has been pointing out places where a reviewer thinks that assessment is mistaken.
>> src/hotspot/share/gc/shared/gcConfig.cpp >> >> I would instead suggest there should not be a default at all instead >> of adding these cases, and the user must explicitly select the GC to >> be used. Since we're talking about an atypical custom build anyway, >> the user presumably knows what they are doing. And yeah, that makes >> the buildjdk stuff elsewhere in this patch harder. >> > > If you build without the Serial GC, it is not even possible to start the JVM without a flag selecting GC. Instead you get a somewhat cryptic (and incorrect) message about missing garbage collectors. Even if the end user would be able to know that you need to pass an additional option just to be able to start java, the build system knows no such thing, so we cannot even finish the build -- as soon as we try to use the newly built JVM (e.g. for running jlink), we will crash and burn. Right, because the build system isn't dealing with the need to explicitly specify the GC to use in such a configuration. That's what I meant about making the build stuff harder. The build system would need to look at the configuration to decide how to accomplish the build. >> src/hotspot/share/gc/shared/genCollectedHeap.cpp >> 197 #if INCLUDE_SERIALGC >> 198 MarkSweep::initialize(); >> 199 #endif >> >> This whole file, and several associated files, are *only* used by >> SerialGC now that CMS has been removed: JDK-8234502. >> > > Then maybe they should be excluded when serial is not included? That would be part of the work involved in resolving JDK-8234502. > Or, if it is determined that Serial GC is essential to hotspot, we should remove the INCLUDE_SERIALGC define and associated framework, since it's just a fake abstraction if it is not actually possible to build without serial GC. I don't think there is any belief that SerialGC must always be included. That it can't currently be excluded is an artifact of nobody having the need and the resources to make that possible.
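The default-GC policy debated for gcConfig.cpp (Kim's "no default unless a safe one exists", plus David's and Kim's point that an experimental GC should never be picked by default) can be sketched as an ordered preference walk. This is a hypothetical model, not the code in gcConfig.cpp: the `GCCandidate` struct, its fields and the preference order are assumptions made for illustration only.

```cpp
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical view of a collector for a default-selection policy;
// these fields are illustrative, not HotSpot's GCConfig data.
struct GCCandidate {
  std::string_view name;
  bool included;      // compiled into this build
  bool experimental;  // would require -XX:+UnlockExperimentalVMOptions
};

// Return the first collector in preference order that is both built in and
// non-experimental. An empty result means "no safe default": the caller should
// report a clear error asking for an explicit GC selection flag instead of
// silently picking something.
std::optional<std::string_view> select_default_gc(const std::vector<GCCandidate>& preference) {
  for (const GCCandidate& gc : preference) {
    if (gc.included && !gc.experimental) {
      return gc.name;
    }
  }
  return std::nullopt;
}
```

Under this policy a build containing only Epsilon would refuse to start without an explicit flag, which is also why, as Kim notes, the build system itself would then need to pass such a flag when it runs the newly built JVM (e.g. for jlink).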
>> make/hotspot/lib/JvmFeatures.gmk >> 58 ifeq ($(JVM_VARIANT), custom) >> 59 JVM_CFLAGS_FEATURES += -DVMTYPE=\"Custom\" >> 60 endif >> >> This change looks unrelated to whether serialgc is present or absent. >> If so, it doesn't belong in this changeset at all. >> > > You are correct that this is not strictly about serialgc. When I tested my custom build with only epsilongc, I discovered that jtreg barfed on the version string produced by the custom JVM build. This is a fix that makes sure the VMTYPE always has a value. If you object to me pushing it as part of this fix, I can remove it from here and submit it as a separate issue. (I just didn't think it was worth the hassle.) I understand there is overhead to breaking things into multiple changes, but combining unrelated changes can make archeology and problem or rationale attribution much harder. I looked at this and had no idea what it was for, and it wasn't called out in the RFR or anywhere else. >> make/hotspot/lib/JvmFeatures.gmk >> [removed] >> 154 # If serial is disabled, we cannot use serial as OldGC in parallel >> 155 JVM_EXCLUDE_FILES += psMarkSweep.cpp psMarkSweepDecorator.cpp >> >> This was missed by JDK-8235860, which removed those files. Good find. >> > ... but according to your comment above, that fix also missed adding a bunch of other files that should be excluded...? (If we should keep the ability to disable serial gc, that is...) The comment above was about a different change, the removal of CMS, which is known to be incomplete and have a number of further cleanups and refactorings to do before all vestiges have been removed. This one is about the removal of the Serial-Old variant of ParallelGC, which was thought to be complete, but missed this little snippet. >> test/hotspot/gtest/gc/shared/test_collectorPolicy.cpp >> >> As originally written, this test was *only* testing SerialGC. It's not >> obvious that it is actually GC-agnostic and can use the default GC if >> that isn't SerialGC.
Certainly some of the naming suggests otherwise. >> Was this tested with all the other configurations? >> > > No, I have not tested all other configurations. I verified that I could build with only g1, only zgc and only epsilongc. I also tried running tier1 testing, and it "mostly" succeeded, but it still failed on several tests. My quick eyeballing of the situation indicated that the absolute majority (and perhaps all) of these failures were related to jtreg tests not properly declaring their dependencies on compiler1 or compiler2. (Remember, on this bare-bones JVM I only had the interpreter, and neither c1 nor c2). > > I *could* of course run a suitable set of testing with say c1 and c2 enabled, and just a single gc enabled, for the set of all gcs != serial gc, but then we're *really* getting into the "not worth it" land. > > It is not clear to me that the test is only run with Serial GC. As far as I can interpret the test framework, this is run with the default collector, which typically is *not* serialgc on our testing framework. If this is only valid for Serial GC, perhaps the test needs to be amended? Looking at this some more, I don't know what this test thinks it's doing, but I suspect it's confused. It's using TEST_VM and TEST_OTHER_VM, both of which create the VM before running the test body. The kinds of things it's doing in that context seem pretty questionable.
From kim.barrett at oracle.com Wed Mar 11 01:39:56 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 10 Mar 2020 18:39:56 -0700 (PDT) Subject: RFR: JDK-8240224 Allow building hotspot without the serial gc In-Reply-To: <8410db0f-2ce2-5daa-5eb2-24c786df405e@oracle.com> References: <663301c4-aa45-4539-c9b7-d6fe68c531de@ihse.net> <484dcf75-4891-2b52-1367-0b86281b7671@oracle.com> <8410db0f-2ce2-5daa-5eb2-24c786df405e@oracle.com> Message-ID: > On Mar 10, 2020, at 7:22 PM, David Holmes wrote: > Also I'm not at all clear what happens if the only GC configured is one of the experimental GCs for which we would normally have to set -XX:+UnlockExperimentalVMOptions ?? Yes, it seems wrong to ever select an experimental GC by default. From kim.barrett at oracle.com Wed Mar 11 01:57:15 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 10 Mar 2020 21:57:15 -0400 Subject: RFR: JDK-8240224 Allow building hotspot without the serial gc In-Reply-To: <4d83a54b-1960-2459-6c55-03e261769a09@oracle.com> References: <663301c4-aa45-4539-c9b7-d6fe68c531de@ihse.net> <484dcf75-4891-2b52-1367-0b86281b7671@oracle.com> <8794fd61-c5b9-b7d2-ae49-88c27979abb6@oracle.com> <4d83a54b-1960-2459-6c55-03e261769a09@oracle.com> Message-ID: > On Mar 10, 2020, at 10:51 AM, Magnus Ihse Bursie wrote: >>> Nits: >>> >>> *) src/hotspot/share/gc/shared/gcConfig.cpp changes are a bit strange: >>> - Epsilon should not ever be selected by ergonomics >>> - Why ZGC is selected before Shenandoah? [Oh, what a can of worms that one is ;)] >> >> This fallback list is clearly just meant to allow for any combination of GCs being compiled into the JVM. If the only one you picked was epsilon, then what other default would you expect? It's last in the list so any other GC will still be prioritized before it if present. For the same reason, the order of ZGC and Shenandoah is irrelevant and could just as well be the other way. It will never have any real consequence. 
This code is only there to keep things from falling apart when a non-standard combination of JVM features is picked. > Exactly. For good measure, I can surely put Shenandoah before ZGC. :) Whichever one is placed first will likely annoy the folks behind the competing second. There's no way to win this one. > The idea behind the added defaults as fallback is just to allow the JVM to even start if Serial GC is not present. If you configure with SerialGC (which, as you note, is the typical case), this change will not affect anything. Without this, it is not even possible to complete the build without SerialGC, since jlink cannot run. The is_server_class_machine() test is intended to filter out collectors that (probably) don't make sense to run on "small" machines. (Admittedly, it's not so easy to buy a computer that doesn't qualify for is_server_class_machine() anymore, outside of the embedded space, and even there...) But we let one insist by allowing the default to be overridden by an explicit selection. From shade at redhat.com Wed Mar 11 12:23:50 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 11 Mar 2020 13:23:50 +0100 Subject: RFR (S) 8240868: Shenandoah: remove CM-with-UR piggybacking cycles Message-ID: <6c036105-0b9c-e5b8-5990-6c45ba971894@redhat.com> RFR: https://bugs.openjdk.java.net/browse/JDK-8240868 See the rationale in description. Webrev: https://cr.openjdk.java.net/~shade/8240868/webrev.01/ Testing: hotspot_gc_shenandoah {fastdebug,release} -- Thanks, -Aleksey From rkennke at redhat.com Wed Mar 11 12:37:12 2020 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 11 Mar 2020 13:37:12 +0100 Subject: RFR (S) 8240868: Shenandoah: remove CM-with-UR piggybacking cycles In-Reply-To: <6c036105-0b9c-e5b8-5990-6c45ba971894@redhat.com> References: <6c036105-0b9c-e5b8-5990-6c45ba971894@redhat.com> Message-ID: <178a2fec-eedc-3601-b4bd-d4e80e9c6dfb@redhat.com> Hi Aleksey, Very nice! I see you haven't touched conc-mark.
While updating-on-mark is still used by full-GC, there should be a couple of paths that are unused now (e.g. in the init-mark parts), do you intend to (carefully) remove them in a follow-up? Also, the block here: http://hg.openjdk.java.net/jdk/jdk/file/e50512f91026/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp#l1467 is not needed anymore, either. It's only there to conclude the GC cycle, in the case where the cycle officially (and awkwardly) ends at final-mark. (We'll probably find more orphaned little blocks related to this in the future.) Other than that, it's good. Roman > RFR: > https://bugs.openjdk.java.net/browse/JDK-8240868 > > See the rationale in description. > > Webrev: > https://cr.openjdk.java.net/~shade/8240868/webrev.01/ > > Testing: hotspot_gc_shenandoah {fastdebug,release} > From shade at redhat.com Wed Mar 11 12:37:44 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 11 Mar 2020 13:37:44 +0100 Subject: RFR (S) 8240868: Shenandoah: remove CM-with-UR piggybacking cycles In-Reply-To: <178a2fec-eedc-3601-b4bd-d4e80e9c6dfb@redhat.com> References: <6c036105-0b9c-e5b8-5990-6c45ba971894@redhat.com> <178a2fec-eedc-3601-b4bd-d4e80e9c6dfb@redhat.com> Message-ID: <1ddf377b-8101-af3e-b5c2-7f199258fe87@redhat.com> On 3/11/20 1:37 PM, Roman Kennke wrote: > I see you haven't touched conc-mark. While updating-on-mark is still > used by full-GC, there should be a couple of paths that are unused now > (e.g. in the init-mark parts), do you intend to (carefully) remove them > in a follow-up? Yes, that is the plan: go through all uses of has_forwarded_objects in marking code and see if it is used somewhere else. If not, those should be removed. > Also, the block here: > > http://hg.openjdk.java.net/jdk/jdk/file/e50512f91026/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp#l1467 > > is not needed anymore, either. It's only there to conclude the GC cycle, > in the case where the cycle officially (and awkwardly) ends at final-mark.
Yes, that is one of the follow-ups. -- Thanks, -Aleksey From zgu at redhat.com Wed Mar 11 12:45:13 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 11 Mar 2020 08:45:13 -0400 Subject: RFR (S) 8240868: Shenandoah: remove CM-with-UR piggybacking cycles In-Reply-To: <6c036105-0b9c-e5b8-5990-6c45ba971894@redhat.com> References: <6c036105-0b9c-e5b8-5990-6c45ba971894@redhat.com> Message-ID: <3dbfc339-47e3-d39b-55c5-bbd2073662b2@redhat.com> Yes, I like it. Looks good to me. Thanks, -Zhengyu On 3/11/20 8:23 AM, Aleksey Shipilev wrote: > RFR: > https://bugs.openjdk.java.net/browse/JDK-8240868 > > See the rationale in description. > > Webrev: > https://cr.openjdk.java.net/~shade/8240868/webrev.01/ > > Testing: hotspot_gc_shenandoah {fastdebug,release} > From rkennke at redhat.com Wed Mar 11 12:53:19 2020 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 11 Mar 2020 13:53:19 +0100 Subject: RFR (S) 8240868: Shenandoah: remove CM-with-UR piggybacking cycles In-Reply-To: <1ddf377b-8101-af3e-b5c2-7f199258fe87@redhat.com> References: <6c036105-0b9c-e5b8-5990-6c45ba971894@redhat.com> <178a2fec-eedc-3601-b4bd-d4e80e9c6dfb@redhat.com> <1ddf377b-8101-af3e-b5c2-7f199258fe87@redhat.com> Message-ID: On 3/11/20 1:37 PM, Aleksey Shipilev wrote: > On 3/11/20 1:37 PM, Roman Kennke wrote: >> I see you haven't touched conc-mark. While updating-on-mark is still >> used by full-GC, there should be a couple of paths that are unused now >> (e.g. in the init-mark parts), do you intend to (carefully) remove them >> in a follow-up? > > Yes, that is the plan: go through all uses of has_forwarded_objects in marking code and see if is > used somewhere else. If not, those should be removed. > >> Also, the block here: >> >> http://hg.openjdk.java.net/jdk/jdk/file/e50512f91026/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp#l1467 >> >> is not needed anymore, either. It's only there to conclude the GC cycle, >> in the case where the cycle officially (and awkwardly) ends at final-mark. 
> > Yes, that is one of the follow-ups. Ok, good then. Thanks, Roman From rkennke at redhat.com Wed Mar 11 19:54:17 2020 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 11 Mar 2020 20:54:17 +0100 Subject: RFR: 8240873: Shenandoah: Short-cut arraycopy barriers Message-ID: The strong invariant gives us an opportunity to short-cut arraycopy-barriers: - if the src object is beyond the safe-iteration limit, i.e. has been allocated since evac-start, then it cannot have any from-space references and thus does not require updating. - likewise, if the dst object is beyond TAMS, i.e. has been allocated since mark-start, then it can only have references that must have been reachable otherwise and thus do not require enqueueing in SATB. Short-cutting on those conditions cuts out 80-90% of arraycopy slowpaths. It also brings in closing the update-watermark once updating a region is finished, originally proposed in "8240872: Shenandoah: Avoid updating new regions from start of evacuation", but now with a fence to ensure that preceding updates of regions are indeed visible to threads before they see the watermark going down. Bug: https://bugs.openjdk.java.net/browse/JDK-8240873 Webrev: http://cr.openjdk.java.net/~rkennke/JDK-8240873/webrev.00/ Testing: hotspot_gc_shenandoah, specjbb2015, some specjvm workloads Can I please get a review? Thanks, Roman From zgu at redhat.com Wed Mar 11 21:43:16 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 11 Mar 2020 17:43:16 -0400 Subject: [15] RFR 8239926: Shenandoah: Shenandoah needs to mark nmethod's metadata In-Reply-To: References: <75c20855-5234-ba00-f07b-f9da0f7b8047@redhat.com> Message-ID: <612617a7-f08a-2b69-908e-26e146fa0a8f@redhat.com> Revised based on offline discussions. Piggyback on-stack code root rescanning on the SATB draining task.
Webrev: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.02/ Reran tests: hotspot_gc_shenandoah tools/javac Thanks, -Zhengyu On 3/4/20 6:06 PM, Zhengyu Gu wrote: > Traversal GC has the same issue, also need to remark on stack code roots > in final traversal. > > @@ -263,11 +263,12 @@ > if (!_heap->is_degenerated_gc_in_progress()) { > ShenandoahTraversalRootsClosure roots_cl(q, rp); > ShenandoahTraversalSATBThreadsClosure tc(&satb_cl); > if (unload_classes) { > ShenandoahRemarkCLDClosure remark_cld_cl(&roots_cl); > - _rp->strong_roots_do(worker_id, &roots_cl, &remark_cld_cl, > NULL, &tc); > + MarkingCodeBlobClosure code_cl(&roots_cl, > CodeBlobToOopClosure::FixRelocations); > + _rp->strong_roots_do(worker_id, &roots_cl, &remark_cld_cl, > &code_cl, &tc); > } else { > CLDToOopClosure cld_cl(&roots_cl, > ClassLoaderData::_claim_strong); > _rp->roots_do(worker_id, &roots_cl, &cld_cl, NULL, &tc); > } > } else { > > > Updated webrev: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.01/ > > Thanks, > > -Zhengyu > > On 2/25/20 12:13 PM, Zhengyu Gu wrote: >> Shenandoah encounters a few test failures with tools/javac. Verifier >> catches unmarked oops in nmethod's metadata during root evacuation in >> final mark phase. >> >> The problem is that Shenandoah marks on stack nmethods in init mark >> pause, but it does not mark nmethod's metadata during concurrent mark >> phase, when new nmethod is about to be executed. >> >> The solution: >> 1) Use nmethod_entry_barrier to keep nmethod's metadata alive when the >> nmethod is about to be executed, when nmethod entry barrier is supported. >> >> 2) Remark on stack nmethod's metadata at final mark pause. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8239926 >> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.00/ >> >> Test: >> hotspot_gc_shenandoah (fastdebug and release) >>
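The two short-cut conditions in the 8240873 RFR above reduce to address comparisons against per-region marks. The C++ below is a minimal, hypothetical model rather than Shenandoah's code: the Region struct, its field names, and the two predicate functions are assumptions. The src-side limit is modeled as an update watermark (the RFR calls it the safe-iteration limit) and is published with release/acquire ordering, which is one way to get the fence the RFR describes: a thread that observes the lowered watermark also observes the reference updates made before it. The two static_asserts record the distinction, raised later in the thread, between a pointer to volatile data and a volatile pointer.

```cpp
#include <atomic>
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for HotSpot's HeapWord.
struct HeapWord { char payload; };

// 'volatile HeapWord*' is a pointer to volatile data (the pointer itself is
// not volatile); 'HeapWord* volatile' is a volatile pointer.
static_assert(!std::is_volatile_v<volatile HeapWord*>);
static_assert(std::is_volatile_v<HeapWord* volatile>);

// Hypothetical region model: [bottom, end) with two per-region marks.
struct Region {
  HeapWord* bottom;
  HeapWord* end;
  HeapWord* tams;                           // top-at-mark-start
  std::atomic<HeapWord*> update_watermark;  // objects below may need updating

  HeapWord* get_update_watermark() const {
    // Acquire pairs with the release in set_update_watermark(): a reader that
    // sees a new watermark also sees the writes made before it was stored.
    return update_watermark.load(std::memory_order_acquire);
  }
  void set_update_watermark(HeapWord* w) {
    update_watermark.store(w, std::memory_order_release);
  }
};

// Objects at or above the watermark were allocated after evacuation started,
// so they cannot contain from-space references: no update pass is needed.
bool src_needs_update(const Region& r, const HeapWord* src) {
  return src < r.get_update_watermark();
}

// Objects at or above TAMS were allocated after marking started; per the RFR,
// their references need no SATB enqueueing.
bool dst_needs_satb_enqueue(const Region& r, const HeapWord* dst) {
  return dst < r.tams;
}
```

Once a region's references have been updated, calling set_update_watermark(r.bottom) "closes" it: every object in the region then compares at or above the watermark, so src_needs_update() returns false for all of them.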
tools/javac with ShenandoahCodeRootsStyle = 1 and 2 (fastdebug and >> release) >> >> Thanks, >> >> -Zhengyu From adityam at microsoft.com Thu Mar 12 01:40:32 2020 From: adityam at microsoft.com (Aditya Mandaleeka) Date: Thu, 12 Mar 2020 01:40:32 +0000 Subject: RFR: 8231668: Remove ForceDynamicNumberOfGCThreads Message-ID: RFE: https://bugs.openjdk.java.net/browse/JDK-8231668 Webrev: https://cr.openjdk.java.net/~adityam/8231668/ This removes all the ForceDynamicNumberOfGCThreads-related code and the test cases using it. Note: this is my first patch since getting Author status, so please feel free to let me know if there's anything wrong with how I created the webrev. Other friendly folks have been doing that for me until now :). Thanks, Aditya From zgu at redhat.com Thu Mar 12 01:44:08 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 11 Mar 2020 21:44:08 -0400 Subject: [15] RFR (T) 8240915: Shenandoah: Remove unused fields in init mark tasks Message-ID: <30a21e14-8807-0708-6442-6776d1ec1a55@redhat.com> Please review this trivial change that removes unused fields in ShenandoahInitTraversalCollectionTask and ShenandoahInitMarkRootsTask. Bug: https://bugs.openjdk.java.net/browse/JDK-8240915 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8240915/webrev.00/ Test: hotspot_gc_shenandoah (fastdebug and release) Thanks, -Zhengyu From shade at redhat.com Thu Mar 12 05:45:31 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 12 Mar 2020 06:45:31 +0100 Subject: [15] RFR (T) 8240915: Shenandoah: Remove unused fields in init mark tasks In-Reply-To: <30a21e14-8807-0708-6442-6776d1ec1a55@redhat.com> References: <30a21e14-8807-0708-6442-6776d1ec1a55@redhat.com> Message-ID: On 3/12/20 2:44 AM, Zhengyu Gu wrote: > Please review this trivial change that removes unused fields in > ShenandoahInitTraversalCollectionTask and ShenandoahInitMarkRootsTask. 
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8240915 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8240915/webrev.00/ Looks good to me. I wonder (idly) when they stopped being used. -- Thanks, -Aleksey From shade at redhat.com Thu Mar 12 06:09:28 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 12 Mar 2020 07:09:28 +0100 Subject: RFR: 8231668: Remove ForceDynamicNumberOfGCThreads In-Reply-To: References: Message-ID: <75346f8c-b492-76ea-d32a-0785f57c052a@redhat.com> On 3/12/20 2:40 AM, Aditya Mandaleeka wrote: > RFE: > https://bugs.openjdk.java.net/browse/JDK-8231668 > > Webrev: > https://cr.openjdk.java.net/~adityam/8231668/ This looks good to me. Stylistic nits: *) Conditions like this: 865 if (!UseDynamicNumberOfGCThreads || 866 !FLAG_IS_DEFAULT(ConcGCThreads)) { ...can now be written on a single line: if (!UseDynamicNumberOfGCThreads || !FLAG_IS_DEFAULT(ConcGCThreads)) { *) Please update copyrights to mention 2020. For example, in workerPolicy.cpp: Copyright (c) 2018, 2020, Oracle and/or its affiliates. All rights reserved. > Note: this is my first patch since getting Author status, so please feel free to let me know if there's > anything wrong with how I created the webrev. Other friendly folks have been doing that for me > until now :). Many of us use the "mq" extension to stash the patches. The upside with webrevs would be that you can add the changeset description (hg qrefresh -e) right to the patch, and then webrev would pick it up. It would also generate the changeset itself, so sponsors would just download it and push on your behalf.
Metadata for this change is something like: 8231668: Remove ForceDynamicNumberOfGCThreads Reviewed-by: XXX (list of reviewers from census) Contributed-by: Aditya Mandaleeka -- Thanks, -Aleksey From shade at redhat.com Thu Mar 12 08:31:30 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 12 Mar 2020 09:31:30 +0100 Subject: RFR: 8240873: Shenandoah: Short-cut arraycopy barriers In-Reply-To: References: Message-ID: On 3/11/20 8:54 PM, Roman Kennke wrote: > Bug: > https://bugs.openjdk.java.net/browse/JDK-8240873 > Webrev: > http://cr.openjdk.java.net/~rkennke/JDK-8240873/webrev.00/ *) Wouldn't it make more sense to do acquire/release in (get|set)_update_watermark? *) This bit is incorrect, should be set_update_watermark: 2426 r->set_concurrent_iteration_safe_limit(r->bottom()); -- Thanks, -Aleksey From sgehwolf at redhat.com Thu Mar 12 09:30:56 2020 From: sgehwolf at redhat.com (Severin Gehwolf) Date: Thu, 12 Mar 2020 10:30:56 +0100 Subject: RFR: 8231668: Remove ForceDynamicNumberOfGCThreads In-Reply-To: <75346f8c-b492-76ea-d32a-0785f57c052a@redhat.com> References: <75346f8c-b492-76ea-d32a-0785f57c052a@redhat.com> Message-ID: On Thu, 2020-03-12 at 07:09 +0100, Aleksey Shipilev wrote: > Metadata for this change is something like: > > 8231668: Remove ForceDynamicNumberOfGCThreads > Reviewed-by: XXX (list of reviewers from census) > Contributed-by: Aditya Mandaleeka For authors, 'Contributed-by:' line would not be necessary, no? They could just use "hg commit -u ". That's my understanding anyhow. Thanks, Severin From shade at redhat.com Thu Mar 12 09:43:12 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 12 Mar 2020 10:43:12 +0100 Subject: RFR (S) 8240948: Shenandoah: cleanup not-forwarded-objects paths after JDK-8240868 Message-ID: RFE: https://bugs.openjdk.java.net/browse/JDK-8240948 Unfortunately, not much code can be eliminated from conc-mark, because Full GC (and Traversal?) share some of that code. 
Webrev: https://cr.openjdk.java.net/~shade/8240948/webrev.01/ Testing: hotspot_gc_shenandoah -- Thanks, -Aleksey From rkennke at redhat.com Thu Mar 12 11:04:31 2020 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 12 Mar 2020 12:04:31 +0100 Subject: RFR: 8240873: Shenandoah: Short-cut arraycopy barriers In-Reply-To: References:

Message-ID: <58ecb1aa-ff0b-d833-83e2-cbeeeb7582bb@redhat.com> On 3/12/20 9:31 AM, Aleksey Shipilev wrote: > On 3/11/20 8:54 PM, Roman Kennke wrote: >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8240873 >> Webrev: >> http://cr.openjdk.java.net/~rkennke/JDK-8240873/webrev.00/ > > *) Wouldn't it make more sense to do acquire/release in (get|set)_update_watermark? > Hmm ok. This requires making the field. I am not sure if the cast in get_update_watermark() is ok? > *) This bit is incorrect, should be set_update_watermark: > > 2426 r->set_concurrent_iteration_safe_limit(r->bottom()); Oops. Corrected. http://cr.openjdk.java.net/~rkennke/JDK-8240873/webrev.02/ WDYT? From shade at redhat.com Thu Mar 12 11:04:42 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 12 Mar 2020 12:04:42 +0100 Subject: RFR: 8240873: Shenandoah: Short-cut arraycopy barriers In-Reply-To: <58ecb1aa-ff0b-d833-83e2-cbeeeb7582bb@redhat.com> References:

<58ecb1aa-ff0b-d833-83e2-cbeeeb7582bb@redhat.com> Message-ID: On 3/12/20 12:04 PM, Roman Kennke wrote: >> *) Wouldn't it make more sense to do acquire/release in (get|set)_update_watermark? > > Hmm ok. This requires making the field. I am not sure if the cast in > get_update_watermark() is ok? I don't quite understand why the cast is needed. There are already _critical_pins and _live_data fields that are atomic, why can't we do the same? > http://cr.openjdk.java.net/~rkennke/JDK-8240873/webrev.02/ Looks fine, modulo nit above. -- Thanks, -Aleksey From thomas.schatzl at oracle.com Thu Mar 12 13:00:51 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 12 Mar 2020 06:00:51 -0700 (PDT) Subject: RFR: 8231668: Remove ForceDynamicNumberOfGCThreads In-Reply-To: References: Message-ID: <7d294441-3069-55e7-aa11-b6b20699a24a@oracle.com> Hi, On 12.03.20 02:40, Aditya Mandaleeka wrote: > RFE: > https://bugs.openjdk.java.net/browse/JDK-8231668 > > Webrev: > https://cr.openjdk.java.net/~adityam/8231668/ > > This removes all the ForceDynamicNumberOfGCThreads-related code and the test cases using it. > > Note: this is my first patch since getting Author status, so please feel free to let me know if there's > anything wrong with how I created the webrev. Other friendly folks have been doing that for me > until now :). in addition to what Aleksey said: - a comment in TestDynamicNumberOfGCThreads refers to a non-existent option "TraceDynamicGCThreads" - I think that that entire test does not test a lot as it only checks whether that log message is printed, but it does not check whether there is actually a dynamic change in the number of GC threads over time. I think it can be removed. Feel free to file a separate CR for improving it or creating a new one, depending on whether you remove it or not.
Thanks, Thomas From ivan.walulya at oracle.com Thu Mar 12 13:12:28 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Thu, 12 Mar 2020 14:12:28 +0100 Subject: RFR(XS): 8240591: G1HeapSizingPolicy attempts to compute expansion_amount even when at full capacity In-Reply-To: <2c22164c-b459-af44-c7c1-15f1eacb4d52@oracle.com> References: <197FCCDD-2E5D-4A63-B796-28908B18DB0E@oracle.com> <2c22164c-b459-af44-c7c1-15f1eacb4d52@oracle.com> Message-ID: <9F4A4087-B013-4AAE-96E8-C15C46B25AD7@oracle.com> Please find updated webrev: http://cr.openjdk.java.net/~iwalulya/8240591/01/ > On 10 Mar 2020, at 10:16, Thomas Schatzl wrote: > > Hi, > > On 05.03.20 11:33, Ivan Walulya wrote: >> Hi all, >> Please review a small modification for G1HeapSizingPolicy to return without computing expansion_amount when heap is already at full capacity. >> JBS: https://bugs.openjdk.java.net/browse/JDK-8240591 >> Webrev: http://cr.openjdk.java.net/~iwalulya/8240591/00/ >> //Ivan > > some minor (imo) comments to start a discussion: > > - I (weakly) suggest to remove the unnecessary assert(GCTimeRatio > 0) or put it at the beginning of the method. The initialization already guarantees that it is > 0. Probably best to move to the very top of the method. > > - I think I understand why the code clears the ratio check data, presumably to avoid the "windup" of the ratio check data being stuck into the maximum, and taking some time to wind down. I think this is a good idea, but I would prefer to just unconditionally clear the data - it does not seem time consuming, and makes the code a bit smaller. > > - undecided on the log message: while it is informative, now you get a log message even if nothing changed. Maybe make it trace level? Others might chime in here with their opinions. Also given that often people set -Xms==-Xmx, this one seems to be a bit chatty at this level. I can see the point of it though. 
> > So overall, I am good with the change but asking for opinions :) > > Thanks, > Thomas From zgu at redhat.com Thu Mar 12 13:23:22 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 12 Mar 2020 09:23:22 -0400 Subject: [15] RFR 8240917: Shenandoah: Avoid scanning thread code roots twice in all root scanner Message-ID: Please review this small enhancement, that avoids scanning thread's code roots if we scan all code blobs anyway. Bug: https://bugs.openjdk.java.net/browse/JDK-8240917 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8240917/webrev.00/ Test: hotspot_gc_shenandoah (fastdebug and release) Thanks, -Zhengyu From rkennke at redhat.com Thu Mar 12 14:12:01 2020 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 12 Mar 2020 15:12:01 +0100 Subject: RFR: 8240873: Shenandoah: Short-cut arraycopy barriers In-Reply-To: References:

<58ecb1aa-ff0b-d833-83e2-cbeeeb7582bb@redhat.com> Message-ID: <54c9b492-a1b9-4c07-7b1f-a432e989c9e0@redhat.com> On 3/12/20 12:04 PM, Aleksey Shipilev wrote: > On 3/12/20 12:04 PM, Roman Kennke wrote: >>> *) Wouldn't it make more sense to do acquire/release in (get|set)_update_watermark? >> >> Hmm ok. This requires making the field. I am not sure if the cast in >> get_update_watermark() is ok? > > I don't quite understand why the cast is needed. There are already _critical_pins and _live_data fields > that are atomic, why can't we do the same? > Turns out that: volatile HeapWord* _update_watermark; is not the same as: HeapWord* volatile _update_watermark; The former means 'a pointer to a volatile HeapWord', the latter 'a volatile pointer to a HeapWord'. We need the latter. Testing showed an occasional failure caused by piggy-backing updaterefs on marking: it would skip updating when taking the marking-shortcut. While it is not really relevant anymore, I changed the conditions a bit to not blindly return in the arraycopy-pre-barrier, but do the check for updating independently. Overall a good example of why it was a good move to get rid of the piggy-backing. It causes more maintenance for no real benefit. http://cr.openjdk.java.net/~rkennke/JDK-8240873/webrev.03/ Passes all tests in hotspot_gc_shenandoah Good now? Roman From rkennke at redhat.com Thu Mar 12 14:14:02 2020 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 12 Mar 2020 15:14:02 +0100 Subject: [15] RFR 8240917: Shenandoah: Avoid scanning thread code roots twice in all root scanner In-Reply-To: References: Message-ID: <1be5f430-d7ae-0615-744a-60e8b1fea05e@redhat.com> It's doing the same in both branches, or what am I missing? Roman On 3/12/20 2:23 PM, Zhengyu Gu wrote: > Please review this small enhancement, that avoids scanning thread's code > roots if we scan all code blobs anyway.
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8240917 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8240917/webrev.00/ > > > Test: > hotspot_gc_shenandoah (fastdebug and release) > > Thanks, > > -Zhengyu > From rkennke at redhat.com Thu Mar 12 14:20:56 2020 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 12 Mar 2020 15:20:56 +0100 Subject: [15] RFR 8239926: Shenandoah: Shenandoah needs to mark nmethod's metadata In-Reply-To: <612617a7-f08a-2b69-908e-26e146fa0a8f@redhat.com> References: <75c20855-5234-ba00-f07b-f9da0f7b8047@redhat.com> <612617a7-f08a-2b69-908e-26e146fa0a8f@redhat.com> Message-ID: Hi Zhengyu, in src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp: + } else if (ShenandoahConcurrentRoots::can_do_concurrent_class_unloading()) { + // Disarm nmethods that armed for concurrent mark. + // On normal code path (non-empty Cset), it depends on update_roots() to + // disarm nmethods in degenerated GC. + ShenandoahCodeRoots::disarm_nmethods(); beware that the update_roots() is only called at the end of update_refs phase. The same call at end of marking is orphaned since removal of piggy-backed marking. Otherwise looks good. Thanks, Roman On 3/11/20 10:43 PM, Zhengyu Gu wrote: > Revised based on offline discussions. > > Piggyback on stack code root rescanning to SATB draining task. > > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.02/ > > Reran tests: > hotspot_gc_shenandoah > tools/javac > > Thanks, > > -Zhengyu > > On 3/4/20 6:06 PM, Zhengyu Gu wrote: >> Traversal GC has the same issue, also need to remark on stack code >> roots in final traversal. >> >> @@ -263,11 +263,12 @@ >> if (!_heap->is_degenerated_gc_in_progress()) { >> ShenandoahTraversalRootsClosure roots_cl(q, rp); >> ShenandoahTraversalSATBThreadsClosure tc(&satb_cl); >> if (unload_classes) { >> ShenandoahRemarkCLDClosure remark_cld_cl(&roots_cl); >> -
_rp->strong_roots_do(worker_id, &roots_cl, &remark_cld_cl, >> NULL, &tc); >> + MarkingCodeBlobClosure code_cl(&roots_cl, >> CodeBlobToOopClosure::FixRelocations); >> + _rp->strong_roots_do(worker_id, &roots_cl, &remark_cld_cl, >> &code_cl, &tc); >> } else { >> CLDToOopClosure cld_cl(&roots_cl, >> ClassLoaderData::_claim_strong); >> _rp->roots_do(worker_id, &roots_cl, &cld_cl, NULL, &tc); >> } >> } else { >> >> >> Updated webrev: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.01/ >> >> Thanks, >> >> -Zhengyu >> >> On 2/25/20 12:13 PM, Zhengyu Gu wrote: >>> Shenandoah encounters a few test failures with tools/javac. Verifier >>> catches unmarked oops in nmethod's metadata during root evacuation in >>> final mark phase. >>> >>> The problem is that Shenandoah marks on stack nmethods in init mark >>> pause, but it does not mark nmethod's metadata during concurrent mark >>> phase, when new nmethod is about to be executed. >>> >>> The solution: >>> 1) Use nmethod_entry_barrier to keep nmethod's metadata alive when >>> the nmethod is about to be executed, when nmethod entry barrier is >>> supported. >>> >>> 2) Remark on stack nmethod's metadata at final mark pause. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8239926 >>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.00/ >>> >>> Test: >>> hotspot_gc_shenandoah (fastdebug and release) >>> tools/javac with ShenandoahCodeRootsStyle = 1 and 2 (fastdebug and >>> release) >>> >>> Thanks, >>> >>> -Zhengyu > From rkennke at redhat.com Thu Mar 12 14:24:46 2020 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 12 Mar 2020 15:24:46 +0100 Subject: RFR (S) 8240948: Shenandoah: cleanup not-forwarded-objects paths after JDK-8240868 In-Reply-To: References: Message-ID: <6f5c904e-06c5-8b87-857a-db6d8fd4e3c0@redhat.com> Looks good. Let's do the rest carefully.
Full-GC requires updating refs by traversal because we may not be able to parse the heap sequentially. Traversal should be mostly OK because it has its own set of closures to deal with updating refs. Thanks, Roman On 3/12/20 10:43 AM, Aleksey Shipilev wrote: > RFE: > https://bugs.openjdk.java.net/browse/JDK-8240948 > > Unfortunately, not much code can be eliminated from conc-mark, because Full GC (and Traversal?) > share some of that code. > > Webrev: > https://cr.openjdk.java.net/~shade/8240948/webrev.01/ > > Testing: hotspot_gc_shenandoah > From zgu at redhat.com Thu Mar 12 15:01:17 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 12 Mar 2020 11:01:17 -0400 Subject: [15] RFR 8239926: Shenandoah: Shenandoah needs to mark nmethod's metadata In-Reply-To: References: <75c20855-5234-ba00-f07b-f9da0f7b8047@redhat.com> <612617a7-f08a-2b69-908e-26e146fa0a8f@redhat.com> Message-ID: <20afe843-e4bc-96ee-0d54-0c40435183a8@redhat.com> Hi Roman, On 3/12/20 10:20 AM, Roman Kennke wrote: > Hi Zhengyu, > > in src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp: > + } else if > (ShenandoahConcurrentRoots::can_do_concurrent_class_unloading()) { > + // Disarm nmethods that armed for concurrent mark. > + // On normal code path (non-empty Cset), it depends on > update_roots() to > + // disarm nmethods in degenerated GC. > + ShenandoahCodeRoots::disarm_nmethods(); > > beware that the update_roots() is only called at the end of update_refs > phase. The same call at end of marking is orphaned since removal of > piggy-backed marking. I think it is fine, a successful degenerated GC cycle should always execute update_refs, no? Thanks, -Zhengyu > > Otherwise looks good. > > Thanks, > Roman > > > On 3/11/20 10:43 PM, Zhengyu Gu wrote: >> Revised based on offline discussions. >> >> Piggyback on stack code root rescanning to SATB draining task. >> >> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.02/ >> >> Reran tests: >> hotspot_gc_shenandoah >>
tools/javac >> >> Thanks, >> >> -Zhengyu >> >> On 3/4/20 6:06 PM, Zhengyu Gu wrote: >>> Traversal GC has the same issue, also need to remark on stack code >>> roots in final traversal. >>> >>> @@ -263,11 +263,12 @@ >>> if (!_heap->is_degenerated_gc_in_progress()) { >>> ShenandoahTraversalRootsClosure roots_cl(q, rp); >>> ShenandoahTraversalSATBThreadsClosure tc(&satb_cl); >>> if (unload_classes) { >>> ShenandoahRemarkCLDClosure remark_cld_cl(&roots_cl); >>> - _rp->strong_roots_do(worker_id, &roots_cl, &remark_cld_cl, >>> NULL, &tc); >>> + MarkingCodeBlobClosure code_cl(&roots_cl, >>> CodeBlobToOopClosure::FixRelocations); >>> + _rp->strong_roots_do(worker_id, &roots_cl, &remark_cld_cl, >>> &code_cl, &tc); >>> } else { >>> CLDToOopClosure cld_cl(&roots_cl, >>> ClassLoaderData::_claim_strong); >>> _rp->roots_do(worker_id, &roots_cl, &cld_cl, NULL, &tc); >>> } >>> } else { >>> >>> >>> Updated webrev: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.01/ >>> >>> Thanks, >>> >>> -Zhengyu >>> >>> On 2/25/20 12:13 PM, Zhengyu Gu wrote: >>>> Shenandoah encounters a few test failures with tools/javac. Verifier >>>> catches unmarked oops in nmethod's metadata during root evacuation in >>>> final mark phase. >>>> >>>> The problem is that Shenandoah marks on stack nmethods in init mark >>>> pause, but it does not mark nmethod's metadata during concurrent mark >>>> phase, when new nmethod is about to be executed. >>>> >>>> The solution: >>>> 1) Use nmethod_entry_barrier to keep nmethod's metadata alive when >>>> the nmethod is about to be executed, when nmethod entry barrier is >>>> supported. >>>> >>>> 2) Remark on stack nmethod's metadata at final mark pause. >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8239926 >>>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.00/ >>>> >>>> Test: >>>> hotspot_gc_shenandoah (fastdebug and release) >>>>
tools/javac with ShenandoahCodeRootsStyle = 1 and 2 (fastdebug and >>>> release) >>>> >>>> Thanks, >>>> >>>> -Zhengyu >> > From zgu at redhat.com Thu Mar 12 15:34:21 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 12 Mar 2020 11:34:21 -0400 Subject: [15] RFR 8240917: Shenandoah: Avoid scanning thread code roots twice in all root scanner In-Reply-To: <1be5f430-d7ae-0615-744a-60e8b1fea05e@redhat.com> References: <1be5f430-d7ae-0615-744a-60e8b1fea05e@redhat.com> Message-ID: <5f1056c2-1639-8753-473e-d383b12a9cbd@redhat.com> Oops, copy/paste error, updated: http://cr.openjdk.java.net/~zgu/JDK-8240917/webrev.01/ Reran hotspot_gc_shenandoah tests Thanks, -Zhengyu On 3/12/20 10:14 AM, Roman Kennke wrote: > It's doing the same in both branches, or what am I missing? > > Roman > > On 3/12/20 2:23 PM, Zhengyu Gu wrote: >> Please review this small enhancement, that avoids scanning thread's code >> roots if we scan all code blobs anyway. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8240917 >> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8240917/webrev.00/ >> >> >> Test: >> hotspot_gc_shenandoah (fastdebug and release) >> >> Thanks, >> >> -Zhengyu >> > From shade at redhat.com Thu Mar 12 16:14:04 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 12 Mar 2020 17:14:04 +0100 Subject: RFR: 8240873: Shenandoah: Short-cut arraycopy barriers In-Reply-To: <54c9b492-a1b9-4c07-7b1f-a432e989c9e0@redhat.com> References:

<58ecb1aa-ff0b-d833-83e2-cbeeeb7582bb@redhat.com> <54c9b492-a1b9-4c07-7b1f-a432e989c9e0@redhat.com> Message-ID: On 3/12/20 3:12 PM, Roman Kennke wrote: > http://cr.openjdk.java.net/~rkennke/JDK-8240873/webrev.03/ OK, good. -- Thanks, -Aleksey From shade at redhat.com Thu Mar 12 16:27:23 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 12 Mar 2020 17:27:23 +0100 Subject: RFR: 8231668: Remove ForceDynamicNumberOfGCThreads In-Reply-To: References: <75346f8c-b492-76ea-d32a-0785f57c052a@redhat.com> Message-ID: <9b6dc025-1420-6af9-d903-76838ac669eb@redhat.com> On 3/12/20 10:30 AM, Severin Gehwolf wrote: > On Thu, 2020-03-12 at 07:09 +0100, Aleksey Shipilev wrote: >> Metadata for this change is something like: >> >> 8231668: Remove ForceDynamicNumberOfGCThreads >> Reviewed-by: XXX (list of reviewers from census) >> Contributed-by: Aditya Mandaleeka > > For authors, 'Contributed-by:' line would not be necessary, no? They > could just use "hg commit -u ". That's my understanding > anyhow. I believe Contributed-by is cleaner and captures the reality better. If you look in the repo history, there are plenty of Contributed-by lines mentioning those who have author status. -- Thanks, -Aleksey From rkennke at redhat.com Thu Mar 12 16:43:05 2020 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 12 Mar 2020 17:43:05 +0100 Subject: [15] RFR 8239926: Shenandoah: Shenandoah needs to mark nmethod's metadata In-Reply-To: <20afe843-e4bc-96ee-0d54-0c40435183a8@redhat.com> References: <75c20855-5234-ba00-f07b-f9da0f7b8047@redhat.com> <612617a7-f08a-2b69-908e-26e146fa0a8f@redhat.com> <20afe843-e4bc-96ee-0d54-0c40435183a8@redhat.com> Message-ID: <69ad7ddf-0808-1f0e-43e1-9e9a37b8fb04@redhat.com> >> Hi Zhengyu, >> >> in src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp: >> + } else if >> (ShenandoahConcurrentRoots::can_do_concurrent_class_unloading()) { >> + // Disarm nmethods that armed for concurrent mark. >> +
// On normal code path (non-empty Cset), it depends on >> update_roots() to >> + // disarm nmethods in degenerated GC. >> + ShenandoahCodeRoots::disarm_nmethods(); >> >> beware that the update_roots() is only called at the end of update_refs >> phase. The same call at end of marking is orphaned since removal of >> piggy-backed marking. > > I think it is fine, successful degenerated GC cycle should always > execute update_refs, no? Ok. I was only worried because the comment seems to imply it relies on update_roots() at the end of mark. Aleksey's patch is removing that. If update_roots() at the end of update_refs is good too, then fine. Thanks, Roman > Thanks, > > -Zhengyu > >> >> Otherwise looks good. >> >> Thanks, >> Roman >> >> >> On 3/11/20 10:43 PM, Zhengyu Gu wrote: >>> Revised based on offline discussions. >>> >>> Piggyback on stack code root rescanning to SATB draining task. >>> >>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.02/ >>> >>> Reran tests: >>> hotspot_gc_shenandoah >>> tools/javac >>> >>> Thanks, >>> >>> -Zhengyu >>> >>> On 3/4/20 6:06 PM, Zhengyu Gu wrote: >>>> Traversal GC has the same issue, also need to remark on stack code >>>> roots in final traversal. >>>> >>>> @@ -263,11 +263,12 @@ >>>> if (!_heap->is_degenerated_gc_in_progress()) { >>>> ShenandoahTraversalRootsClosure roots_cl(q, rp); >>>> ShenandoahTraversalSATBThreadsClosure tc(&satb_cl); >>>> if (unload_classes) { >>>> ShenandoahRemarkCLDClosure remark_cld_cl(&roots_cl); >>>> - _rp->strong_roots_do(worker_id, &roots_cl, &remark_cld_cl, >>>> NULL, &tc); >>>> + MarkingCodeBlobClosure code_cl(&roots_cl, >>>> CodeBlobToOopClosure::FixRelocations); >>>> + _rp->strong_roots_do(worker_id, &roots_cl, &remark_cld_cl, >>>> &code_cl, &tc); >>>> } else { >>>> CLDToOopClosure cld_cl(&roots_cl, >>>> ClassLoaderData::_claim_strong); >>>>
_rp->roots_do(worker_id, &roots_cl, &cld_cl, NULL, &tc); >>>> } >>>> } else { >>>> >>>> >>>> Updated webrev: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.01/ >>>> >>>> Thanks, >>>> >>>> -Zhengyu >>>> >>>> On 2/25/20 12:13 PM, Zhengyu Gu wrote: >>>>> Shenandoah encounters a few test failures with tools/javac. Verifier >>>>> catches unmarked oops in nmethod's metadata during root evacuation in >>>>> final mark phase. >>>>> >>>>> The problem is that, Shenandoah marks on stack nmethods in init mark >>>>> pause, but it does not mark nmethod's metadata during concurrent mark >>>>> phase, when new nmethod is about to be executed. >>>>> >>>>> The solution: >>>>> 1) Use nmethod_entry_barrier to keep nmethod's metadata alive when >>>>> the nmethod is about to be executed, when nmethod entry barrier is >>>>> supported. >>>>> >>>>> 2) Remark on stack nmethod's metadata at final mark pause. >>>>> >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8239926 >>>>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.00/ >>>>> >>>>> Test: >>>>> hotspot_gc_shenandoah (fastdebug and release) >>>>> tools/javac with ShenandoahCodeRootsStyle = 1 and 2 (fastdebug and >>>>> release) >>>>> >>>>> Thanks, >>>>> >>>>> -Zhengyu >>> >> > From rkennke at redhat.com Thu Mar 12 16:43:59 2020 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 12 Mar 2020 17:43:59 +0100 Subject: [15] RFR 8240917: Shenandoah: Avoid scanning thread code roots twice in all root scanner In-Reply-To: <5f1056c2-1639-8753-473e-d383b12a9cbd@redhat.com> References: <1be5f430-d7ae-0615-744a-60e8b1fea05e@redhat.com> <5f1056c2-1639-8753-473e-d383b12a9cbd@redhat.com> Message-ID: <67950961-4f10-a00e-84b1-6c3ccbe99b31@redhat.com> Ok, makes more sense now. Looks good! Thank you!
Roman > Oops, copy/paste error, updated: > > http://cr.openjdk.java.net/~zgu/JDK-8240917/webrev.01/ > > Reran hotspot_gc_shenandoah tests > > Thanks, > > -Zhengyu > > On 3/12/20 10:14 AM, Roman Kennke wrote: >> It's doing the same in both branches, or what am I missing? >> >> Roman >> >> On 3/12/20 2:23 PM, Zhengyu Gu wrote: >>> Please review this small enhancement, that avoids scanning thread's code >>> roots if we scan all code blobs anyway. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8240917 >>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8240917/webrev.00/ >>> >>> >>> Test: >>> hotspot_gc_shenandoah (fastdebug and release) >>> >>> Thanks, >>> >>> -Zhengyu >>> >> > From stefan.johansson at oracle.com Thu Mar 12 17:58:09 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Thu, 12 Mar 2020 18:58:09 +0100 Subject: RFR(XS): 8240591: G1HeapSizingPolicy attempts to compute expansion_amount even when at full capacity In-Reply-To: <9F4A4087-B013-4AAE-96E8-C15C46B25AD7@oracle.com> References: <197FCCDD-2E5D-4A63-B796-28908B18DB0E@oracle.com> <2c22164c-b459-af44-c7c1-15f1eacb4d52@oracle.com> <9F4A4087-B013-4AAE-96E8-C15C46B25AD7@oracle.com> Message-ID: Hi Ivan, > On 12 Mar 2020, at 14:12, Ivan Walulya wrote: > > Please find updated webrev: http://cr.openjdk.java.net/~iwalulya/8240591/01/ Looks good, and I agree on trace-level being good, just one minor thing you could fix before pushing: 65 recent_gc_overhead,_g1h->capacity()); Please add a space after the comma. I can do the push if you don't already have a sponsor, given you get the second review. Thanks, Stefan > >> On 10 Mar 2020, at 10:16, Thomas Schatzl wrote: >> >> Hi, >> >> On 05.03.20 11:33, Ivan Walulya wrote: >>> Hi all, >>> Please review a small modification for G1HeapSizingPolicy to return without computing expansion_amount when heap is already at full capacity.
>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8240591 >>> Webrev: http://cr.openjdk.java.net/~iwalulya/8240591/00/ >>> //Ivan >> >> some minor (imo) comments to start a discussion: >> >> - I (weakly) suggest to remove the unnecessary assert(GCTimeRatio > 0) or put it at the beginning of the method. The initialization already guarantees that it is > 0. Probably best to move to the very top of the method. >> >> - I think I understand why the code clears the ratio check data, presumably to avoid the "windup" of the ratio check data being stuck into the maximum, and taking some time to wind down. I think this is a good idea, but I would prefer to just unconditionally clear the data - it does not seem time consuming, and makes the code a bit smaller. >> >> - undecided on the log message: while it is informative, now you get a log message even if nothing changed. Maybe make it trace level? Others might chime in here with their opinions. Also given that often people set -Xms==-Xmx, this one seems to be a bit chatty at this level. I can see the point of it though. >> >> So overall, I am good with the change but asking for opinions :) >> >> Thanks, >> Thomas > From ivan.walulya at oracle.com Thu Mar 12 19:39:09 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Thu, 12 Mar 2020 20:39:09 +0100 Subject: RFR(XS): 8240591: G1HeapSizingPolicy attempts to compute expansion_amount even when at full capacity In-Reply-To: References: <197FCCDD-2E5D-4A63-B796-28908B18DB0E@oracle.com> <2c22164c-b459-af44-c7c1-15f1eacb4d52@oracle.com> <9F4A4087-B013-4AAE-96E8-C15C46B25AD7@oracle.com> Message-ID: <2CA8C80B-AA38-4AB7-A165-47AC7EB6AE37@oracle.com> Thanks Stefan > On 12 Mar 2020, at 18:58, Stefan Johansson wrote: > > Hi Ivan, > >> On 12 Mar 2020, at 14:12, Ivan Walulya wrote: >> >> Please find updated webrev: http://cr.openjdk.java.net/~iwalulya/8240591/01/ > Looks good, and I agree on trace-level being good, just one minor thing you could fix before pushing: > > 65 recent_gc_overhead,_g1h->capacity()); > Please add a space after the comma. > > I can do the push if you don't already have a sponsor, given you get the second review. Will make the changes and send to you for pushing after getting the second review. > > Thanks, > Stefan > > >> >>> On 10 Mar 2020, at 10:16, Thomas Schatzl wrote: >>> >>> Hi, >>> >>> On 05.03.20 11:33, Ivan Walulya wrote: >>>> Hi all, >>>> Please review a small modification for G1HeapSizingPolicy to return without computing expansion_amount when heap is already at full capacity. >>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8240591 >>>> Webrev: http://cr.openjdk.java.net/~iwalulya/8240591/00/ >>>> //Ivan >>> >>> some minor (imo) comments to start a discussion: >>> >>> - I (weakly) suggest to remove the unnecessary assert(GCTimeRatio > 0) or put it at the beginning of the method. The initialization already guarantees that it is > 0. Probably best to move to the very top of the method. >>> >>> - I think I understand why the code clears the ratio check data, presumably to avoid the "windup" of the ratio check data being stuck into the maximum, and taking some time to wind down. I think this is a good idea, but I would prefer to just unconditionally clear the data - it does not seem time consuming, and makes the code a bit smaller. >>> >>> - undecided on the log message: while it is informative, now you get a log message even if nothing changed. Maybe make it trace level? Others might chime in here with their opinions. Also given that often people set -Xms==-Xmx, this one seems to be a bit chatty at this level. I can see the point of it though.
>>> >>> So overall, I am good with the change but asking for opinions :) >>> >>> Thanks, >>> Thomas >> > From adityam at microsoft.com Thu Mar 12 20:45:41 2020 From: adityam at microsoft.com (Aditya Mandaleeka) Date: Thu, 12 Mar 2020 20:45:41 +0000 Subject: RFR: 8231668: Remove ForceDynamicNumberOfGCThreads In-Reply-To: <9b6dc025-1420-6af9-d903-76838ac669eb@redhat.com> References: <75346f8c-b492-76ea-d32a-0785f57c052a@redhat.com> <9b6dc025-1420-6af9-d903-76838ac669eb@redhat.com> Message-ID: Thanks Aleksey and Thomas for reviewing. I've updated the patch with the feedback. I left the TestDynamicNumberofGCThreads test in place but fixed the comment. Seems like it's worth revisiting that test to make it more useful as a separate issue. Aleksey, I am coming from the Git world and still getting familiar with the workflow here. I hadn't heard of the MqExtension until your mail, but I tried it out. To be honest, I was quite confused about how to use it in conjunction with the webrev script even after reading some documentation. I ended up with a webrev which appears to have the right code diff, but I'm not sure if all the metadata is in the form you'd expect. I'd appreciate it if you could verify that. Updated webrev is at: https://cr.openjdk.java.net/~adityam/8231668/webrev.01/ Severin Gehwolf wrote: > For authors, 'Contributed-by:' line would not be necessary, no? They > could just use "hg commit -u ". That's my understanding > anyhow. This matches my understanding as well from reading http://openjdk.java.net/projects/#project-author. That said, I don't really have a strong preference on it, so whatever you all prefer is fine with me. Just let me know what I need to do! 
Thanks, Aditya From mark.reinhold at oracle.com Thu Mar 12 23:49:01 2020 From: mark.reinhold at oracle.com (mark.reinhold at oracle.com) Date: Fri, 13 Mar 2020 00:49:01 +0100 (CET) Subject: New candidate JEP: 376: ZGC: Concurrent Thread-Stack Processing Message-ID: <20200312234901.4EC2B319DB4@eggemoggin.niobe.net> https://openjdk.java.net/jeps/376 - Mark From shade at redhat.com Fri Mar 13 08:02:36 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 13 Mar 2020 09:02:36 +0100 Subject: RFR: 8231668: Remove ForceDynamicNumberOfGCThreads In-Reply-To: References: <75346f8c-b492-76ea-d32a-0785f57c052a@redhat.com>