From maoliang.ml at alibaba-inc.com Mon Mar 2 12:32:31 2020
From: maoliang.ml at alibaba-inc.com (Liang Mao)
Date: Mon, 02 Mar 2020 20:32:31 +0800
Subject: RFR(M): 8236926: Concurrently uncommit memory in G1
Message-ID: <0839e8e9-4de4-43c0-bf1b-df357b3c7771.maoliang.ml@alibaba-inc.com>

Hi Thomas/Stefan and other developers,

I have created a separate patch for 8236926. The concurrent work has been moved to G1YoungRemSetSamplingThread, as suggested in the previous comments. Specjvm2008 worked fine with the patch (specjbb2015 doesn't have a heap-shrink scenario).

http://cr.openjdk.java.net/~luchsh/8236926.webrev/

Thanks,
Liang

From linzang at tencent.com Mon Mar 2 13:56:52 2020
From: linzang at tencent.com (linzang(臧琳))
Date: Mon, 2 Mar 2020 13:56:52 +0000
Subject: JDK-8215624 add parallel heap inspection support for jmap histo(G1)(Internet mail)
In-Reply-To:
References: <11bca96c0e7745f5b2558cc49b42b996@tencent.com>
Message-ID: <2EDF28BF-94D5-4F2E-B96E-2C45948AD454@tencent.com>

Dear all,

Let me try to ease the reviewing work with some explanation :P

The patch's target is to speed up heap iteration for jmap -histo. From my experience this is necessary for investigating large heaps. E.g. in a big-data scenario I tried to run jmap -histo against a 180GB heap, and it took quite a while.

If my understanding is correct, even jmap -histo without the "live" option does heap inspection with the heap lock acquired, so it is very likely to block mutator threads in allocation-sensitive scenarios. I would say the faster the heap inspection runs, the shorter the mutators are blocked. This is why parallel iteration for jmap is necessary.

I think parallel heap inspection should be applied to all kinds of heaps. However, since the heap layouts differ between GCs, much time is required to understand all of the heap layouts to make the whole change. IMO, it is not wise to have a huge patch for the whole solution at once, and it would be even harder to review. So I plan to implement it incrementally. The first patch (this one) is meant to confirm the implementation details: how jmap accepts the new option and passes it to the attachListener of the JVM process, how to make the parallel inspection closure generic enough to be easily extended to different heap layouts, and how to implement the heap inspection in a specific GC's heap. This patch uses G1's heap as the beginning.

This patch actually does several things:

1. Add an option "parallelThreadNum=<N>" to jmap -histo. The default behavior is to set N to 0, which means letting the JVM decide how many threads to use for heap inspection. Setting this option to 1 disables parallel heap inspection. (More details in the CSR: https://bugs.openjdk.java.net/browse/JDK-8239290)
2. Change how JMap passes arguments; see the changes in http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_01/src/jdk.jcmd/share/classes/sun/tools/jmap/JMap.java.udiff.html. Originally it passed options as separate arguments to attachListener; this patch changes that so all options are composed into a single string. So arg_count_max in attachListener.hpp does not need to be changed, which avoids the compatibility issue discussed at https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-March/027334.html
3. Add an abstract class ParHeapInspectTask in heapInspection.hpp / heapInspection.cpp. Its work(uint worker_id) method prepares the data structure (KlassInfoTable) needed for every parallel worker thread, and then calls do_object_iterate_parallel(), which is the heap-specific implementation. I also added some mechanisms in KlassInfoTable to support parallel iteration, such as merge().
4. In the specific heap (G1 in this patch), create a subclass of ParHeapInspectTask and implement do_object_iterate_parallel() for parallel heap inspection. For G1, it simply invokes g1CollectedHeap's object_iterate_parallel().
5. Add related tests.
6. It should be easy to extend this patch to other kinds of heaps by creating a subclass of ParHeapInspectTask and implementing do_object_iterate_parallel().

I hope this info helps with the code review and initiates the discussion :-)
Thanks!

BRs,
Lin

>On 2020/2/19, 9:40 AM, "linzang(臧琳)" wrote:.
>
> Re-post this RFR with correct enhancement number to make it trackable.
> please ignore the previous wrong post. sorry for troubles.
>
> webrev: http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_01/
> Hi bug: https://bugs.openjdk.java.net/browse/JDK-8215624
> CSR: https://bugs.openjdk.java.net/browse/JDK-8239290
> --------------
> Lin
> >Hi Lin,
> >
> >Could you, please, re-post your RFR with the right enhancement number in
> >the message subject?
> >It will be more trackable this way.
> >
> >Thanks,
> >Serguei
> >
> >
> >On 2/17/20 10:29 PM, linzang(臧琳) wrote:
> >> Dear David,
> >> Thanks a lot!
> >> I have updated the refined code to http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_01/.
> >> IMHO the parallel heap inspection can be extended to all kinds of heap as long as the heap layout can support parallel iteration.
> >> Maybe we can firstly use this webrev to discuss how to implement it, because I am not sure my current implementation is an appropriate way to communicate with collectedHeap, then we can extend the solution to other kinds of heap.
> >>
> >> Thanks,
> >> --------------
> >> Lin
> >>> Hi Lin,
> >>>
> >>> Adding in hotspot-gc-dev as they need to see how this interacts with GC
> >>> worker threads, and whether it needs to be extended beyond G1.
> >>>
> >>> I happened to spot one nit when browsing:
> >>>
> >>> src/hotspot/share/gc/shared/collectedHeap.hpp
> >>>
> >>> + virtual bool run_par_heap_inspect_task(KlassInfoTable* cit,
> >>> +                                        BoolObjectClosure* filter,
> >>> +                                        size_t* missed_count,
> >>> +                                        size_t thread_num) {
> >>> +   return NULL;
> >>>
> >>> s/NULL/false/
> >>>
> >>> Cheers,
> >>> David
> >>>
> >>> On 18/02/2020 2:15 pm, linzang(臧琳) wrote:
> >>>> Dear All,
> >>>> May I ask your help to review the follow changes:
> >>>> webrev:
> >>>> http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_00/
> >>>> bug: https://bugs.openjdk.java.net/browse/JDK-8215624
> >>>> related CSR: https://bugs.openjdk.java.net/browse/JDK-8239290
> >>>> This patch enable parallel heap inspection of G1 for jmap histo.
> >>>> my simple test shown it can speed up 2x of jmap -histo with
> >>>> parallelThreadNum set to 2 for heap at ~500M on 4-core platform.
> >>>>
> >>>> ------------------------------------------------------------------------
> >>>> BRs,
> >>>> Lin
> >>
> >
>

From maoliang.ml at alibaba-inc.com Tue Mar 3 11:14:04 2020
From: maoliang.ml at alibaba-inc.com (Liang Mao)
Date: Tue, 03 Mar 2020 19:14:04 +0800
Subject: G1: Abort concurrent at initial mark pause
Message-ID:

Hi All,

As discussed previously, there are several ideas to improve humongous object handling. Our experiments show that cancelling concurrent mark at the initial-mark pause is effective in the scenario where frequent allocation of temporary humongous objects leads to frequent concurrent marking and high CPU usage. The sub-test scimark.fft.large in specjvm2008 is exactly this case, but it is not GC sensitive, so there is little difference in score.

The patch is small; shall we have a bug id for it?
http://cr.openjdk.java.net/~luchsh/g1hum/humongous.webrev/

Thanks,
Liang

------------------------------------------------------------------
From:Thomas Schatzl
Send Time:2020 Jan. 21 (Tue.) 18:20
To:"MAO, Liang" ; Man Cao ; hotspot-gc-dev
Subject:Re: Discussion: improve humongous objects handling for G1

Hi,

On 21.01.20 07:25, Liang Mao wrote:
> Hi Thomas,
>
> In fact we saw this issue with 8u. One issue I forgot to tell is that when
> CPU usage is quite high which is nearly 100% the concurrent mark will
> get very slow so the to-space exhuasted happened. BTW, is there any
> improvements for this point in JDK11 or higher versions? I didn't notice so far.

JDK13 has some implicit increases in the thresholds to take more humongous candidate regions. Not a lot though.

> Increasing reserve percent could alleviate the problem but seems not a completed
> solution.

It would be nicer if G1 automatically adjusted this reserve based on actual allocation of course. ;)

Which is another option btw - there are many ways to avoid the evacuation failure situation.

> Cancelling concurrent mark cycle in initial-mark pause seems a delicate
> optimization which can cover some issues if a lot of humongous regions have been
> reclaimed in this pause. It can avoid the unnecessary cm cycle and also trigger cm
> earlier if neened.
> We will take this into the consideration. Thanks for the great idea:)
>
> If there is a short-live humongous object array which also references other
> short-live objects the situation could be worse. If we increase the
> G1HeapRegionSize, some humongous objects become normal objects and the behavior
> is more like CMS then everything goes fine. I don't think we have to not allow humongous
> objects to behave as normal ones. A new allocated humongous object array can probably
> reference objects in young generation and scanning the object array by remset
> couldn't be better than directly iterating the array in evacuation because of possible
> prefetch. We can have an alternative max survivor age for humongous object, maybe 5 or 8

If I read this paragraph correctly you argue that keeping a large humongous objArray in young is okay because

a) if you increase the heap region size, it has a high chance that it would be below the thresholds anyway, so you would scan it anyway

b) scanning a humongous objArray with a few references is not much different performance wise than targeted scanning of the corresponding cards in the remembered set because of hardware.

Regarding a) Since I have yet to see logs, I can't tell what the typical size of these arrays is (and I have not seen a "typical" humongous object distribution graph for these applications).
However region sizes are kind of proportional to heap size, which kind of corresponds to the hardware that you need to use. I.e. you likely won't see G1 using 100 threads on a 200m heap with 32m regions with current ergonomics.

Even then this limits objArrays to 16M (at 32m region size), which limits the time spent scanning the object (and if ergonomics select 32m regions, the heap and the machine are probably quite big anyway). From what you and Man were telling, you seem to have a significant amount of humongous objects of unknown type that are much(?) larger than that.

Regarding b) that was already wrong years ago when I did experiments on that (even the "limit age on humongous obj arrays" workaround - you can easily go as low as a max tenuring threshold of 1 to catch almost all of the relevant ones), and very likely still is.

Let me do some over-the-thumb calculations: Assuming that we have a 32M object (random number, i.e. ~8m references), with, say 1k references (which is more than a handful), the remembered set would make you scan only 1.5% max (1000*512 bytes/card) of the object. I seriously doubt that prefetching or some magic hardware will make that amount of additional work disappear.
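
As a rough check of the estimate above, the arithmetic can be sketched as follows. This is only an illustration: the 4-byte reference size assumes compressed oops, the worst case assumes every non-null reference sits on its own card, and all names in the snippet are made up for the example rather than taken from HotSpot.

```python
# Back-of-the-envelope check of the remembered-set estimate.
# Assumptions (illustrative, not measured): 4-byte compressed references,
# 512-byte cards, and the worst case where every non-null reference
# lands on a different card.
OBJECT_BYTES = 32 * 1024 * 1024   # one 32M objArray
REF_BYTES = 4                     # assumed compressed oop size
CARD_BYTES = 512                  # bytes covered by one card
LIVE_REFS = 1000                  # "1k references"

total_slots = OBJECT_BYTES // REF_BYTES      # about 8m reference slots
scanned_bytes = LIVE_REFS * CARD_BYTES       # worst case: one card per reference
scanned_fraction = scanned_bytes / OBJECT_BYTES

print(f"reference slots: {total_slots}")
print(f"bytes scanned via remembered set: {scanned_bytes}")
print(f"fraction of object scanned: {scanned_fraction:.2%}")
```

Under these assumptions the remembered set touches 512000 of the 33554432 bytes, i.e. roughly 1.5% of the object, which matches the figure quoted above.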

From a performance POV, with 20 GB/s bandwidth available (which I am not sure you will reach during GC for whatever reasons; random number), you are spending 1.5ms (if I calculated correctly) of cpu time just for finding out that the 32M object is completely full of null-s in the worst case. That's also the minimum amount of time you need per such object.

Keeping it outside of young gen, particularly if it has been allocated just recently so that it won't have a lot of remembered set entries, would likely be much cheaper than that (as mentioned, G1 has a good measure of how long scanning a card will take so we could take this number).

Only if G1 is going to scan it almost completely anyway (which we agree is unlikely to be the case as it has "just" been allocated), then keeping it outside is disadvantageous.

Note that its allocation could still be counted against the eden allowance in some situations. This could be seen as a way to slow down the mutator while it is busy trying to complete the marking.

I am however not sure if it helps a lot, assuming that changes to perform eager reclaim on objArrays won't work during marking btw. There would need to be a different way of enforcing such an allocation penalty.

Without more thinking and measurements I would not know when and how to account for that, and what has to happen with existing mechanisms to absorb allocation spikes (i.e. G1ReservePercent). I just assume that you probably do not want both.

Also something to consider.

> at most otherwise let eager reclam do it. A tradeoff can be made to balance the
> pause time and reclamation possibility of short-live objects.
>
> So the enhanced solution can be
> 1. Cancelling concurrent mark if not necessary.
> 2. Increase the reclamation possibility of short-live humongous objects.

These are valid possibilities to improve the overall situation without fixing actual fragmentation issues ;)

> An important reason for this issue is that Java developers easily
> challenge CMS can handle the application without significant CPU usage increase
> (caused by concurrent mark)
> but why G1 cannot. Personally I believe G1 can do anything not worse
> than CMS:)
> This proposal aims for the throughput gap comparing to CMS. If works
> with the barrier optimization which is proposed by Man and Google, imho the gap could be
> obviously reduced.

Thanks,
Thomas

From per.liden at oracle.com Tue Mar 3 13:21:08 2020
From: per.liden at oracle.com (Per Liden)
Date: Tue, 3 Mar 2020 14:21:08 +0100
Subject: RFR: 8240239: Replace ConcurrentGCPhaseManager
In-Reply-To: <4C14B89F-1550-44DE-B738-0DBEE7A2E167@oracle.com>
References: <4C14B89F-1550-44DE-B738-0DBEE7A2E167@oracle.com>
Message-ID: <328e8ec2-f9cc-c083-c09e-70785064497f@oracle.com>

Hi Kim,

On 2/28/20 10:48 PM, Kim Barrett wrote:
> Please review this change which removes the ConcurrentGCPhaseManager
> class and replaces it with ConcurrentGCBreakpoints.
>
> This is joint work with Per Liden.
>
> This change provides a client API, used by WhiteBox. The usage model
> for a client is
>
> (1) Acquire control of concurrent collection cycles.
>
> (2) Do work that must be performed while the collection cycle is in a
> known state.
>
> (3) Request the concurrent collector run to a named "breakpoint", or
> run to completion, and then hold there, waiting for further commands.
>
> (4) Optionally goto (2).
>
> (5) Release control of concurrent collection cycles.
>
> Tests have been updated to use the new WhiteBox API.
>
> This change provides implementations of the new mechanism for G1 and
> ZGC. A Shenandoah implementation is being left to others, but we
> don't see any obvious reason for it to be difficult.
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8240239
>
> Webrev:
> https://cr.openjdk.java.net/~kbarrett/8240239/open.03/

This looks good to me. However, it would be good if someone else had a closer look at the G1 changes, as I'm feeling less confident reviewing that part.

cheers,
Per

>
> To possibly simplify the review, the open patch is also provided as a
> pair of patches, one for removing the old mechanism and a second to
> add the new mechanism.
>
> https://cr.openjdk.java.net/~kbarrett/8240239/remove_phase_control.03/
> Removes ConcurrentGCPhaseManager and its G1 implementation, except
> that tests are not modified.
>
> https://cr.openjdk.java.net/~kbarrett/8240239/control.03/
> Adds ConcurrentGCBreakpoints, with G1 and ZGC implementations, and
> updates tests to use it.
>
> Testing:
> mach5 tier1-5, which includes all the updated and new tests.
>

From m.sundar85 at gmail.com Tue Mar 3 16:02:24 2020
From: m.sundar85 at gmail.com (Sundara Mohan M)
Date: Tue, 3 Mar 2020 11:02:24 -0500
Subject: Need help on debugging JVM crash
Message-ID:

Hi,

I am seeing JVM crashes on our system in a GC thread with parallel gc on x86 linux. I observed the same crash happening on JVM-11.0.6/13.0.2/13.0.1 GA builds.
Adding some log lines to give some context.

#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f669c964311, pid=66684, tid=71106
#
# JRE version: OpenJDK Runtime Environment (13.0.1+9) (build 13.0.1+9)
# Java VM: OpenJDK 64-Bit Server VM (13.0.1+9, mixed mode, tiered, parallel gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51
#
# No core dump will be written. Core dumps have been disabled.
To enable core dumping, try "ulimit -c unlimited" before starting Java again # # If you would like to submit a bug report, please visit: # https://github.com/AdoptOpenJDK/openjdk-build/issues # Host: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, 48 cores, 125G, Red Hat Enterprise Linux Server release 6.10 (Santiago) Time: Thu Feb 6 11:43:48 2020 UTC elapsed time: 198626 seconds (2d 7h 10m 26s) Following is the stack trace ex1: Stack: [0x00007fd01cbdb000,0x00007fd01ccdb000], sp=0x00007fd01ccd8890, free space=1014k Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) *V [libjvm.so+0xcc0121] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51* V [libjvm.so+0xc58c8b] OopMapSet::oops_do(frame const*, RegisterMap const*, OopClosure*)+0x2eb V [libjvm.so+0x7521e9] frame::oops_do_internal(OopClosure*, CodeBlobClosure*, RegisterMap*, bool)+0x99 V [libjvm.so+0xf55757] JavaThread::oops_do(OopClosure*, CodeBlobClosure*)+0x187 V [libjvm.so+0xcbb100] ThreadRootsMarkingTask::do_it(GCTaskManager*, unsigned int)+0xb0 V [libjvm.so+0x7e0f8b] GCTaskThread::run()+0x1eb V [libjvm.so+0xf5d43d] Thread::call_run()+0x10d V [libjvm.so+0xc74337] thread_native_entry(Thread*)+0xe7 JavaThread 0x00007fbeb9209800 (nid = 82380) was being processed Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) v ~RuntimeStub::_new_array_Java J 62465 c2 ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V (207 bytes) @ 0x00007fd00ad43704 [0x00007fd00ad41420+0x00000000000022e4] J 474206 c2 org.eclipse.jetty.util.log.JettyAwareLogger.log(Lorg/slf4j/Marker;ILjava/lang/String;[Ljava/lang/Object;Ljava/lang/Throwable;)V (134 bytes) @ 0x00007fd00c4e81ec [0x00007fd00c4e7ee0+0x000000000000030c] j org.eclipse.jetty.util.log.JettyAwareLogger.warn(Ljava/lang/String;Ljava/lang/Throwable;)V+7 j org.eclipse.jetty.util.log.Slf4jLog.warn(Ljava/lang/String;Ljava/lang/Throwable;)V+6 j org.eclipse.jetty.server.HttpChannel.handleException(Ljava/lang/Throwable;)V+181 
j org.eclipse.jetty.server.HttpChannelOverHttp.handleException(Ljava/lang/Throwable;)V+13 J 64106 c2 org.eclipse.jetty.server.HttpChannel.handle()Z (997 bytes) @ 0x00007fd00c6d2cd4 [0x00007fd00c6cdec0+0x0000000000004e14] J 280430 c2 org.eclipse.jetty.server.HttpConnection.onFillable()V (334 bytes) @ 0x00007fd00da925f0 [0x00007fd00da91e40+0x00000000000007b0] J 41979 c2 org.eclipse.jetty.io.ChannelEndPoint$2.run()V (12 bytes) @ 0x00007fd00a14f604 [0x00007fd00a14f4e0+0x0000000000000124] J 86362 c2 org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run()V (565 bytes) @ 0x00007fd0087d7e34 [0x00007fd0087d7cc0+0x0000000000000174] J 75998 c2 java.lang.Thread.run()V java.base at 13.0.2 (17 bytes) @ 0x00007fd00c93b8d8 [0x00007fd00c93b8a0+0x0000000000000038] v ~StubRoutines::call_stub ex2: Stack: [0x00007f669869f000,0x00007f669879f000], sp=0x00007f669879c890, free space=1014k Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) *V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51*V [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, RegisterMap const*, OopClosure*)+0x2eb V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, CodeBlobClosure*, RegisterMap*, bool)+0x99 V [libjvm.so+0xf68b17] JavaThread::oops_do(OopClosure*, CodeBlobClosure*)+0x187 V [libjvm.so+0xcce2f0] ThreadRootsMarkingTask::do_it(GCTaskManager*, unsigned int)+0xb0 V [libjvm.so+0x7f422b] GCTaskThread::run()+0x1eb V [libjvm.so+0xf707fd] Thread::call_run()+0x10d V [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7 JavaThread 0x00007f5518004000 (nid = 75659) was being processed Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) v ~RuntimeStub::_new_array_Java J 54174 c2 ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V (207 bytes) @ 0x00007f6687d92678 [0x00007f6687d8c700+0x0000000000005f78] J 334031 c2 com.xmas.webservice.exception.ExceptionLoggingWrapper.execute()V (1004 bytes) @ 0x00007f6686ede430 
[0x00007f6686edd580+0x0000000000000eb0] J 53431 c2 com.xmas.webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lcom/xmas/beans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; (105 bytes) @ 0x00007f6687db88b0 [0x00007f6687db8660+0x0000000000000250] J 63819 c2 com.xmas.webservice.exception.mapper.RequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; (9 bytes) @ 0x00007f6686a6ed9c [0x00007f6686a6ecc0+0x00000000000000dc] J 334032 c2 com.xmas.webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; (332 bytes) @ 0x00007f668992ad34 [0x00007f668992a840+0x00000000000004f4] J 403918 c2 com.xmas.webservice.filters.ResponseSerializationWorker.execute()Z (272 bytes) @ 0x00007f66869d67fc [0x00007f66869d5e80+0x000000000000097c] J 17530 c2 com.lafaspot.common.concurrent.internal.WorkerWrapper.execute()Z (208 bytes) @ 0x00007f66848b3708 [0x00007f66848b36a0+0x0000000000000068] J 31970% c2 com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; (486 bytes) @ 0x00007f668608dcb0 [0x00007f668608d5e0+0x00000000000006d0] j com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1 J 4889 c1 java.util.concurrent.FutureTask.run()V java.base at 13.0.1 (123 bytes) @ 0x00007f667d0be604 [0x00007f667d0bdf80+0x0000000000000684] J 7487 c1 java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V java.base at 13.0.1 (187 bytes) @ 0x00007f667dd45854 [0x00007f667dd44a60+0x0000000000000df4] J 7486 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V java.base at 13.0.1 (9 bytes) @ 0x00007f667d1f643c [0x00007f667d1f63c0+0x000000000000007c] J 7078 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 bytes) @ 0x00007f667d1f2d74 [0x00007f667d1f2c40+0x0000000000000134] v ~StubRoutines::call_stub Not very frequent but ~90 days ~120 crashes with 
following signal siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000 This signal is generated when we try to access non canonical address in linux. As suggested by Stefan in another thread i tried to add VerifyAfterGC/VerifyBeforeGC but it seems to increase the latency and applications not surviving our production traffic(timing out and requests are failing). Questions 1. When i looked at source code for printing stack trace i see following https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L696 (Prints native stack trace) https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L718 (printing Java thread stack trace if it is involved in GC crash) a. How do you know this java thread was involved in jvm crash? b. Can i assume the java thread printed after native stack trace was the culprit? c. Since i am seeing the same frame (~RuntimeStub::_new_array_Java, J 54174 c2 ch.qos.logback.classic.spi.ThrowableProxy...) but different stack trace in both crashes can this be the root cause? 2. Thinking of excluding compilation of ch.qos.logback.classic.spi.ThrowableProxy class and running in production to see if compilation of this method is the cause. Does it make sense? 3. Any other suggestion on debugging this further? TIA Sundar From yumin.qi at oracle.com Tue Mar 3 16:22:44 2020 From: yumin.qi at oracle.com (Yumin Qi) Date: Tue, 3 Mar 2020 08:22:44 -0800 Subject: Need help on debugging JVM crash In-Reply-To: References: Message-ID: HI, Sundara On 3/3/20 8:02 AM, Sundara Mohan M wrote: > Hi, > I am seeing JVM crashes on our system in GC Thread with parallel gc on > x86 linux. Observed the same crash happening on JVM-11.0.6/13.0.2/13.0.1 GA > builds. > Adding some logs lines to give some context. 
> > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x00007f669c964311, pid=66684, tid=71106 > # > # JRE version: OpenJDK Runtime Environment (13.0.1+9) (build 13.0.1+9) > # Java VM: OpenJDK 64-Bit Server VM (13.0.1+9, mixed mode, tiered, parallel > gc, linux-amd64) > # Problematic frame: > # V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > # > # No core dump will be written. Core dumps have been disabled. To enable > core dumping, try "ulimit -c unlimited" before starting Java again > # > # If you would like to submit a bug report, please visit: > # https://github.com/AdoptOpenJDK/openjdk-build/issues > # > > Host: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, 48 cores, 125G, Red Hat > Enterprise Linux Server release 6.10 (Santiago) > Time: Thu Feb 6 11:43:48 2020 UTC elapsed time: 198626 seconds (2d 7h 10m > 26s) > > > Following is the stack trace > ex1: > Stack: [0x00007fd01cbdb000,0x00007fd01ccdb000], sp=0x00007fd01ccd8890, > free space=1014k > Native frames: (J=compiled Java code, A=aot compiled Java code, > j=interpreted, Vv=VM code, C=native code) > *V [libjvm.so+0xcc0121] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51* > V [libjvm.so+0xc58c8b] OopMapSet::oops_do(frame const*, RegisterMap > const*, OopClosure*)+0x2eb > V [libjvm.so+0x7521e9] frame::oops_do_internal(OopClosure*, > CodeBlobClosure*, RegisterMap*, bool)+0x99 > V [libjvm.so+0xf55757] JavaThread::oops_do(OopClosure*, > CodeBlobClosure*)+0x187 > V [libjvm.so+0xcbb100] ThreadRootsMarkingTask::do_it(GCTaskManager*, > unsigned int)+0xb0 > V [libjvm.so+0x7e0f8b] GCTaskThread::run()+0x1eb > V [libjvm.so+0xf5d43d] Thread::call_run()+0x10d > V [libjvm.so+0xc74337] thread_native_entry(Thread*)+0xe7 > > JavaThread 0x00007fbeb9209800 (nid = 82380) was being processed > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > v ~RuntimeStub::_new_array_Java > J 62465 c2 > 
ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > (207 bytes) @ 0x00007fd00ad43704 [0x00007fd00ad41420+0x00000000000022e4] > J 474206 c2 > org.eclipse.jetty.util.log.JettyAwareLogger.log(Lorg/slf4j/Marker;ILjava/lang/String;[Ljava/lang/Object;Ljava/lang/Throwable;)V > (134 bytes) @ 0x00007fd00c4e81ec [0x00007fd00c4e7ee0+0x000000000000030c] > j > org.eclipse.jetty.util.log.JettyAwareLogger.warn(Ljava/lang/String;Ljava/lang/Throwable;)V+7 > j > org.eclipse.jetty.util.log.Slf4jLog.warn(Ljava/lang/String;Ljava/lang/Throwable;)V+6 > j > org.eclipse.jetty.server.HttpChannel.handleException(Ljava/lang/Throwable;)V+181 > j > org.eclipse.jetty.server.HttpChannelOverHttp.handleException(Ljava/lang/Throwable;)V+13 > J 64106 c2 org.eclipse.jetty.server.HttpChannel.handle()Z (997 bytes) @ > 0x00007fd00c6d2cd4 [0x00007fd00c6cdec0+0x0000000000004e14] > J 280430 c2 org.eclipse.jetty.server.HttpConnection.onFillable()V (334 > bytes) @ 0x00007fd00da925f0 [0x00007fd00da91e40+0x00000000000007b0] > J 41979 c2 org.eclipse.jetty.io.ChannelEndPoint$2.run()V (12 bytes) @ > 0x00007fd00a14f604 [0x00007fd00a14f4e0+0x0000000000000124] > J 86362 c2 org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run()V > (565 bytes) @ 0x00007fd0087d7e34 [0x00007fd0087d7cc0+0x0000000000000174] > J 75998 c2 java.lang.Thread.run()V java.base at 13.0.2 (17 bytes) @ > 0x00007fd00c93b8d8 [0x00007fd00c93b8a0+0x0000000000000038] > v ~StubRoutines::call_stub > > ex2: > Stack: [0x00007f669869f000,0x00007f669879f000], sp=0x00007f669879c890, > free space=1014k > Native frames: (J=compiled Java code, A=aot compiled Java code, > j=interpreted, Vv=VM code, C=native code) > > *V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51*V > [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, RegisterMap const*, > OopClosure*)+0x2eb > V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > CodeBlobClosure*, RegisterMap*, bool)+0x99 > V [libjvm.so+0xf68b17] JavaThread::oops_do(OopClosure*, 
> CodeBlobClosure*)+0x187 > V [libjvm.so+0xcce2f0] ThreadRootsMarkingTask::do_it(GCTaskManager*, > unsigned int)+0xb0 > V [libjvm.so+0x7f422b] GCTaskThread::run()+0x1eb > V [libjvm.so+0xf707fd] Thread::call_run()+0x10d > V [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7 > > JavaThread 0x00007f5518004000 (nid = 75659) was being processed > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > v ~RuntimeStub::_new_array_Java > J 54174 c2 > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > (207 bytes) @ 0x00007f6687d92678 [0x00007f6687d8c700+0x0000000000005f78] > J 334031 c2 > com.xmas.webservice.exception.ExceptionLoggingWrapper.execute()V (1004 > bytes) @ 0x00007f6686ede430 [0x00007f6686edd580+0x0000000000000eb0] > J 53431 c2 > com.xmas.webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lcom/xmas/beans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > (105 bytes) @ 0x00007f6687db88b0 [0x00007f6687db8660+0x0000000000000250] > J 63819 c2 > com.xmas.webservice.exception.mapper.RequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > (9 bytes) @ 0x00007f6686a6ed9c [0x00007f6686a6ecc0+0x00000000000000dc] > J 334032 c2 > com.xmas.webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; > (332 bytes) @ 0x00007f668992ad34 [0x00007f668992a840+0x00000000000004f4] > J 403918 c2 > com.xmas.webservice.filters.ResponseSerializationWorker.execute()Z (272 > bytes) @ 0x00007f66869d67fc [0x00007f66869d5e80+0x000000000000097c] > J 17530 c2 com.lafaspot.common.concurrent.internal.WorkerWrapper.execute()Z > (208 bytes) @ 0x00007f66848b3708 [0x00007f66848b36a0+0x0000000000000068] > J 31970% c2 > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; > (486 bytes) @ 0x00007f668608dcb0 [0x00007f668608d5e0+0x00000000000006d0] > j > 
com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1 > J 4889 c1 java.util.concurrent.FutureTask.run()V java.base at 13.0.1 (123 > bytes) @ 0x00007f667d0be604 [0x00007f667d0bdf80+0x0000000000000684] > J 7487 c1 > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V > java.base at 13.0.1 (187 bytes) @ 0x00007f667dd45854 > [0x00007f667dd44a60+0x0000000000000df4] > J 7486 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V > java.base at 13.0.1 (9 bytes) @ 0x00007f667d1f643c > [0x00007f667d1f63c0+0x000000000000007c] > J 7078 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 bytes) @ > 0x00007f667d1f2d74 [0x00007f667d1f2c40+0x0000000000000134] > v ~StubRoutines::call_stub > > Not very frequent but ~90 days ~120 crashes with following signal > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: > 0x0000000000000000 > This signal is generated when we try to access non canonical address in > linux. > > As suggested by Stefan in another thread i tried to > add VerifyAfterGC/VerifyBeforeGC but it seems to increase the latency and > applications not surviving our production traffic(timing out and requests > are failing). > > Questions > 1. When i looked at source code for printing stack trace i see following > https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L696 > (Prints native stack trace) > https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L718 > (printing Java thread stack trace if it is involved in GC crash) > a. How do you know this java thread was involved in jvm crash? When GC processes thread stack as root, the java thread first was recorded. This is why at crash, the java thread was printed out. > b. Can i assume the java thread printed after native stack trace was the > culprit? Please check this thread stack frames, when GC is doing marking work, I think, it encountered a bad oop. 
Check: If it is a compiled frame, if so, it may related to compiled code. > c. Since i am seeing the same frame (~RuntimeStub::_new_array_Java, J > 54174 c2 ch.qos.logback.classic.spi.ThrowableProxy...) but different > stack trace in both crashes can this be the root cause? It is a C2 compiled frame. The bad oop could be a result of compiler. It also needs detail debug information to make the conclusion. Thanks Yumin > 2. Thinking of excluding compilation > of ch.qos.logback.classic.spi.ThrowableProxy class and running in > production to see if compilation of this method is the cause. Does it make > sense? > > 3. Any other suggestion on debugging this further? > > TIA > Sundar From m.sundar85 at gmail.com Tue Mar 3 17:39:05 2020 From: m.sundar85 at gmail.com (Sundara Mohan M) Date: Tue, 3 Mar 2020 12:39:05 -0500 Subject: Need help on debugging JVM crash In-Reply-To: References: Message-ID: Hi Yumin, On Tue, Mar 3, 2020 at 11:23 AM Yumin Qi wrote: > HI, Sundara > On 3/3/20 8:02 AM, Sundara Mohan M wrote: > > Hi, > I am seeing JVM crashes on our system in GC Thread with parallel gc on > x86 linux. Observed the same crash happening on JVM-11.0.6/13.0.2/13.0.1 GA > builds. > Adding some logs lines to give some context. > > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x00007f669c964311, pid=66684, tid=71106 > # > # JRE version: OpenJDK Runtime Environment (13.0.1+9) (build 13.0.1+9) > # Java VM: OpenJDK 64-Bit Server VM (13.0.1+9, mixed mode, tiered, parallel > gc, linux-amd64) > # Problematic frame: > # V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > # > # No core dump will be written. Core dumps have been disabled. 
To enable > core dumping, try "ulimit -c unlimited" before starting Java again > # > # If you would like to submit a bug report, please visit: > # https://github.com/AdoptOpenJDK/openjdk-build/issues > # > > Host: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, 48 cores, 125G, Red Hat > Enterprise Linux Server release 6.10 (Santiago) > Time: Thu Feb 6 11:43:48 2020 UTC elapsed time: 198626 seconds (2d 7h 10m > 26s) > > > Following is the stack trace > ex1: > Stack: [0x00007fd01cbdb000,0x00007fd01ccdb000], sp=0x00007fd01ccd8890, > free space=1014k > Native frames: (J=compiled Java code, A=aot compiled Java code, > j=interpreted, Vv=VM code, C=native code) > *V [libjvm.so+0xcc0121] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51* > V [libjvm.so+0xc58c8b] OopMapSet::oops_do(frame const*, RegisterMap > const*, OopClosure*)+0x2eb > V [libjvm.so+0x7521e9] frame::oops_do_internal(OopClosure*, > CodeBlobClosure*, RegisterMap*, bool)+0x99 > V [libjvm.so+0xf55757] JavaThread::oops_do(OopClosure*, > CodeBlobClosure*)+0x187 > V [libjvm.so+0xcbb100] ThreadRootsMarkingTask::do_it(GCTaskManager*, > unsigned int)+0xb0 > V [libjvm.so+0x7e0f8b] GCTaskThread::run()+0x1eb > V [libjvm.so+0xf5d43d] Thread::call_run()+0x10d > V [libjvm.so+0xc74337] thread_native_entry(Thread*)+0xe7 > > JavaThread 0x00007fbeb9209800 (nid = 82380) was being processed > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > v ~RuntimeStub::_new_array_Java > J 62465 c2 > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > (207 bytes) @ 0x00007fd00ad43704 [0x00007fd00ad41420+0x00000000000022e4] > J 474206 c2 > org.eclipse.jetty.util.log.JettyAwareLogger.log(Lorg/slf4j/Marker;ILjava/lang/String;[Ljava/lang/Object;Ljava/lang/Throwable;)V > (134 bytes) @ 0x00007fd00c4e81ec [0x00007fd00c4e7ee0+0x000000000000030c] > j > org.eclipse.jetty.util.log.JettyAwareLogger.warn(Ljava/lang/String;Ljava/lang/Throwable;)V+7 > j > 
org.eclipse.jetty.util.log.Slf4jLog.warn(Ljava/lang/String;Ljava/lang/Throwable;)V+6 > j > org.eclipse.jetty.server.HttpChannel.handleException(Ljava/lang/Throwable;)V+181 > j > org.eclipse.jetty.server.HttpChannelOverHttp.handleException(Ljava/lang/Throwable;)V+13 > J 64106 c2 org.eclipse.jetty.server.HttpChannel.handle()Z (997 bytes) @ > 0x00007fd00c6d2cd4 [0x00007fd00c6cdec0+0x0000000000004e14] > J 280430 c2 org.eclipse.jetty.server.HttpConnection.onFillable()V (334 > bytes) @ 0x00007fd00da925f0 [0x00007fd00da91e40+0x00000000000007b0] > J 41979 c2 org.eclipse.jetty.io.ChannelEndPoint$2.run()V (12 bytes) @ > 0x00007fd00a14f604 [0x00007fd00a14f4e0+0x0000000000000124] > J 86362 c2 org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run()V > (565 bytes) @ 0x00007fd0087d7e34 [0x00007fd0087d7cc0+0x0000000000000174] > J 75998 c2 java.lang.Thread.run()V java.base at 13.0.2 (17 bytes) @ > 0x00007fd00c93b8d8 [0x00007fd00c93b8a0+0x0000000000000038] > v ~StubRoutines::call_stub > > ex2: > Stack: [0x00007f669869f000,0x00007f669879f000], sp=0x00007f669879c890, > free space=1014k > Native frames: (J=compiled Java code, A=aot compiled Java code, > j=interpreted, Vv=VM code, C=native code) > > *V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51*V > [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, RegisterMap const*, > OopClosure*)+0x2eb > V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > CodeBlobClosure*, RegisterMap*, bool)+0x99 > V [libjvm.so+0xf68b17] JavaThread::oops_do(OopClosure*, > CodeBlobClosure*)+0x187 > V [libjvm.so+0xcce2f0] ThreadRootsMarkingTask::do_it(GCTaskManager*, > unsigned int)+0xb0 > V [libjvm.so+0x7f422b] GCTaskThread::run()+0x1eb > V [libjvm.so+0xf707fd] Thread::call_run()+0x10d > V [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7 > > JavaThread 0x00007f5518004000 (nid = 75659) was being processed > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > v ~RuntimeStub::_new_array_Java > J 54174 c2 > 
ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > (207 bytes) @ 0x00007f6687d92678 [0x00007f6687d8c700+0x0000000000005f78] > J 334031 c2 > com.xmas.webservice.exception.ExceptionLoggingWrapper.execute()V (1004 > bytes) @ 0x00007f6686ede430 [0x00007f6686edd580+0x0000000000000eb0] > J 53431 c2 > com.xmas.webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lcom/xmas/beans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > (105 bytes) @ 0x00007f6687db88b0 [0x00007f6687db8660+0x0000000000000250] > J 63819 c2 > com.xmas.webservice.exception.mapper.RequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > (9 bytes) @ 0x00007f6686a6ed9c [0x00007f6686a6ecc0+0x00000000000000dc] > J 334032 c2 > com.xmas.webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; > (332 bytes) @ 0x00007f668992ad34 [0x00007f668992a840+0x00000000000004f4] > J 403918 c2 > com.xmas.webservice.filters.ResponseSerializationWorker.execute()Z (272 > bytes) @ 0x00007f66869d67fc [0x00007f66869d5e80+0x000000000000097c] > J 17530 c2 com.lafaspot.common.concurrent.internal.WorkerWrapper.execute()Z > (208 bytes) @ 0x00007f66848b3708 [0x00007f66848b36a0+0x0000000000000068] > J 31970% c2 > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; > (486 bytes) @ 0x00007f668608dcb0 [0x00007f668608d5e0+0x00000000000006d0] > j > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1 > J 4889 c1 java.util.concurrent.FutureTask.run()V java.base at 13.0.1 (123 > bytes) @ 0x00007f667d0be604 [0x00007f667d0bdf80+0x0000000000000684] > J 7487 c1 > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)Vjava.base at 13.0.1 (187 bytes) @ 0x00007f667dd45854 > [0x00007f667dd44a60+0x0000000000000df4] > J 7486 c1 
java.util.concurrent.ThreadPoolExecutor$Worker.run()Vjava.base at 13.0.1 (9 bytes) @ 0x00007f667d1f643c > [0x00007f667d1f63c0+0x000000000000007c] > J 7078 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 bytes) @ > 0x00007f667d1f2d74 [0x00007f667d1f2c40+0x0000000000000134] > v ~StubRoutines::call_stub > > Not very frequent but ~90 days ~120 crashes with following signal > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: > 0x0000000000000000 > This signal is generated when we try to access non canonical address in > linux. > > As suggested by Stefan in another thread i tried to > add VerifyAfterGC/VerifyBeforeGC but it seems to increase the latency and > applications not surviving our production traffic(timing out and requests > are failing). > > Questions > 1. When i looked at source code for printing stack trace i see followinghttps://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L696 > (Prints native stack trace)https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L718 > (printing Java thread stack trace if it is involved in GC crash) > a. How do you know this java thread was involved in jvm crash? > > When GC processes thread stack as root, the java thread first was > recorded. This is why at crash, the java thread was printed out. > > b. Can i assume the java thread printed after native stack trace was the > culprit? > > Please check this thread stack frames, when GC is doing marking work, I > think, it encountered a bad oop. Check: > > If it is a compiled frame, if so, it may related to compiled code. > > c. Since i am seeing the same frame (~RuntimeStub::_new_array_Java, J > 54174 c2 ch.qos.logback.classic.spi.ThrowableProxy...) but different > stack trace in both crashes can this be the root cause? > > It is a C2 compiled frame. The bad oop could be a result of compiler. 
> Actually the top two frame are always same in different crashes v ~RuntimeStub::_new_array_Java J 54174 c2 ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V (207 bytes) @ 0x00007f6687d92678 [0x00007f6687d8c700+0x0000000000005f78] In this case do you think JVM code(frame 1) or C2 compiler code(frame 2) might be issue? Is there any way to identify that and what kind of debug flags/settings might give us this information? > It also needs detail debug information to make the conclusion. > Do you think any of the information dumped in hs_err* file might give us more info (like registers content/Instructions/core file)? Can you please let me know what additional details might help to make the conclusion? Also how to get those information? Thanks > > Yumin > > 2. Thinking of excluding compilation > of ch.qos.logback.classic.spi.ThrowableProxy class and running in > production to see if compilation of this method is the cause. Does it make > sense? > > 3. Any other suggestion on debugging this further? > > TIA > Sundar > > Thanks Sundar From aph at redhat.com Tue Mar 3 18:02:59 2020 From: aph at redhat.com (Andrew Haley) Date: Tue, 3 Mar 2020 18:02:59 +0000 Subject: Need help on debugging JVM crash In-Reply-To: References: Message-ID: <442a8045-8ef6-00ae-cd61-2db6f1fdb5fd@redhat.com> On 3/3/20 5:39 PM, Sundara Mohan M wrote: >> Questions >> 1. When i looked at source code for printing stack trace i see followinghttps://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L696 >> (Prints native stack trace)https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L718 >> (printing Java thread stack trace if it is involved in GC crash) >> a. How do you know this java thread was involved in jvm crash? The top thread -- the first in the file -- is the one that crashed. >> When GC processes thread stack as root, the java thread first was >> recorded. This is why at crash, the java thread was printed out. 
>> >> b. Can i assume the java thread printed after native stack trace was the >> culprit? Certainly not. >> Please check this thread stack frames, when GC is doing marking work, I >> think, it encountered a bad oop. Check: >> >> If it is a compiled frame, if so, it may related to compiled code. >> >> c. Since i am seeing the same frame (~RuntimeStub::_new_array_Java, J >> 54174 c2 ch.qos.logback.classic.spi.ThrowableProxy...) but different >> stack trace in both crashes can this be the root cause? >> >> It is a C2 compiled frame. The bad oop could be a result of compiler. >> > Actually the top two frame are always same in different crashes > v ~RuntimeStub::_new_array_Java > J 54174 c2 > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > (207 bytes) @ 0x00007f6687d92678 [0x00007f6687d8c700+0x0000000000005f78] > In this case do you think JVM code(frame 1) or C2 compiler code(frame 2) > might be issue? Probably not. My money would be on a bad library using Unsafe to do something unwise. But there are many other possibilities. > Is there any way to identify that and what kind of debug flags/settings > might give us this information? > >> It also needs detail debug information to make the conclusion. >> > Do you think any of the information dumped in hs_err* file might give us > more info (like registers content/Instructions/core file)? > > Can you please let me know what additional details might help to make the > conclusion? Also how to get those information? Let's see the complete hs_err file. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From m.sundar85 at gmail.com Tue Mar 3 18:13:24 2020 From: m.sundar85 at gmail.com (Sundara Mohan M) Date: Tue, 3 Mar 2020 13:13:24 -0500 Subject: Need help on debugging JVM crash In-Reply-To: References: <442a8045-8ef6-00ae-cd61-2db6f1fdb5fd@redhat.com> Message-ID: Waiting for moderator approval to get my hs_err* files sent. Is being held until the list moderator can review it for approval. The reason it is being held: Message body is too big: 1048807 bytes with a limit of 500 KB Thanks Sundar On Tue, Mar 3, 2020 at 1:07 PM Sundara Mohan M wrote: > Hi Andrew, > Attaching hs_err* from multiple hosts where both java thread top frame > is same. > > Thanks > Sundar > > On Tue, Mar 3, 2020 at 1:03 PM Andrew Haley wrote: > >> On 3/3/20 5:39 PM, Sundara Mohan M wrote: >> >> Questions >> >> 1. When i looked at source code for printing stack trace i see >> followinghttps:// >> github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L696 >> >> (Prints native stack trace) >> https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L718 >> >> (printing Java thread stack trace if it is involved in GC crash) >> >> a. How do you know this java thread was involved in jvm crash? >> >> The top thread -- the first in the file -- is the one that crashed. >> >> >> When GC processes thread stack as root, the java thread first was >> >> recorded. This is why at crash, the java thread was printed out. >> >> >> >> b. Can i assume the java thread printed after native stack trace was >> the >> >> culprit? >> >> Certainly not. >> >> >> Please check this thread stack frames, when GC is doing marking work, I >> >> think, it encountered a bad oop. Check: >> >> >> >> If it is a compiled frame, if so, it may related to compiled code. >> >> >> >> c. 
Since i am seeing the same frame (~RuntimeStub::_new_array_Java, J >> >> 54174 c2 ch.qos.logback.classic.spi.ThrowableProxy...) but >> different >> >> stack trace in both crashes can this be the root cause? >> >> >> >> It is a C2 compiled frame. The bad oop could be a result of compiler. >> >> >> > Actually the top two frame are always same in different crashes >> > v ~RuntimeStub::_new_array_Java >> > J 54174 c2 >> > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V >> > (207 bytes) @ 0x00007f6687d92678 [0x00007f6687d8c700+0x0000000000005f78] >> > In this case do you think JVM code(frame 1) or C2 compiler code(frame 2) >> > might be issue? >> >> Probably not. My money would be on a bad library using Unsafe to do >> something unwise. But there are many other possibilities. >> >> > Is there any way to identify that and what kind of debug flags/settings >> > might give us this information? >> > >> >> It also needs detail debug information to make the conclusion. >> >> >> > Do you think any of the information dumped in hs_err* file might give us >> > more info (like registers content/Instructions/core file)? >> > >> > Can you please let me know what additional details might help to make >> the >> > conclusion? Also how to get those information? >> >> Let's see the complete hs_err file. >> >> -- >> Andrew Haley (he/him) >> Java Platform Lead Engineer >> Red Hat UK Ltd. >> https://keybase.io/andrewhaley >> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 >> >> From m.sundar85 at gmail.com Tue Mar 3 18:07:30 2020 From: m.sundar85 at gmail.com (Sundara Mohan M) Date: Tue, 3 Mar 2020 13:07:30 -0500 Subject: Need help on debugging JVM crash In-Reply-To: <442a8045-8ef6-00ae-cd61-2db6f1fdb5fd@redhat.com> References: <442a8045-8ef6-00ae-cd61-2db6f1fdb5fd@redhat.com> Message-ID: Hi Andrew, Attaching hs_err* from multiple hosts where both java thread top frame is same. 
Thanks Sundar On Tue, Mar 3, 2020 at 1:03 PM Andrew Haley wrote: > On 3/3/20 5:39 PM, Sundara Mohan M wrote: > >> Questions > >> 1. When i looked at source code for printing stack trace i see > followinghttps:// > github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L696 > >> (Prints native stack trace) > https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L718 > >> (printing Java thread stack trace if it is involved in GC crash) > >> a. How do you know this java thread was involved in jvm crash? > > The top thread -- the first in the file -- is the one that crashed. > > >> When GC processes thread stack as root, the java thread first was > >> recorded. This is why at crash, the java thread was printed out. > >> > >> b. Can i assume the java thread printed after native stack trace was > the > >> culprit? > > Certainly not. > > >> Please check this thread stack frames, when GC is doing marking work, I > >> think, it encountered a bad oop. Check: > >> > >> If it is a compiled frame, if so, it may related to compiled code. > >> > >> c. Since i am seeing the same frame (~RuntimeStub::_new_array_Java, J > >> 54174 c2 ch.qos.logback.classic.spi.ThrowableProxy...) but > different > >> stack trace in both crashes can this be the root cause? > >> > >> It is a C2 compiled frame. The bad oop could be a result of compiler. > >> > > Actually the top two frame are always same in different crashes > > v ~RuntimeStub::_new_array_Java > > J 54174 c2 > > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > > (207 bytes) @ 0x00007f6687d92678 [0x00007f6687d8c700+0x0000000000005f78] > > In this case do you think JVM code(frame 1) or C2 compiler code(frame 2) > > might be issue? > > Probably not. My money would be on a bad library using Unsafe to do > something unwise. But there are many other possibilities. 
> > > Is there any way to identify that and what kind of debug flags/settings > > might give us this information? > > > >> It also needs detail debug information to make the conclusion. > >> > > Do you think any of the information dumped in hs_err* file might give us > > more info (like registers content/Instructions/core file)? > > > > Can you please let me know what additional details might help to make the > > conclusion? Also how to get those information? > > Let's see the complete hs_err file. > > -- > Andrew Haley (he/him) > Java Platform Lead Engineer > Red Hat UK Ltd. > https://keybase.io/andrewhaley > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > > From yumin.qi at oracle.com Tue Mar 3 18:49:06 2020 From: yumin.qi at oracle.com (Yumin Qi) Date: Tue, 3 Mar 2020 10:49:06 -0800 Subject: Need help on debugging JVM crash In-Reply-To: References: Message-ID: Hi, Sundara As suggested by Stefan in another thread i tried to > >> add VerifyAfterGC/VerifyBeforeGC but it seems to increase the latency and >> applications not surviving our production traffic(timing out and requests >> are failing). >> >> Questions >> 1. When i looked at source code for printing stack trace i see following >> https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L696 >> (Prints native stack trace) >> https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L718 >> (printing Java thread stack trace if it is involved in GC crash) >> a. How do you know this java thread was involved in jvm crash? > When GC processes thread stack as root, the java thread first was > recorded. This is why at crash, the java thread was printed out. >> b. Can i assume the java thread printed after native stack trace was the >> culprit? > > Please check this thread stack frames, when GC is doing marking > work, I think, it encountered a bad oop. Check: > > If it is a compiled frame, if so, it may related to compiled code. > >> c. 
Since i am seeing the same frame (~RuntimeStub::_new_array_Java, J >> 54174 c2 ch.qos.logback.classic.spi.ThrowableProxy...) but different >> stack trace in both crashes can this be the root cause? > > It is a C2 compiled frame. The bad oop could be a result of compiler. > > Actually the top two frame are always same in different crashes > v ~RuntimeStub::_new_array_Java > J 54174 c2 > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > (207 bytes) @ 0x00007f6687d92678 [0x00007f6687d8c700+0x0000000000005f78] > In this case do you think JVM code(frame 1) or C2 compiler code(frame > 2) might be issue? > Is there any way to identify that and what kind of debug > flags/settings might give us this information? > > It also needs detail debug information to make the conclusion. > > Do you think any of the information dumped in hs_err* file might give > us more info (like registers content/Instructions/core file)? > > Can you please let me know what additional details might help to make > the conclusion? Also how to get those information? > If it is caused by this compiled java method, excluding the java method from compilation is a workaround. You can switch to the java thread (the printed out java thread at crash), compare the failed frame in GC thread to the frame in the java thread so you will know which frame contained bad oop. Also know what is the frame, compiled, interpreter, or native. Yumin > Thanks > > Yumin > >> 2. Thinking of excluding compilation >> of ch.qos.logback.classic.spi.ThrowableProxy class and running in >> production to see if compilation of this method is the cause. Does it make >> sense? >> >> 3. Any other suggestion on debugging this further? 
>> >> TIA >> Sundar > > > Thanks > Sundar From stefan.karlsson at oracle.com Tue Mar 3 18:57:40 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Tue, 3 Mar 2020 19:57:40 +0100 Subject: Need help on debugging JVM crash In-Reply-To: References: <442a8045-8ef6-00ae-cd61-2db6f1fdb5fd@redhat.com> Message-ID: <7a572768-b73b-9d6e-db63-a39583f0c507@oracle.com> I have approved the message, but it isn't arriving. As a workaround, could you try send one hs_err file at a time, and cut the rest of the message? Each hs_err file is < 500 KB, so maybe that will work. StefanK On 2020-03-03 19:13, Sundara Mohan M wrote: > Waiting for moderator approval to get my hs_err* files sent. > > Is being held until the list moderator can review it for approval. > > The reason it is being held: > > Message body is too big: 1048807 bytes with a limit of 500 KB > > Thanks > Sundar > > On Tue, Mar 3, 2020 at 1:07 PM Sundara Mohan M wrote: > >> Hi Andrew, >> Attaching hs_err* from multiple hosts where both java thread top frame >> is same. >> >> Thanks >> Sundar >> >> On Tue, Mar 3, 2020 at 1:03 PM Andrew Haley wrote: >> >>> On 3/3/20 5:39 PM, Sundara Mohan M wrote: >>>>> Questions >>>>> 1. When i looked at source code for printing stack trace i see >>> followinghttps:// >>> github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L696 >>>>> (Prints native stack trace) >>> https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L718 >>>>> (printing Java thread stack trace if it is involved in GC crash) >>>>> a. How do you know this java thread was involved in jvm crash? >>> The top thread -- the first in the file -- is the one that crashed. >>> >>>>> When GC processes thread stack as root, the java thread first was >>>>> recorded. This is why at crash, the java thread was printed out. >>>>> >>>>> b. Can i assume the java thread printed after native stack trace was >>> the >>>>> culprit? >>> Certainly not. 
>>> >>>>> Please check this thread stack frames, when GC is doing marking work, I >>>>> think, it encountered a bad oop. Check: >>>>> >>>>> If it is a compiled frame, if so, it may related to compiled code. >>>>> >>>>> c. Since i am seeing the same frame (~RuntimeStub::_new_array_Java, J >>>>> 54174 c2 ch.qos.logback.classic.spi.ThrowableProxy...) but >>> different >>>>> stack trace in both crashes can this be the root cause? >>>>> >>>>> It is a C2 compiled frame. The bad oop could be a result of compiler. >>>>> >>>> Actually the top two frame are always same in different crashes >>>> v ~RuntimeStub::_new_array_Java >>>> J 54174 c2 >>>> ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V >>>> (207 bytes) @ 0x00007f6687d92678 [0x00007f6687d8c700+0x0000000000005f78] >>>> In this case do you think JVM code(frame 1) or C2 compiler code(frame 2) >>>> might be issue? >>> Probably not. My money would be on a bad library using Unsafe to do >>> something unwise. But there are many other possibilities. >>> >>>> Is there any way to identify that and what kind of debug flags/settings >>>> might give us this information? >>>> >>>>> It also needs detail debug information to make the conclusion. >>>>> >>>> Do you think any of the information dumped in hs_err* file might give us >>>> more info (like registers content/Instructions/core file)? >>>> >>>> Can you please let me know what additional details might help to make >>> the >>>> conclusion? Also how to get those information? >>> Let's see the complete hs_err file. >>> >>> -- >>> Andrew Haley (he/him) >>> Java Platform Lead Engineer >>> Red Hat UK Ltd. 
>>> https://keybase.io/andrewhaley >>> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 >>> >>> From m.sundar85 at gmail.com Tue Mar 3 19:00:41 2020 From: m.sundar85 at gmail.com (Sundara Mohan M) Date: Tue, 3 Mar 2020 14:00:41 -0500 Subject: Need help on debugging JVM crash In-Reply-To: References: <442a8045-8ef6-00ae-cd61-2db6f1fdb5fd@redhat.com> <7a572768-b73b-9d6e-db63-a39583f0c507@oracle.com> Message-ID: Another crash file. Thanks Sundar On Tue, Mar 3, 2020 at 2:00 PM Sundara Mohan M wrote: > Attaching 1 file as a work around! > > Thanks > Sundar > > On Tue, Mar 3, 2020 at 1:57 PM Stefan Karlsson > wrote: > >> I have approved the message, but it isn't arriving. As a workaround, >> could you try send one hs_err file at a time, and cut the rest of the >> message? Each hs_err file is < 500 KB, so maybe that will work. >> >> StefanK >> >> On 2020-03-03 19:13, Sundara Mohan M wrote: >> > Waiting for moderator approval to get my hs_err* files sent. >> > >> > Is being held until the list moderator can review it for approval. >> > >> > The reason it is being held: >> > >> > Message body is too big: 1048807 bytes with a limit of 500 KB >> > >> > Thanks >> > Sundar >> > >> > On Tue, Mar 3, 2020 at 1:07 PM Sundara Mohan M >> wrote: >> > >> >> Hi Andrew, >> >> Attaching hs_err* from multiple hosts where both java thread top >> frame >> >> is same. >> >> >> >> Thanks >> >> Sundar >> >> >> >> On Tue, Mar 3, 2020 at 1:03 PM Andrew Haley wrote: >> >> >> >>> On 3/3/20 5:39 PM, Sundara Mohan M wrote: >> >>>>> Questions >> >>>>> 1. When i looked at source code for printing stack trace i see >> >>> followinghttps:// >> >>> >> github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L696 >> >>>>> (Prints native stack trace) >> >>> >> https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L718 >> >>>>> (printing Java thread stack trace if it is involved in GC crash) >> >>>>> a. 
How do you know this java thread was involved in jvm crash? >> >>> The top thread -- the first in the file -- is the one that crashed. >> >>> >> >>>>> When GC processes thread stack as root, the java thread first was >> >>>>> recorded. This is why at crash, the java thread was printed out. >> >>>>> >> >>>>> b. Can i assume the java thread printed after native stack trace >> was >> >>> the >> >>>>> culprit? >> >>> Certainly not. >> >>> >> >>>>> Please check this thread stack frames, when GC is doing marking >> work, I >> >>>>> think, it encountered a bad oop. Check: >> >>>>> >> >>>>> If it is a compiled frame, if so, it may related to compiled code. >> >>>>> >> >>>>> c. Since i am seeing the same frame >> (~RuntimeStub::_new_array_Java, J >> >>>>> 54174 c2 ch.qos.logback.classic.spi.ThrowableProxy...) but >> >>> different >> >>>>> stack trace in both crashes can this be the root cause? >> >>>>> >> >>>>> It is a C2 compiled frame. The bad oop could be a result of >> compiler. >> >>>>> >> >>>> Actually the top two frame are always same in different crashes >> >>>> v ~RuntimeStub::_new_array_Java >> >>>> J 54174 c2 >> >>>> >> ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V >> >>>> (207 bytes) @ 0x00007f6687d92678 >> [0x00007f6687d8c700+0x0000000000005f78] >> >>>> In this case do you think JVM code(frame 1) or C2 compiler >> code(frame 2) >> >>>> might be issue? >> >>> Probably not. My money would be on a bad library using Unsafe to do >> >>> something unwise. But there are many other possibilities. >> >>> >> >>>> Is there any way to identify that and what kind of debug >> flags/settings >> >>>> might give us this information? >> >>>> >> >>>>> It also needs detail debug information to make the conclusion. >> >>>>> >> >>>> Do you think any of the information dumped in hs_err* file might >> give us >> >>>> more info (like registers content/Instructions/core file)? 
>> >>>> >> >>>> Can you please let me know what additional details might help to make >> >>> the >> >>>> conclusion? Also how to get those information? >> >>> Let's see the complete hs_err file. >> >>> >> >>> -- >> >>> Andrew Haley (he/him) >> >>> Java Platform Lead Engineer >> >>> Red Hat UK Ltd. >> >>> https://keybase.io/andrewhaley >> >>> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 >> >>> >> >>> >> >> From m.sundar85 at gmail.com Tue Mar 3 19:02:50 2020 From: m.sundar85 at gmail.com (Sundara Mohan M) Date: Tue, 3 Mar 2020 14:02:50 -0500 Subject: Need help on debugging JVM crash In-Reply-To: References: <442a8045-8ef6-00ae-cd61-2db6f1fdb5fd@redhat.com> <7a572768-b73b-9d6e-db63-a39583f0c507@oracle.com> Message-ID: Trying to send crash file1 alone. Thanks Sundar From kim.barrett at oracle.com Tue Mar 3 19:07:21 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 3 Mar 2020 14:07:21 -0500 Subject: RFR: 8240239: Replace ConcurrentGCPhaseManager In-Reply-To: <328e8ec2-f9cc-c083-c09e-70785064497f@oracle.com> References: <4C14B89F-1550-44DE-B738-0DBEE7A2E167@oracle.com> <328e8ec2-f9cc-c083-c09e-70785064497f@oracle.com> Message-ID: > On Mar 3, 2020, at 8:21 AM, Per Liden wrote: > > Hi Kim, > > On 2/28/20 10:48 PM, Kim Barrett wrote: >> Please review this change which removes the ConcurrentGCPhaseManager >> class and replaces it with ConcurrentGCBreakpoints. >> This is joint work with Per Liden. >> This change provides a client API, used by WhiteBox. The usage model >> for a client is >> (1) Acquire control of concurrent collection cycles. >> (2) Do work that must be performed while the collection cycle is in a >> known state. >> (3) Request the concurrent collector run to a named "breakpoint", or >> run to completion, and then hold there, waiting for further commands. >> (4) Optionally goto (2). >> (5) Release control of concurrent collection cycles. >> Tests have been updated to use the new WhiteBox API. 
>> This change provides implementations of the new mechanism for G1 and >> ZGC. A Shenandoah implementation is being left to others, but we >> don't see any obvious reason for it to be difficult. >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8240239 >> Webrev: >> https://cr.openjdk.java.net/~kbarrett/8240239/open.03/ > > This looks good to me. However, it would be good if someone else had a closer look at the G1 changes, as I'm feeling less confident reviewing that part. Thanks. Yeah, the G1 changes are not as nice as one might wish. I filed a couple of bugs around G1's initiation of concurrent marking while working on this change. See https://bugs.openjdk.java.net/browse/JDK-8236031 https://bugs.openjdk.java.net/browse/JDK-8235737 I don't think either of those block this change, but fixing them might make some parts a little easier to understand. From m.sundar85 at gmail.com Tue Mar 3 19:12:03 2020 From: m.sundar85 at gmail.com (Sundara Mohan M) Date: Tue, 3 Mar 2020 14:12:03 -0500 Subject: Need help on debugging JVM crash In-Reply-To: References: <442a8045-8ef6-00ae-cd61-2db6f1fdb5fd@redhat.com> <7a572768-b73b-9d6e-db63-a39583f0c507@oracle.com> Message-ID: Sorry for spamming, can someone confirm if you received 2 crash report files? I tried sending separately but only 1 file went through other still says message body more than 500K. Thanks Sundar On Tue, Mar 3, 2020 at 2:08 PM Sundara Mohan M wrote: > From ioi.lam at oracle.com Tue Mar 3 19:27:21 2020 From: ioi.lam at oracle.com (Ioi Lam) Date: Tue, 3 Mar 2020 11:27:21 -0800 Subject: Need help on debugging JVM crash In-Reply-To: References: <442a8045-8ef6-00ae-cd61-2db6f1fdb5fd@redhat.com> <7a572768-b73b-9d6e-db63-a39583f0c507@oracle.com> Message-ID: <8c2b3189-797f-a75e-c6de-a14f9bd7264d@oracle.com> For me, at least, your attachment has been filtered out by the mail server. 
Since this is an AdoptJDK build, I would suggest filing a bug according to the hs_err file

# If you would like to submit a bug report, please visit:
#   https://github.com/AdoptOpenJDK/openjdk-build/issues

and then attach your hs_err log there, and then post the URL of the bug here.

Thanks
- Ioi

On 3/3/20 11:12 AM, Sundara Mohan M wrote:
> Sorry for spamming, can someone confirm if you received 2 crash report
> files? I tried sending separately but only 1 file went through other still
> says message body more than 500K.
>
> Thanks
> Sundar
>
> On Tue, Mar 3, 2020 at 2:08 PM Sundara Mohan M wrote:
>

From m.sundar85 at gmail.com Tue Mar 3 19:30:57 2020
From: m.sundar85 at gmail.com (Sundara Mohan M)
Date: Tue, 3 Mar 2020 14:30:57 -0500
Subject: Need help on debugging JVM crash
In-Reply-To: <8c2b3189-797f-a75e-c6de-a14f9bd7264d@oracle.com>
References: <442a8045-8ef6-00ae-cd61-2db6f1fdb5fd@redhat.com> <7a572768-b73b-9d6e-db63-a39583f0c507@oracle.com> <8c2b3189-797f-a75e-c6de-a14f9bd7264d@oracle.com>
Message-ID:

Hi Ioi,
Thanks for the information. I have uploaded logs here
https://github.com/AdoptOpenJDK/openjdk-support/issues/69

Thanks
Sundar

On Tue, Mar 3, 2020 at 2:27 PM Ioi Lam wrote:
> For me, at least, your attachment has been filtered out by the mail server.
>
> Since this is an AdoptJDK build, I would suggest filing a bug according
> to the hs_err file
>
> # If you would like to submit a bug report, please visit:
> # https://github.com/AdoptOpenJDK/openjdk-build/issues
>
> and then attach your hs_err log there, and then post the URL of the bug
> here.
>
> Thanks
> - Ioi
>
> On 3/3/20 11:12 AM, Sundara Mohan M wrote:
> > Sorry for spamming, can someone confirm if you received 2 crash report
> > files? I tried sending separately but only 1 file went through other
> still
> > says message body more than 500K.
> >
> > Thanks
> > Sundar
> >
> > On Tue, Mar 3, 2020 at 2:08 PM Sundara Mohan M
> wrote:
> >
> >

From ioi.lam at oracle.com Tue Mar 3 20:12:04 2020
From: ioi.lam at oracle.com (Ioi Lam)
Date: Tue, 3 Mar 2020 12:12:04 -0800
Subject: Need help on debugging JVM crash
In-Reply-To:
References:
Message-ID: <86c8b74f-9f11-b9b5-00f2-30360d0e8b6f@oracle.com>

The crash happened while the GC is running. I tried disasm of the crashing address PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 (from the 13.0.1+9 GA binaries of AdoptJDK)

(gdb) x/80i _ZN20PCMarkAndPushClosure6do_oopEPP7oopDesc
   <>:    push   %rbp
   <+1>:    mov    %rsp,%rbp
   <+4>:    push   %r13
   <+6>:    push   %r12
   <+8>:    push   %rbx
   <+9>:    sub    $0x8,%rsp
   <+13>:    mov    (%rsi),%rbx  ;;; rbx = oop
   <+16>:    test   %rbx,%rbx    ;;; oop != null?
   <+19>:    je     0x7ffff67ca317 <_ZN20PCMarkAndPushClosure6do_oopEPP7oopDesc+87>
   <+21>:    lea    0x9c0afc(%rip),%rax        # 0x7ffff718add8 <_ZN20ParCompactionManager12_mark_bitmapE>
   <+28>:    mov    %rbx,%rcx    ;;; rcx = oop
   <+31>:    mov    (%rax),%rdx  ;;; rdx = ParCompactionManager::_mark_bitmap
   <+34>:    sub    (%rdx),%rcx  ;;; rcx = oop - _mark_bitmap->_region_start
   <+37>:    mov    0x10(%rdx),%rdx ;; rdx = _mark_bitmap->_beg_bits->_map
   <+41>:    mov    %rcx,%rax   ;;;  rax = oop - _mark_bitmap->_region_start
   <+44>:    lea    0x93b935(%rip),%rcx       # 0x7ffff7105c28
   <+51>:    shr    $0x3,%rax
   <+55>:    mov    (%rcx),%ecx
   <+57>:    shr    %cl,%rax
   <+60>:    mov    %rax,%rcx
   <+63>:    mov    %rax,%rsi      ;;; rsi = index of oop inside mark_bitmap
   <+66>:    mov    $0x1,%eax
   <+71>:    and    $0x3f,%ecx
   <+74>:    shr    $0x6,%rsi
   <+78>:    shl    %cl,%rax
   <+81>:    test   %rax,(%rdx,%rsi,8) << crash

This looks like that the oop that we try to mark is actually outside of the heap range, so trying to mark it in the mark_bitmap causes this:

   siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr:
   0x0000000000000000

Here are the values of the registers for the "test" instruction above:

    RAX=0x0000000000000001 is an unknown value
    RDX=0x00007f55af000000 points into unknown readable memory: 01 00 00 00 01 00 00 04
    RSI=0x007fffc05491d000 is an unknown value

As you can see, RSI is very large, which means you have an invalid oop in the stack that's probably very large.

    [libjvm.so+0xcc0121] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51
    [libjvm.so+0xc58c8b]  OopMapSet::oops_do()
    [libjvm.so+0x7521e9]  frame::oops_do_internal()+0x99 <<<< HERE
    [libjvm.so+0xf55757]  JavaThread::oops_do()+0x187

As others have mentioned, this kind of error is usually caused by invalid use of Unsafe or JNI that leads to heap corruption. However, it's plausible that somehow the VM has messed up the frame and tries to mark an invalid oop.

Thanks
- Ioi

On 3/3/20 8:02 AM, Sundara Mohan M wrote:
> Hi,
> I am seeing JVM crashes on our system in GC Thread with parallel gc on
> x86 linux. Observed the same crash happening on JVM-11.0.6/13.0.2/13.0.1 GA
> builds.
> Adding some logs lines to give some context.
>
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x00007f669c964311, pid=66684, tid=71106
> #
> # JRE version: OpenJDK Runtime Environment (13.0.1+9) (build 13.0.1+9)
> # Java VM: OpenJDK 64-Bit Server VM (13.0.1+9, mixed mode, tiered, parallel
> gc, linux-amd64)
> # Problematic frame:
> # V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51
> #
> # No core dump will be written. Core dumps have been disabled.
To enable > core dumping, try "ulimit -c unlimited" before starting Java again > # > # If you would like to submit a bug report, please visit: > # https://github.com/AdoptOpenJDK/openjdk-build/issues > # > > Host: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, 48 cores, 125G, Red Hat > Enterprise Linux Server release 6.10 (Santiago) > Time: Thu Feb 6 11:43:48 2020 UTC elapsed time: 198626 seconds (2d 7h 10m > 26s) > > > Following is the stack trace > ex1: > Stack: [0x00007fd01cbdb000,0x00007fd01ccdb000], sp=0x00007fd01ccd8890, > free space=1014k > Native frames: (J=compiled Java code, A=aot compiled Java code, > j=interpreted, Vv=VM code, C=native code) > *V [libjvm.so+0xcc0121] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51* > V [libjvm.so+0xc58c8b] OopMapSet::oops_do(frame const*, RegisterMap > const*, OopClosure*)+0x2eb > V [libjvm.so+0x7521e9] frame::oops_do_internal(OopClosure*, > CodeBlobClosure*, RegisterMap*, bool)+0x99 > V [libjvm.so+0xf55757] JavaThread::oops_do(OopClosure*, > CodeBlobClosure*)+0x187 > V [libjvm.so+0xcbb100] ThreadRootsMarkingTask::do_it(GCTaskManager*, > unsigned int)+0xb0 > V [libjvm.so+0x7e0f8b] GCTaskThread::run()+0x1eb > V [libjvm.so+0xf5d43d] Thread::call_run()+0x10d > V [libjvm.so+0xc74337] thread_native_entry(Thread*)+0xe7 > > JavaThread 0x00007fbeb9209800 (nid = 82380) was being processed > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > v ~RuntimeStub::_new_array_Java > J 62465 c2 > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > (207 bytes) @ 0x00007fd00ad43704 [0x00007fd00ad41420+0x00000000000022e4] > J 474206 c2 > org.eclipse.jetty.util.log.JettyAwareLogger.log(Lorg/slf4j/Marker;ILjava/lang/String;[Ljava/lang/Object;Ljava/lang/Throwable;)V > (134 bytes) @ 0x00007fd00c4e81ec [0x00007fd00c4e7ee0+0x000000000000030c] > j > org.eclipse.jetty.util.log.JettyAwareLogger.warn(Ljava/lang/String;Ljava/lang/Throwable;)V+7 > j > 
org.eclipse.jetty.util.log.Slf4jLog.warn(Ljava/lang/String;Ljava/lang/Throwable;)V+6 > j > org.eclipse.jetty.server.HttpChannel.handleException(Ljava/lang/Throwable;)V+181 > j > org.eclipse.jetty.server.HttpChannelOverHttp.handleException(Ljava/lang/Throwable;)V+13 > J 64106 c2 org.eclipse.jetty.server.HttpChannel.handle()Z (997 bytes) @ > 0x00007fd00c6d2cd4 [0x00007fd00c6cdec0+0x0000000000004e14] > J 280430 c2 org.eclipse.jetty.server.HttpConnection.onFillable()V (334 > bytes) @ 0x00007fd00da925f0 [0x00007fd00da91e40+0x00000000000007b0] > J 41979 c2 org.eclipse.jetty.io.ChannelEndPoint$2.run()V (12 bytes) @ > 0x00007fd00a14f604 [0x00007fd00a14f4e0+0x0000000000000124] > J 86362 c2 org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run()V > (565 bytes) @ 0x00007fd0087d7e34 [0x00007fd0087d7cc0+0x0000000000000174] > J 75998 c2 java.lang.Thread.run()V java.base at 13.0.2 (17 bytes) @ > 0x00007fd00c93b8d8 [0x00007fd00c93b8a0+0x0000000000000038] > v ~StubRoutines::call_stub > > ex2: > Stack: [0x00007f669869f000,0x00007f669879f000], sp=0x00007f669879c890, > free space=1014k > Native frames: (J=compiled Java code, A=aot compiled Java code, > j=interpreted, Vv=VM code, C=native code) > > *V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51*V > [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, RegisterMap const*, > OopClosure*)+0x2eb > V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > CodeBlobClosure*, RegisterMap*, bool)+0x99 > V [libjvm.so+0xf68b17] JavaThread::oops_do(OopClosure*, > CodeBlobClosure*)+0x187 > V [libjvm.so+0xcce2f0] ThreadRootsMarkingTask::do_it(GCTaskManager*, > unsigned int)+0xb0 > V [libjvm.so+0x7f422b] GCTaskThread::run()+0x1eb > V [libjvm.so+0xf707fd] Thread::call_run()+0x10d > V [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7 > > JavaThread 0x00007f5518004000 (nid = 75659) was being processed > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > v ~RuntimeStub::_new_array_Java > J 54174 c2 > 
ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > (207 bytes) @ 0x00007f6687d92678 [0x00007f6687d8c700+0x0000000000005f78] > J 334031 c2 > com.xmas.webservice.exception.ExceptionLoggingWrapper.execute()V (1004 > bytes) @ 0x00007f6686ede430 [0x00007f6686edd580+0x0000000000000eb0] > J 53431 c2 > com.xmas.webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lcom/xmas/beans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > (105 bytes) @ 0x00007f6687db88b0 [0x00007f6687db8660+0x0000000000000250] > J 63819 c2 > com.xmas.webservice.exception.mapper.RequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > (9 bytes) @ 0x00007f6686a6ed9c [0x00007f6686a6ecc0+0x00000000000000dc] > J 334032 c2 > com.xmas.webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; > (332 bytes) @ 0x00007f668992ad34 [0x00007f668992a840+0x00000000000004f4] > J 403918 c2 > com.xmas.webservice.filters.ResponseSerializationWorker.execute()Z (272 > bytes) @ 0x00007f66869d67fc [0x00007f66869d5e80+0x000000000000097c] > J 17530 c2 com.lafaspot.common.concurrent.internal.WorkerWrapper.execute()Z > (208 bytes) @ 0x00007f66848b3708 [0x00007f66848b36a0+0x0000000000000068] > J 31970% c2 > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; > (486 bytes) @ 0x00007f668608dcb0 [0x00007f668608d5e0+0x00000000000006d0] > j > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1 > J 4889 c1 java.util.concurrent.FutureTask.run()V java.base at 13.0.1 (123 > bytes) @ 0x00007f667d0be604 [0x00007f667d0bdf80+0x0000000000000684] > J 7487 c1 > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V > java.base at 13.0.1 (187 bytes) @ 0x00007f667dd45854 > [0x00007f667dd44a60+0x0000000000000df4] > J 7486 c1 
java.util.concurrent.ThreadPoolExecutor$Worker.run()V > java.base at 13.0.1 (9 bytes) @ 0x00007f667d1f643c > [0x00007f667d1f63c0+0x000000000000007c] > J 7078 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 bytes) @ > 0x00007f667d1f2d74 [0x00007f667d1f2c40+0x0000000000000134] > v ~StubRoutines::call_stub > > Not very frequent but ~90 days ~120 crashes with following signal > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: > 0x0000000000000000 > This signal is generated when we try to access non canonical address in > linux. > > As suggested by Stefan in another thread i tried to > add VerifyAfterGC/VerifyBeforeGC but it seems to increase the latency and > applications not surviving our production traffic(timing out and requests > are failing). > > Questions > 1. When i looked at source code for printing stack trace i see following > https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L696 > (Prints native stack trace) > https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L718 > (printing Java thread stack trace if it is involved in GC crash) > a. How do you know this java thread was involved in jvm crash? > b. Can i assume the java thread printed after native stack trace was the > culprit? > c. Since i am seeing the same frame (~RuntimeStub::_new_array_Java, J > 54174 c2 ch.qos.logback.classic.spi.ThrowableProxy...) but different > stack trace in both crashes can this be the root cause? > > 2. Thinking of excluding compilation > of ch.qos.logback.classic.spi.ThrowableProxy class and running in > production to see if compilation of this method is the cause. Does it make > sense? > > 3. Any other suggestion on debugging this further? 
> > TIA > Sundar From m.sundar85 at gmail.com Wed Mar 4 01:01:28 2020 From: m.sundar85 at gmail.com (Sundara Mohan M) Date: Tue, 3 Mar 2020 20:01:28 -0500 Subject: Need help on debugging JVM crash In-Reply-To: <86c8b74f-9f11-b9b5-00f2-30360d0e8b6f@oracle.com> References: <86c8b74f-9f11-b9b5-00f2-30360d0e8b6f@oracle.com> Message-ID: Hi Ioi, Thanks for the analysis. On Tue, Mar 3, 2020 at 3:12 PM Ioi Lam wrote: > The crash happened while the GC is running. I tried disasm of the > crashing address PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 (from the > 13.0.1+9 GA binaries of AdoptJDK) > > (gdb) x/80i _ZN20PCMarkAndPushClosure6do_oopEPP7oopDesc > <>: push %rbp > <+1>: mov %rsp,%rbp > <+4>: push %r13 > <+6>: push %r12 > <+8>: push %rbx > <+9>: sub $0x8,%rsp > <+13>: mov (%rsi),%rbx ;;; rbx = oop > <+16>: test %rbx,%rbx ;;; oop != null? > <+19>: je 0x7ffff67ca317 > <_ZN20PCMarkAndPushClosure6do_oopEPP7oopDesc+87> > <+21>: lea 0x9c0afc(%rip),%rax # 0x7ffff718add8 > <_ZN20ParCompactionManager12_mark_bitmapE> > <+28>: mov %rbx,%rcx ;;; rcx = oop > <+31>: mov (%rax),%rdx ;;; rdx = > ParCompactionManager::_mark_bitmap > <+34>: sub (%rdx),%rcx ;;; rcx = oop - > _mark_bitmap->_region_start > <+37>: mov 0x10(%rdx),%rdx ;; rdx = _mark_bitmap->_beg_bits->_map > <+41>: mov %rcx,%rax ;;; rax = oop - > _mark_bitmap->_region_start > <+44>: lea 0x93b935(%rip),%rcx # 0x7ffff7105c28 > > <+51>: shr $0x3,%rax > <+55>: mov (%rcx),%ecx > <+57>: shr %cl,%rax > <+60>: mov %rax,%rcx > <+63>: mov %rax,%rsi ;;; rsi = index of oop inside > mark_bitmap > <+66>: mov $0x1,%eax > <+71>: and $0x3f,%ecx > <+74>: shr $0x6,%rsi > <+78>: shl %cl,%rax > <+81>: test %rax,(%rdx,%rsi,8) << crash > > > This looks like that the oop that we try to mark is actually outside of > the heap range, so trying to mark it in the mark_bitmap causes this: > > > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: > 0x0000000000000000 > > > Here are the values of the registers for the "test" instruction 
above: > > > RAX=0x0000000000000001 is an unknown value > RDX=0x00007f55af000000 points into unknown readable memory: 01 00 > 00 00 01 00 00 04 > RSI=0x007fffc05491d000 is an unknown value > > > As you can see, RSI is very large, which means you have an invalid oop > in the stack that's probably very large. > Can you please explain "stack" means here? Is it functions stack variable or some thing which GC internally uses? > > > [libjvm.so+0xcc0121] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > [libjvm.so+0xc58c8b] OopMapSet::oops_do() > [libjvm.so+0x7521e9] frame::oops_do_internal()+0x99 <<<< HERE > [libjvm.so+0xf55757] JavaThread::oops_do()+0x187 > > > As others have mentioned, this kind of error is usually caused by > invalid use of Unsafe or JNI that leads to heap corruption. However, > it's plausible that somehow the VM has messed up the frame and tries to > mark an invalid oop. > Was trying to avoid using JNI calls to check if that is the cause but that seems not an option for now. Do you think any other way to get the root cause for this? > Thanks > - Ioi > > > On 3/3/20 8:02 AM, Sundara Mohan M wrote: > > Hi, > > I am seeing JVM crashes on our system in GC Thread with parallel gc > on > > x86 linux. Observed the same crash happening on JVM-11.0.6/13.0.2/13.0.1 > GA > > builds. > > Adding some logs lines to give some context. > > > > # > > # A fatal error has been detected by the Java Runtime Environment: > > # > > # SIGSEGV (0xb) at pc=0x00007f669c964311, pid=66684, tid=71106 > > # > > # JRE version: OpenJDK Runtime Environment (13.0.1+9) (build 13.0.1+9) > > # Java VM: OpenJDK 64-Bit Server VM (13.0.1+9, mixed mode, tiered, > parallel > > gc, linux-amd64) > > # Problematic frame: > > # V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > > # > > # No core dump will be written. Core dumps have been disabled. 
To enable > > core dumping, try "ulimit -c unlimited" before starting Java again > > # > > # If you would like to submit a bug report, please visit: > > # https://github.com/AdoptOpenJDK/openjdk-build/issues > > # > > > > Host: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, 48 cores, 125G, Red Hat > > Enterprise Linux Server release 6.10 (Santiago) > > Time: Thu Feb 6 11:43:48 2020 UTC elapsed time: 198626 seconds (2d 7h > 10m > > 26s) > > > > > > Following is the stack trace > > ex1: > > Stack: [0x00007fd01cbdb000,0x00007fd01ccdb000], sp=0x00007fd01ccd8890, > > free space=1014k > > Native frames: (J=compiled Java code, A=aot compiled Java code, > > j=interpreted, Vv=VM code, C=native code) > > *V [libjvm.so+0xcc0121] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51* > > V [libjvm.so+0xc58c8b] OopMapSet::oops_do(frame const*, RegisterMap > > const*, OopClosure*)+0x2eb > > V [libjvm.so+0x7521e9] frame::oops_do_internal(OopClosure*, > > CodeBlobClosure*, RegisterMap*, bool)+0x99 > > V [libjvm.so+0xf55757] JavaThread::oops_do(OopClosure*, > > CodeBlobClosure*)+0x187 > > V [libjvm.so+0xcbb100] ThreadRootsMarkingTask::do_it(GCTaskManager*, > > unsigned int)+0xb0 > > V [libjvm.so+0x7e0f8b] GCTaskThread::run()+0x1eb > > V [libjvm.so+0xf5d43d] Thread::call_run()+0x10d > > V [libjvm.so+0xc74337] thread_native_entry(Thread*)+0xe7 > > > > JavaThread 0x00007fbeb9209800 (nid = 82380) was being processed > > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > > v ~RuntimeStub::_new_array_Java > > J 62465 c2 > > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > > (207 bytes) @ 0x00007fd00ad43704 [0x00007fd00ad41420+0x00000000000022e4] > > J 474206 c2 > > > org.eclipse.jetty.util.log.JettyAwareLogger.log(Lorg/slf4j/Marker;ILjava/lang/String;[Ljava/lang/Object;Ljava/lang/Throwable;)V > > (134 bytes) @ 0x00007fd00c4e81ec [0x00007fd00c4e7ee0+0x000000000000030c] > > j > > > 
org.eclipse.jetty.util.log.JettyAwareLogger.warn(Ljava/lang/String;Ljava/lang/Throwable;)V+7 > > j > > > org.eclipse.jetty.util.log.Slf4jLog.warn(Ljava/lang/String;Ljava/lang/Throwable;)V+6 > > j > > > org.eclipse.jetty.server.HttpChannel.handleException(Ljava/lang/Throwable;)V+181 > > j > > > org.eclipse.jetty.server.HttpChannelOverHttp.handleException(Ljava/lang/Throwable;)V+13 > > J 64106 c2 org.eclipse.jetty.server.HttpChannel.handle()Z (997 bytes) @ > > 0x00007fd00c6d2cd4 [0x00007fd00c6cdec0+0x0000000000004e14] > > J 280430 c2 org.eclipse.jetty.server.HttpConnection.onFillable()V (334 > > bytes) @ 0x00007fd00da925f0 [0x00007fd00da91e40+0x00000000000007b0] > > J 41979 c2 org.eclipse.jetty.io.ChannelEndPoint$2.run()V (12 bytes) @ > > 0x00007fd00a14f604 [0x00007fd00a14f4e0+0x0000000000000124] > > J 86362 c2 org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run()V > > (565 bytes) @ 0x00007fd0087d7e34 [0x00007fd0087d7cc0+0x0000000000000174] > > J 75998 c2 java.lang.Thread.run()V java.base at 13.0.2 (17 bytes) @ > > 0x00007fd00c93b8d8 [0x00007fd00c93b8a0+0x0000000000000038] > > v ~StubRoutines::call_stub > > > > ex2: > > Stack: [0x00007f669869f000,0x00007f669879f000], sp=0x00007f669879c890, > > free space=1014k > > Native frames: (J=compiled Java code, A=aot compiled Java code, > > j=interpreted, Vv=VM code, C=native code) > > > > *V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51*V > > [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, RegisterMap > const*, > > OopClosure*)+0x2eb > > V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > > CodeBlobClosure*, RegisterMap*, bool)+0x99 > > V [libjvm.so+0xf68b17] JavaThread::oops_do(OopClosure*, > > CodeBlobClosure*)+0x187 > > V [libjvm.so+0xcce2f0] ThreadRootsMarkingTask::do_it(GCTaskManager*, > > unsigned int)+0xb0 > > V [libjvm.so+0x7f422b] GCTaskThread::run()+0x1eb > > V [libjvm.so+0xf707fd] Thread::call_run()+0x10d > > V [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7 > > > > 
JavaThread 0x00007f5518004000 (nid = 75659) was being processed > > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > > v ~RuntimeStub::_new_array_Java > > J 54174 c2 > > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > > (207 bytes) @ 0x00007f6687d92678 [0x00007f6687d8c700+0x0000000000005f78] > > J 334031 c2 > > com.xmas.webservice.exception.ExceptionLoggingWrapper.execute()V (1004 > > bytes) @ 0x00007f6686ede430 [0x00007f6686edd580+0x0000000000000eb0] > > J 53431 c2 > > > com.xmas.webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lcom/xmas/beans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > (105 bytes) @ 0x00007f6687db88b0 [0x00007f6687db8660+0x0000000000000250] > > J 63819 c2 > > > com.xmas.webservice.exception.mapper.RequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > (9 bytes) @ 0x00007f6686a6ed9c [0x00007f6686a6ecc0+0x00000000000000dc] > > J 334032 c2 > > > com.xmas.webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; > > (332 bytes) @ 0x00007f668992ad34 [0x00007f668992a840+0x00000000000004f4] > > J 403918 c2 > > com.xmas.webservice.filters.ResponseSerializationWorker.execute()Z (272 > > bytes) @ 0x00007f66869d67fc [0x00007f66869d5e80+0x000000000000097c] > > J 17530 c2 > com.lafaspot.common.concurrent.internal.WorkerWrapper.execute()Z > > (208 bytes) @ 0x00007f66848b3708 [0x00007f66848b36a0+0x0000000000000068] > > J 31970% c2 > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; > > (486 bytes) @ 0x00007f668608dcb0 [0x00007f668608d5e0+0x00000000000006d0] > > j > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1 > > J 4889 c1 java.util.concurrent.FutureTask.run()V java.base at 13.0.1 (123 > > bytes) @ 0x00007f667d0be604 [0x00007f667d0bdf80+0x0000000000000684] > > J 7487 c1 > > 
> java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V > > java.base at 13.0.1 (187 bytes) @ 0x00007f667dd45854 > > [0x00007f667dd44a60+0x0000000000000df4] > > J 7486 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V > > java.base at 13.0.1 (9 bytes) @ 0x00007f667d1f643c > > [0x00007f667d1f63c0+0x000000000000007c] > > J 7078 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 bytes) @ > > 0x00007f667d1f2d74 [0x00007f667d1f2c40+0x0000000000000134] > > v ~StubRoutines::call_stub > > > > Not very frequent but ~90 days ~120 crashes with following signal > > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: > > 0x0000000000000000 > > This signal is generated when we try to access non canonical address in > > linux. > > > > As suggested by Stefan in another thread i tried to > > add VerifyAfterGC/VerifyBeforeGC but it seems to increase the latency and > > applications not surviving our production traffic(timing out and requests > > are failing). > > > > Questions > > 1. When i looked at source code for printing stack trace i see following > > > https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L696 > > (Prints native stack trace) > > > https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L718 > > (printing Java thread stack trace if it is involved in GC crash) > > a. How do you know this java thread was involved in jvm crash? > > b. Can i assume the java thread printed after native stack trace was > the > > culprit? > > c. Since i am seeing the same frame (~RuntimeStub::_new_array_Java, J > > 54174 c2 ch.qos.logback.classic.spi.ThrowableProxy...) but > different > > stack trace in both crashes can this be the root cause? > > > > 2. Thinking of excluding compilation > > of ch.qos.logback.classic.spi.ThrowableProxy class and running in > > production to see if compilation of this method is the cause. Does it > make > > sense? 
> > > > 3. Any other suggestion on debugging this further? > > > > TIA > > Sundar > > Thanks Sundar From kim.barrett at oracle.com Wed Mar 4 02:16:38 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 3 Mar 2020 21:16:38 -0500 Subject: RFR: 8239825: G1: Simplify threshold test for mutator refinement Message-ID: <81A0AF23-EEA2-42A2-8208-AD36B7B336CC@oracle.com> Please review this change to the handling of "padding" for the threshold used to decide whether a mutator thread should perform concurrent refinement. Rather than doing a slightly tricky (because of potential overflow) computation every time a mutator thread completes a buffer, instead perform that computation once and record the result for repeated use. CR: https://bugs.openjdk.java.net/browse/JDK-8239825 Webrev: https://cr.openjdk.java.net/~kbarrett/8239825/open.00/ Testing: mach5 tier1-5 along with changes for JDK-8240133 and JDK-8139652. Local (linux-x64) hotspot:tier1 with just this change. From kim.barrett at oracle.com Wed Mar 4 02:17:46 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 3 Mar 2020 21:17:46 -0500 Subject: RFR[T]: 8240133: G1DirtyCardQueue destructor has useless flush Message-ID: Please review this trivial change to remove the useless call to flush() from the G1DirtyCardQueue destructor. See the CR for more details. This removes the need for a non-trivial destructor for that class. CR: https://bugs.openjdk.java.net/browse/JDK-8240133 Webrev: https://cr.openjdk.java.net/~kbarrett/8240133/open.00/ Testing: mach5 tier1-5 along with changes for JDK-8239825 and JDK-8139652. Local (linux-x64) hotspot:tier1 with this and the proposed JDK-8239825 change. 
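As a rough illustration of the JDK-8239825 request above (compute the padded threshold once and reuse it, instead of redoing an overflow-sensitive addition each time a mutator completes a buffer), here is a minimal Python sketch. It is not the HotSpot code: the class name, the field names and the choice to clamp on overflow are all assumptions made for illustration, and `SIZE_MAX` only models a 64-bit `size_t`.

```python
# Hypothetical model, NOT the HotSpot implementation: all names and the
# clamp-on-overflow policy are assumptions made for illustration.
SIZE_MAX = 2**64 - 1  # models the largest value of a 64-bit size_t


class RefinementThreshold:
    """Caches max_cards + padding so the hot path is one comparison."""

    def __init__(self, max_cards: int, padding: int) -> None:
        self._max_cards = max_cards
        self._padding = padding
        self._padded_max_cards = self._compute()

    def _compute(self) -> int:
        # The overflow-sensitive addition happens only here, whenever an
        # input changes, and the result is clamped to SIZE_MAX.
        return min(self._max_cards + self._padding, SIZE_MAX)

    def set_max_cards(self, max_cards: int) -> None:
        self._max_cards = max_cards
        self._padded_max_cards = self._compute()

    def set_padding(self, padding: int) -> None:
        self._padding = padding
        self._padded_max_cards = self._compute()

    def mutator_should_refine(self, num_cards: int) -> bool:
        # Hot path: evaluated each time a mutator thread completes a buffer.
        return num_cards > self._padded_max_cards
```

With `RefinementThreshold(100, 20)`, a mutator would start refining only once more than 120 cards are pending; changing either input recomputes the cached limit a single time.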
From kim.barrett at oracle.com Wed Mar 4 02:32:06 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 3 Mar 2020 21:32:06 -0500 Subject: RFR: 8139652: Mutator refinement processing should take the oldest dirty card buffer Message-ID: Please review this change to the handling of completed buffers by mutator threads. Previously it would conditionally process and potentially reuse the buffer, rather than enqueuing it. Now, always enqueue the buffer and allocate a new one, and conditionally process the next (oldest) dirty buffer in the DCQS. The benefit of this is that the buffers being processed by the mutator age for a while in the DCQS (just as is done by for concurrent refinement thread processing), so if the mutator is making repeated writes to the same or nearby locations, the associated card marking has more opportunaty to be filtered out. CR: https://bugs.openjdk.java.net/browse/JDK-8139652 Webrev: https://cr.openjdk.java.net/~kbarrett/8139652/open.00/ Testing mach5 tier1-5 along with changes for JDK-8239825 and JDK-8139652. From stefan.karlsson at oracle.com Wed Mar 4 08:37:03 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Wed, 4 Mar 2020 09:37:03 +0100 Subject: Need help on debugging JVM crash In-Reply-To: <86c8b74f-9f11-b9b5-00f2-30360d0e8b6f@oracle.com> References: <86c8b74f-9f11-b9b5-00f2-30360d0e8b6f@oracle.com> Message-ID: <6420763d-43e8-9ee8-0041-c06578911644@oracle.com> FWIW, I see that this is run with -XX:-OmitStackTraceInFastThrowFalse. Maybe there's a problem with that flag? Some more info from the hs_err file that could further clues to the problem: The Java thread the GC is scanning is creating a ThrowableProxy, and is in the process of taking a slow path to allocate an array. Looking at the code it seems like it first calls Thread.getStackTrace(), and then creates an array of proxies to those elements. One of the hs_err files report over > 800 OutOfMemoryErrors. 
StefanK On 2020-03-03 21:12, Ioi Lam wrote: > The crash happened while the GC is running. I tried disasm of the > crashing address PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 (from > the 13.0.1+9 GA binaries of AdoptJDK) > > (gdb) x/80i _ZN20PCMarkAndPushClosure6do_oopEPP7oopDesc > ?? <>:??? push?? %rbp > ?? <+1>:??? mov??? %rsp,%rbp > ?? <+4>:??? push?? %r13 > ?? <+6>:??? push?? %r12 > ?? <+8>:??? push?? %rbx > ?? <+9>:??? sub??? $0x8,%rsp > ?? <+13>:??? mov??? (%rsi),%rbx? ;;; rbx = oop > ?? <+16>:??? test?? %rbx,%rbx??? ;;; oop != null? > ?? <+19>:??? je???? 0x7ffff67ca317 > <_ZN20PCMarkAndPushClosure6do_oopEPP7oopDesc+87> > ?? <+21>:??? lea??? 0x9c0afc(%rip),%rax??????? # 0x7ffff718add8 > <_ZN20ParCompactionManager12_mark_bitmapE> > ?? <+28>:??? mov??? %rbx,%rcx??? ;;; rcx = oop > ?? <+31>:??? mov??? (%rax),%rdx? ;;; rdx = > ParCompactionManager::_mark_bitmap > ?? <+34>:??? sub??? (%rdx),%rcx? ;;; rcx = oop - > _mark_bitmap->_region_start > ?? <+37>:??? mov??? 0x10(%rdx),%rdx ;; rdx = > _mark_bitmap->_beg_bits->_map > ?? <+41>:??? mov??? %rcx,%rax?? ;;;? rax = oop - > _mark_bitmap->_region_start > ?? <+44>:??? lea??? 0x93b935(%rip),%rcx?????? # 0x7ffff7105c28 > > ?? <+51>:??? shr??? $0x3,%rax > ?? <+55>:??? mov??? (%rcx),%ecx > ?? <+57>:??? shr??? %cl,%rax > ?? <+60>:??? mov??? %rax,%rcx > ?? <+63>:??? mov??? %rax,%rsi????? ;;; rsi = index of oop inside > mark_bitmap > ?? <+66>:??? mov??? $0x1,%eax > ?? <+71>:??? and??? $0x3f,%ecx > ?? <+74>:??? shr??? $0x6,%rsi > ?? <+78>:??? shl??? %cl,%rax > ?? <+81>:??? test?? %rax,(%rdx,%rsi,8) << crash > > > This looks like that the oop that we try to mark is actually outside > of the heap range, so trying to mark it in the mark_bitmap causes this: > > > ?? siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: > ?? 0x0000000000000000 > > > Here are the values of the registers for the "test" instruction above: > > > ??? RAX=0x0000000000000001 is an unknown value > ??? 
RDX=0x00007f55af000000 points into unknown readable memory: 01 00 > 00 00 01 00 00 04 > ??? RSI=0x007fffc05491d000 is an unknown value > > > As you can see, RSI is very large, which means you have an invalid oop > in the stack that's probably very large. > > > ??? [libjvm.so+0xcc0121] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > ??? [libjvm.so+0xc58c8b]? OopMapSet::oops_do() > ??? [libjvm.so+0x7521e9]? frame::oops_do_internal()+0x99 <<<< HERE > ??? [libjvm.so+0xf55757]? JavaThread::oops_do()+0x187 > > > As others have mentioned, this kind of error is usually caused by > invalid use of Unsafe or JNI that leads to heap corruption. However, > it's plausible that somehow the VM has messed up the frame and tries > to mark an invalid oop. > > Thanks > - Ioi > > > On 3/3/20 8:02 AM, Sundara Mohan M wrote: >> Hi, >> ???? I am seeing JVM crashes on our system in GC Thread with parallel >> gc on >> x86 linux. Observed the same crash happening on >> JVM-11.0.6/13.0.2/13.0.1 GA >> builds. >> Adding some logs lines to give some context. >> >> # >> # A fatal error has been detected by the Java Runtime Environment: >> # >> #? SIGSEGV (0xb) at pc=0x00007f669c964311, pid=66684, tid=71106 >> # >> # JRE version: OpenJDK Runtime Environment (13.0.1+9) (build 13.0.1+9) >> # Java VM: OpenJDK 64-Bit Server VM (13.0.1+9, mixed mode, tiered, >> parallel >> gc, linux-amd64) >> # Problematic frame: >> # V? [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 >> # >> # No core dump will be written. Core dumps have been disabled. To enable >> core dumping, try "ulimit -c unlimited" before starting Java again >> # >> # If you would like to submit a bug report, please visit: >> #?? https://github.com/AdoptOpenJDK/openjdk-build/issues >> # >> >> Host: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, 48 cores, 125G, Red Hat >> Enterprise Linux Server release 6.10 (Santiago) >> Time: Thu Feb? 
>> 6 11:43:48 2020 UTC elapsed time: 198626 seconds (2d 7h 10m 26s)
>>
>>
>> Following is the stack trace
>> ex1:
>> Stack: [0x00007fd01cbdb000,0x00007fd01ccdb000], sp=0x00007fd01ccd8890,  free space=1014k
>> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
>> *V  [libjvm.so+0xcc0121]  PCMarkAndPushClosure::do_oop(oopDesc**)+0x51*
>> V  [libjvm.so+0xc58c8b]  OopMapSet::oops_do(frame const*, RegisterMap const*, OopClosure*)+0x2eb
>> V  [libjvm.so+0x7521e9]  frame::oops_do_internal(OopClosure*, CodeBlobClosure*, RegisterMap*, bool)+0x99
>> V  [libjvm.so+0xf55757]  JavaThread::oops_do(OopClosure*, CodeBlobClosure*)+0x187
>> V  [libjvm.so+0xcbb100]  ThreadRootsMarkingTask::do_it(GCTaskManager*, unsigned int)+0xb0
>> V  [libjvm.so+0x7e0f8b]  GCTaskThread::run()+0x1eb
>> V  [libjvm.so+0xf5d43d]  Thread::call_run()+0x10d
>> V  [libjvm.so+0xc74337]  thread_native_entry(Thread*)+0xe7
>>
>> JavaThread 0x00007fbeb9209800 (nid = 82380) was being processed
>> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
>> v  ~RuntimeStub::_new_array_Java
>> J 62465 c2 ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V (207 bytes) @ 0x00007fd00ad43704 [0x00007fd00ad41420+0x00000000000022e4]
>> J 474206 c2 org.eclipse.jetty.util.log.JettyAwareLogger.log(Lorg/slf4j/Marker;ILjava/lang/String;[Ljava/lang/Object;Ljava/lang/Throwable;)V (134 bytes) @ 0x00007fd00c4e81ec [0x00007fd00c4e7ee0+0x000000000000030c]
>> j  org.eclipse.jetty.util.log.JettyAwareLogger.warn(Ljava/lang/String;Ljava/lang/Throwable;)V+7
>> j  org.eclipse.jetty.util.log.Slf4jLog.warn(Ljava/lang/String;Ljava/lang/Throwable;)V+6
>> j  org.eclipse.jetty.server.HttpChannel.handleException(Ljava/lang/Throwable;)V+181
>> j  org.eclipse.jetty.server.HttpChannelOverHttp.handleException(Ljava/lang/Throwable;)V+13
>> J 64106 c2 org.eclipse.jetty.server.HttpChannel.handle()Z (997 bytes) @ 0x00007fd00c6d2cd4 [0x00007fd00c6cdec0+0x0000000000004e14]
>> J 280430 c2 org.eclipse.jetty.server.HttpConnection.onFillable()V (334 bytes) @ 0x00007fd00da925f0 [0x00007fd00da91e40+0x00000000000007b0]
>> J 41979 c2 org.eclipse.jetty.io.ChannelEndPoint$2.run()V (12 bytes) @ 0x00007fd00a14f604 [0x00007fd00a14f4e0+0x0000000000000124]
>> J 86362 c2 org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run()V (565 bytes) @ 0x00007fd0087d7e34 [0x00007fd0087d7cc0+0x0000000000000174]
>> J 75998 c2 java.lang.Thread.run()V java.base at 13.0.2 (17 bytes) @ 0x00007fd00c93b8d8 [0x00007fd00c93b8a0+0x0000000000000038]
>> v  ~StubRoutines::call_stub
>>
>> ex2:
>> Stack: [0x00007f669869f000,0x00007f669879f000], sp=0x00007f669879c890,  free space=1014k
>> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
>>
>> *V  [libjvm.so+0xcd3311]  PCMarkAndPushClosure::do_oop(oopDesc**)+0x51*
>> V  [libjvm.so+0xc6bf0b]  OopMapSet::oops_do(frame const*, RegisterMap const*, OopClosure*)+0x2eb
>> V  [libjvm.so+0x765489]  frame::oops_do_internal(OopClosure*, CodeBlobClosure*, RegisterMap*, bool)+0x99
>> V  [libjvm.so+0xf68b17]  JavaThread::oops_do(OopClosure*, CodeBlobClosure*)+0x187
>> V  [libjvm.so+0xcce2f0]  ThreadRootsMarkingTask::do_it(GCTaskManager*, unsigned int)+0xb0
>> V  [libjvm.so+0x7f422b]  GCTaskThread::run()+0x1eb
>> V  [libjvm.so+0xf707fd]  Thread::call_run()+0x10d
>> V  [libjvm.so+0xc875b7]  thread_native_entry(Thread*)+0xe7
>>
>> JavaThread 0x00007f5518004000 (nid = 75659) was being processed
>> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
>> v  ~RuntimeStub::_new_array_Java
>> J 54174 c2 ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V (207 bytes) @ 0x00007f6687d92678 [0x00007f6687d8c700+0x0000000000005f78]
>> J 334031 c2 com.xmas.webservice.exception.ExceptionLoggingWrapper.execute()V (1004 bytes) @ 0x00007f6686ede430 [0x00007f6686edd580+0x0000000000000eb0]
>> J 53431 c2 com.xmas.webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lcom/xmas/beans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; (105 bytes) @ 0x00007f6687db88b0 [0x00007f6687db8660+0x0000000000000250]
>> J 63819 c2 com.xmas.webservice.exception.mapper.RequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; (9 bytes) @ 0x00007f6686a6ed9c [0x00007f6686a6ecc0+0x00000000000000dc]
>> J 334032 c2 com.xmas.webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; (332 bytes) @ 0x00007f668992ad34 [0x00007f668992a840+0x00000000000004f4]
>> J 403918 c2 com.xmas.webservice.filters.ResponseSerializationWorker.execute()Z (272 bytes) @ 0x00007f66869d67fc [0x00007f66869d5e80+0x000000000000097c]
>> J 17530 c2 com.lafaspot.common.concurrent.internal.WorkerWrapper.execute()Z (208 bytes) @ 0x00007f66848b3708 [0x00007f66848b36a0+0x0000000000000068]
>> J 31970% c2 com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; (486 bytes) @ 0x00007f668608dcb0 [0x00007f668608d5e0+0x00000000000006d0]
>> j  com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1
>> J 4889 c1 java.util.concurrent.FutureTask.run()V java.base at 13.0.1 (123 bytes) @ 0x00007f667d0be604 [0x00007f667d0bdf80+0x0000000000000684]
>> J 7487 c1 java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V java.base at 13.0.1 (187 bytes) @ 0x00007f667dd45854 [0x00007f667dd44a60+0x0000000000000df4]
>> J 7486 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V java.base at 13.0.1 (9 bytes) @ 0x00007f667d1f643c [0x00007f667d1f63c0+0x000000000000007c]
>> J 7078 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 bytes) @ 0x00007f667d1f2d74 [0x00007f667d1f2c40+0x0000000000000134]
>> v  ~StubRoutines::call_stub
>>
>> Not very frequent but ~90 days ~120 crashes with following signal
>> siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr:
>> 0x0000000000000000
>> This signal is generated when we try to access non canonical address in
>> linux.
>>
>> As suggested by Stefan in another thread i tried to
>> add VerifyAfterGC/VerifyBeforeGC but it seems to increase the latency and
>> applications not surviving our production traffic(timing out and requests
>> are failing).
>>
>> Questions
>> 1. When i looked at source code for printing stack trace i see following
>> https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L696
>> (Prints native stack trace)
>> https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L718
>> (printing Java thread stack trace if it is involved in GC crash)
>>    a. How do you know this java thread was involved in jvm crash?
>>    b.
>> Can i assume the java thread printed after native stack trace was the
>> culprit?
>>    c. Since i am seeing the same frame (~RuntimeStub::_new_array_Java, J
>> 54174 c2 ch.qos.logback.classic.spi.ThrowableProxy...) but different
>> stack trace in both crashes can this be the root cause?
>>
>> 2. Thinking of excluding compilation
>> of ch.qos.logback.classic.spi.ThrowableProxy class and running in
>> production to see if compilation of this method is the cause. Does it make
>> sense?
>>
>> 3. Any other suggestion on debugging this further?
>>
>> TIA
>> Sundar
>

From shade at redhat.com Wed Mar 4 08:39:22 2020
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 4 Mar 2020 09:39:22 +0100
Subject: RFR (XS) 8240511: Shenandoah: parallel safepoint workers count should be ParallelGCThreads
Message-ID:

RFE:
  https://bugs.openjdk.java.net/browse/JDK-8240511

See bug for rationale. The best fix seems to be just using ParallelGCThreads and ditching
Shenandoah-specific option altogether:
  https://cr.openjdk.java.net/~shade/8240511/webrev.01/

Testing: hotspot_gc_shenandoah, eyeballing gross pause times

-- 
Thanks,
-Aleksey

From rkennke at redhat.com Wed Mar 4 10:41:17 2020
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 4 Mar 2020 11:41:17 +0100
Subject: RFR (XS) 8240511: Shenandoah: parallel safepoint workers count should be ParallelGCThreads
In-Reply-To:
References:
Message-ID: <0ad816d6-6537-6e61-9b17-fabc1631e1fc@redhat.com>

Ok yes, that makes sense.

That flag predates upstream integration of that code, and it wasn't
quite clear how many threads are useful for safepoint cleanup. IIRC, I
found that hammering it with ParallelGCThreads was overkill - on my
machine. But you are right, hard-wiring it to 4 is certainly overkill on
smaller machines than mine ;-)

Roman

> RFE:
> https://bugs.openjdk.java.net/browse/JDK-8240511
>
> See bug for rationale. The best fix seems to be just using ParallelGCThreads and ditching
> Shenandoah-specific option altogether:
> https://cr.openjdk.java.net/~shade/8240511/webrev.01/
>
> Testing: hotspot_gc_shenandoah, eyeballing gross pause times
>

From shade at redhat.com Wed Mar 4 10:48:27 2020
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 4 Mar 2020 11:48:27 +0100
Subject: RFR (XS) 8240511: Shenandoah: parallel safepoint workers count should be ParallelGCThreads
In-Reply-To: <0ad816d6-6537-6e61-9b17-fabc1631e1fc@redhat.com>
References: <0ad816d6-6537-6e61-9b17-fabc1631e1fc@redhat.com>
Message-ID: <71e4d1c7-b454-fcae-0c8e-73c90e7cda06@redhat.com>

On 3/4/20 11:41 AM, Roman Kennke wrote:
> Ok yes, that makes sense.
>
> That flag predates upstream integration of that code, and it wasn't
> quite clear how many threads are useful for safepoint cleanup. IIRC, I
> found that hammering it with ParallelGCThreads was overkill - on my
> machine. But you are right, hard-wiring it to 4 is certainly overkill on
> smaller machines than mine ;-)

I ran a few latency-sensitive tests on my smaller desktop, and they did not regress. I believe that
is partly because we have trimmed down the number of parallel threads with JDK-8225229. Therefore I
see no reason to keep it in. Another unnecessary GC option bites the dust.

-- 
Thanks,
-Aleksey

From rkennke at redhat.com Wed Mar 4 10:51:54 2020
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 4 Mar 2020 11:51:54 +0100
Subject: RFR (XS) 8240511: Shenandoah: parallel safepoint workers count should be ParallelGCThreads
In-Reply-To: <71e4d1c7-b454-fcae-0c8e-73c90e7cda06@redhat.com>
References: <0ad816d6-6537-6e61-9b17-fabc1631e1fc@redhat.com> <71e4d1c7-b454-fcae-0c8e-73c90e7cda06@redhat.com>
Message-ID:

>> Ok yes, that makes sense.
>>
>> That flag predates upstream integration of that code, and it wasn't
>> quite clear how many threads are useful for safepoint cleanup.
>> IIRC, I found that hammering it with ParallelGCThreads was overkill - on my
>> machine. But you are right, hard-wiring it to 4 is certainly overkill on
>> smaller machines than mine ;-)
>
> I ran a few latency-sensitive tests on my smaller desktop, and they did not regress. I believe that
> is partly because we have trimmed down the number of parallel threads with JDK-8225229. Therefore I
> see no reason to keep it in. Another unnecessary GC option bites the dust.

Sure, go!

Roman

From shade at redhat.com Wed Mar 4 17:14:57 2020
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 4 Mar 2020 18:14:57 +0100
Subject: RFR (XS) 8240534: Shenandoah: ditch debug safepoint timeout adjustment block
Message-ID:

RFE:
  https://bugs.openjdk.java.net/browse/JDK-8240534

This seems to be causing some of the failures on our new test servers:

diff -r 6f709455592a src/hotspot/share/gc/shenandoah/shenandoahArguments.cpp
--- a/src/hotspot/share/gc/shenandoah/shenandoahArguments.cpp Wed Mar 04 11:50:28 2020 +0100
+++ b/src/hotspot/share/gc/shenandoah/shenandoahArguments.cpp Wed Mar 04 18:12:25 2020 +0100
@@ -192,14 +192,4 @@
     FLAG_SET_DEFAULT(TLABAllocationWeight, 90);
   }
-
-  // Make sure safepoint deadlocks are failing predictably. This sets up VM to report
-  // fatal error after 10 seconds of wait for safepoint syncronization (not the VM
-  // operation itself). There is no good reason why Shenandoah would spend that
-  // much time synchronizing.
-#ifdef ASSERT
-  FLAG_SET_DEFAULT(SafepointTimeout, true);
-  FLAG_SET_DEFAULT(SafepointTimeoutDelay, 10000);
-  FLAG_SET_DEFAULT(AbortVMOnSafepointTimeout, true);
-#endif
 }

-- 
Thanks,
-Aleksey

From rkennke at redhat.com Wed Mar 4 17:30:55 2020
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 4 Mar 2020 18:30:55 +0100
Subject: RFR (XS) 8240534: Shenandoah: ditch debug safepoint timeout adjustment block
In-Reply-To:
References:
Message-ID: <1516299c-e0a6-6b3f-0928-49b08bac3a67@redhat.com>

Ok.
Thanks,
Roman

> RFE:
> https://bugs.openjdk.java.net/browse/JDK-8240534
>
> This seems to be causing some of the failures on our new test servers:
>
> diff -r 6f709455592a src/hotspot/share/gc/shenandoah/shenandoahArguments.cpp
> --- a/src/hotspot/share/gc/shenandoah/shenandoahArguments.cpp Wed Mar 04 11:50:28 2020 +0100
> +++ b/src/hotspot/share/gc/shenandoah/shenandoahArguments.cpp Wed Mar 04 18:12:25 2020 +0100
> @@ -192,14 +192,4 @@
>      FLAG_SET_DEFAULT(TLABAllocationWeight, 90);
>    }
> -
> -  // Make sure safepoint deadlocks are failing predictably. This sets up VM to report
> -  // fatal error after 10 seconds of wait for safepoint syncronization (not the VM
> -  // operation itself). There is no good reason why Shenandoah would spend that
> -  // much time synchronizing.
> -#ifdef ASSERT
> -  FLAG_SET_DEFAULT(SafepointTimeout, true);
> -  FLAG_SET_DEFAULT(SafepointTimeoutDelay, 10000);
> -  FLAG_SET_DEFAULT(AbortVMOnSafepointTimeout, true);
> -#endif
>  }
>
>

From zgu at redhat.com Wed Mar 4 23:06:14 2020
From: zgu at redhat.com (Zhengyu Gu)
Date: Wed, 4 Mar 2020 18:06:14 -0500
Subject: [15] RFR 8239926: Shenandoah: Shenandoah needs to mark nmethod's metadata
In-Reply-To: <75c20855-5234-ba00-f07b-f9da0f7b8047@redhat.com>
References: <75c20855-5234-ba00-f07b-f9da0f7b8047@redhat.com>
Message-ID:

Traversal GC has the same issue: it also needs to remark on-stack code
roots in final traversal.

@@ -263,11 +263,12 @@
 if (!_heap->is_degenerated_gc_in_progress()) {
   ShenandoahTraversalRootsClosure roots_cl(q, rp);
   ShenandoahTraversalSATBThreadsClosure tc(&satb_cl);
   if (unload_classes) {
     ShenandoahRemarkCLDClosure remark_cld_cl(&roots_cl);
-    _rp->strong_roots_do(worker_id, &roots_cl, &remark_cld_cl, NULL, &tc);
+    MarkingCodeBlobClosure code_cl(&roots_cl, CodeBlobToOopClosure::FixRelocations);
+    _rp->strong_roots_do(worker_id, &roots_cl, &remark_cld_cl, &code_cl, &tc);
   } else {
     CLDToOopClosure cld_cl(&roots_cl, ClassLoaderData::_claim_strong);
     _rp->roots_do(worker_id, &roots_cl, &cld_cl, NULL, &tc);
   }
 } else {

Updated webrev: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.01/

Thanks,

-Zhengyu

On 2/25/20 12:13 PM, Zhengyu Gu wrote:
> Shenandoah encounters a few test failures with tools/javac. Verifier
> catches unmarked oops in nmethod's metadata during root evacuation in
> final mark phase.
>
> The problem is that, Shenandoah marks on stack nmethods in init mark
> pause, but it does not mark nmethod's metadata during concurrent mark
> phase, when new nmethod is about to be executed.
>
> The solution:
> 1) Use nmethod_entry_barrier to keep nmethod's metadata alive when the
> nmethod is about to be executed, when nmethod entry barrier is supported.
>
> 2) Remark on stack nmethod's metadata at final mark pause.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8239926
> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.00/
>
> Test:
>   hotspot_gc_shenandoah (fastdebug and release)
>   tools/javac with ShenandoahCodeRootsStyle = 1 and 2 (fastdebug and
> release)
>
> Thanks,
>
> -Zhengyu

From manc at google.com Thu Mar 5 01:32:10 2020
From: manc at google.com (Man Cao)
Date: Wed, 4 Mar 2020 17:32:10 -0800
Subject: G1: Abort concurrent at initial mark pause
In-Reply-To:
References:
Message-ID:

Hi Liang,

Thanks for the quick contribution! This would solve a big problem for us.
I have created https://bugs.openjdk.java.net/browse/JDK-8240556.
You could start a thread with title "RFR (S): 8240556: Abort concurrent
mark after effective eager reclamation of humongous objects".

-Man

From maoliang.ml at alibaba-inc.com Thu Mar 5 06:40:52 2020
From: maoliang.ml at alibaba-inc.com (Liang Mao)
Date: Thu, 05 Mar 2020 14:40:52 +0800
Subject: Re: G1: Abort concurrent at initial mark pause
In-Reply-To:
References: ,
Message-ID: <9e9533b4-bb6a-4631-97e3-1e254092aa6e.maoliang.ml@alibaba-inc.com>

Hi Man,

Thanks for creating the bug id!

Thanks,
Liang

------------------------------------------------------------------
From:Man Cao
Send Time:2020 Mar. 5 (Thu.) 09:32
To:hotspot-gc-dev
Cc:"MAO, Liang" ; Thomas Schatzl
Subject:Re: G1: Abort concurrent at initial mark pause

Hi Liang,

Thanks for the quick contribution! This would solve a big problem for us.
I have created https://bugs.openjdk.java.net/browse/JDK-8240556.
You could start a thread with title "RFR (S): 8240556: Abort concurrent
mark after effective eager reclamation of humongous objects".

-Man

From maoliang.ml at alibaba-inc.com Thu Mar 5 07:13:38 2020
From: maoliang.ml at alibaba-inc.com (Liang Mao)
Date: Thu, 05 Mar 2020 15:13:38 +0800
Subject: RFR (S): 8240556: Abort concurrent mark after effective eager reclamation of humongous objects
Message-ID:

Hi All,

Now we have the bug id. I did more testing of the patch. There is
one small concern with the patch: when we decide to cancel
the concurrent cycle in the initial mark pause, we need to clear
the next bitmap, which is supposed to be cleared concurrently.
In my test with -Xmx20g -Xms20g -XX:ParallelGCThreads=10,
the time spent on clearing the next bitmap was consistently less
than 10ms. So I guess it could be acceptable.

Bug:
https://bugs.openjdk.java.net/browse/JDK-8240556
Webrev:
http://cr.openjdk.java.net/~luchsh/g1hum/humongous.webrev/

Thanks,
Liang

------------------------------------------------------------------
From:MAO, Liang
Send Time:2020 Mar. 3 (Tue.) 19:14
To:Thomas Schatzl ; Man Cao ; hotspot-gc-dev
Subject:G1: Abort concurrent at initial mark pause

Hi All,

Following the previous discussion, there are several ideas to improve the
handling of humongous objects. Our experiments show that canceling concurrent
mark at the initial mark pause is effective in the scenario where
frequent allocation of temporary humongous objects leads to frequent concurrent
mark and high CPU usage. The sub-test scimark.fft.large in specjvm2008 is
also exactly this case, but it is not GC sensitive, so there is little difference
in score.

The patch is small; shall we have a bug id for it?
http://cr.openjdk.java.net/~luchsh/g1hum/humongous.webrev/

Thanks,
Liang

From thomas.schatzl at oracle.com Thu Mar 5 09:50:03 2020
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 5 Mar 2020 10:50:03 +0100
Subject: RFR[T]: 8240133: G1DirtyCardQueue destructor has useless flush
In-Reply-To:
References:
Message-ID:

Hi Kim,

On 04.03.20 03:17, Kim Barrett wrote:
> Please review this trivial change to remove the useless call to flush() from
> the G1DirtyCardQueue destructor. See the CR for more details. This removes
> the need for a non-trivial destructor for that class.
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8240133
>
> Webrev:
> https://cr.openjdk.java.net/~kbarrett/8240133/open.00/
>
> Testing:
> mach5 tier1-5 along with changes for JDK-8239825 and JDK-8139652.
> Local (linux-x64) hotspot:tier1 with this and the proposed JDK-8239825 change.
>

looks good to me.
Thomas

From ivan.walulya at oracle.com Thu Mar 5 10:32:17 2020
From: ivan.walulya at oracle.com (Ivan Walulya)
Date: Thu, 5 Mar 2020 11:32:17 +0100
Subject: RFR(XS): 8240589: OtherRegionsTable::_num_occupied not updated correctly during coarsening
Message-ID:

Hi all,

Please review this small change, which makes sure that OtherRegionsTable::_num_occupied is updated correctly during coarsening.

JBS: https://bugs.openjdk.java.net/browse/JDK-8240589
Webrev: http://cr.openjdk.java.net/~iwalulya/8240589/00/

//Ivan

From ivan.walulya at oracle.com Thu Mar 5 10:33:47 2020
From: ivan.walulya at oracle.com (Ivan Walulya)
Date: Thu, 5 Mar 2020 11:33:47 +0100
Subject: RFR(XS): 8240591: G1HeapSizingPolicy attempts to compute expansion_amount even when at full capacity
Message-ID: <197FCCDD-2E5D-4A63-B796-28908B18DB0E@oracle.com>

Hi all,

Please review a small modification to G1HeapSizingPolicy so that it returns without computing expansion_amount when the heap is already at full capacity.

JBS: https://bugs.openjdk.java.net/browse/JDK-8240591
Webrev: http://cr.openjdk.java.net/~iwalulya/8240591/00/

//Ivan

From ivan.walulya at oracle.com Thu Mar 5 10:37:28 2020
From: ivan.walulya at oracle.com (Ivan Walulya)
Date: Thu, 5 Mar 2020 11:37:28 +0100
Subject: RFR(XS): 8240592: HeapRegionManager::rebuild_free_list logs 0s for the estimated free regions before
Message-ID:

Hi all,

Please review a small modification to fix logging during HeapRegionManager::rebuild_free_list.

JBS: https://bugs.openjdk.java.net/browse/JDK-8240592
Webrev: http://cr.openjdk.java.net/~iwalulya/8240592/00/

//Ivan

From thomas.schatzl at oracle.com Thu Mar 5 11:11:48 2020
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 5 Mar 2020 12:11:48 +0100
Subject: RFR(XS): 8240589: OtherRegionsTable::_num_occupied not updated correctly during coarsening
In-Reply-To:
References:
Message-ID: <0c870725-6dd8-fecc-d0fb-ce5bf3f2cbbc@oracle.com>

Hi,

On 05.03.20 11:32, Ivan Walulya wrote:
> Hi all,
>
> Please review this small change which fixes that OtherRegionsTable::_num_occupied are updated correctly during coarsening.

Please remove the "during coarsening" part from the CR title; the issue
affects regular addition of remembered set entries to the fine prt too.

>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8240589
> Webrev: http://cr.openjdk.java.net/~iwalulya/8240589/00/
>
>
> //Ivan
>

looks good. Please backport to 14u too.

Thanks,
  Thomas

From ralf.schmelter at sap.com Thu Mar 5 13:29:33 2020
From: ralf.schmelter at sap.com (Schmelter, Ralf)
Date: Thu, 5 Mar 2020 13:29:33 +0000
Subject: RFR (S) 8240440: Implement get_safepoint_workers() for parallel GC
Message-ID:

Hi,

could you review this small change? It implements get_safepoint_workers() for the ParallelScavengeHeap, so that the worker threads can be used for other tasks. This is already implemented for G1, Z and Shenandoah. Since the parallel GC uses the worker threads only in the collection VM operation, it can safely share them.

bugreport: https://bugs.openjdk.java.net/browse/JDK-8240440
webrev: http://cr.openjdk.java.net/~rschmelter/webrevs/8240440/webrev.0/

Best regards,
Ralf

From thomas.schatzl at oracle.com Thu Mar 5 14:40:20 2020
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 5 Mar 2020 15:40:20 +0100
Subject: RFR (S) 8240440: Implement get_safepoint_workers() for parallel GC
In-Reply-To:
References:
Message-ID:

Hi Ralf,

On 05.03.20 14:29, Schmelter, Ralf wrote:
> Hi,
>
> could you review the small change. It implements get_safepoint_workers() for the ParallelScavengeHeap, so that the worker threads could be used for other tasks. This is already implemented for G1, Z and Shenandoah. Since the parallel GC does used the worker threads only in the collection VM operation it can safely share them.
>
> bugreport: https://bugs.openjdk.java.net/browse/JDK-8240440
> webrev: http://cr.openjdk.java.net/~rschmelter/webrevs/8240440/webrev.0/
>

looks good to me. Let me run it through testing.

Thanks,
  Thomas

From thomas.schatzl at oracle.com Thu Mar 5 15:13:58 2020
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 5 Mar 2020 16:13:58 +0100
Subject: RFR: 8239825: G1: Simplify threshold test for mutator refinement
In-Reply-To: <81A0AF23-EEA2-42A2-8208-AD36B7B336CC@oracle.com>
References: <81A0AF23-EEA2-42A2-8208-AD36B7B336CC@oracle.com>
Message-ID:

Hi,

On 04.03.20 03:16, Kim Barrett wrote:
> Please review this change to the handling of "padding" for the threshold
> used to decide whether a mutator thread should perform concurrent
> refinement. Rather than doing a slightly tricky (because of potential
> overflow) computation every time a mutator thread completes a buffer,
> instead perform that computation once and record the result for repeated
> use.
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8239825
>
> Webrev:
> https://cr.openjdk.java.net/~kbarrett/8239825/open.00/
>
> Testing:
> mach5 tier1-5 along with changes for JDK-8240133 and JDK-8139652.
> Local (linux-x64) hotspot:tier1 with just this change.
>

I think this is good.

Thomas

From manc at google.com Thu Mar 5 19:24:13 2020
From: manc at google.com (Man Cao)
Date: Thu, 5 Mar 2020 11:24:13 -0800
Subject: RFR (S): 8240556: Abort concurrent mark after effective eager reclamation of humongous objects
In-Reply-To:
References:
Message-ID:

Hi Liang,

Overall, I think the approach would work well after fixing a few issues
below.

In g1CollectedHeap.cpp:

> 3111       if (gc_cause() == GCCause::_g1_humongous_allocation && collector_state()->in_initial_mark_gc()) {
> 3112         // Check if we still need to do concurrent mark after evacuation
> 3113         // Abort concurrent mark in case we cleaned humongous objects via eager reclaim
> 3114         should_start_conc_mark = policy()->need_to_start_conc_mark("end of GC");

Two issues:

(1) I think need_to_start_conc_mark() does not have the most up-to-date
information at this point. For example, the later
expand_heap_after_young_collection() could update
G1IHOPControl::_target_occupancy, which is used by
need_to_start_conc_mark(). One possible solution could be to move the
" if (should_start_conc_mark) { concurrent_mark()->post_initial_mark(); } "
below to after expand_heap_after_young_collection(). I'd wait for the G1
team members to confirm that this approach is safe.

(2) Does it need to call collector_state()->set_in_initial_mark_gc(false)
if need_to_start_conc_mark() returns false? Specifically, the later
G1Policy::record_collection_pause_end() would call
collector_state()->set_mark_or_rebuild_in_progress(true), if
collector_state()->in_initial_mark_gc() remains true. This is probably
wrong if the initial mark has been aborted.

> 2059 void G1CollectedHeap::decrement_old_marking_cycles_started() {
> 2060   assert(_old_marking_cycles_started > 0, "must be");

Could it assert "_old_marking_cycles_started ==
_old_marking_cycles_completed + 1" instead?
3125 } else if (collector_state()->in_initial_mark_gc()) { > 3126 // Don't do concurrent mark any more > 3127 concurrent_mark()->initial_mark_abort(); > 3128 log_info(gc)("Concurrent Aborted"); It's probably better to move the log_info inside the initial_mark_abort() method. Also, "Concurrent Start Cancelled" is probably a more precise and unambiguous message. It corresponds to the "Pause Young (Concurrent Start)" in G1CollectedHeap::young_gc_name(), and does not collide with "Concurrent Mark Abort" in G1ConcurrentMark::concurrent_cycle_end(). Perhaps initial_mark_abort() could be renamed to cancel_initial_mark() also? -Man On Wed, Mar 4, 2020 at 11:13 PM Liang Mao wrote: > Hi All, > > Now we have the bug id. I did more test to the patch. There's > a little concern in the patch that when we decide to cancle > the concurrent cycle in initial mark pause we need to clear > the next bitmap which supposes to be cleared concurrently. > In my test with -Xmx20g -Xms20g -XX:ParallelGCThreads=10, > the time spent on clearing next bitmap was consistently less > than 10ms. So I guess it could be acceptable. > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8240556 > Webrev: > > http://cr.openjdk.java.net/~luchsh/g1hum/humongous.webrev/ > > > Thanks, > Liang > > > > > > ------------------------------------------------------------------ > From:MAO, Liang > Send Time:2020 Mar. 3 (Tue.) 19:14 > To:Thomas Schatzl ; Man Cao ; > hotspot-gc-dev > Subject:G1: Abort concurrent at initial mark pause > > Hi All, > > As previous discusion, there're several ideas to improve the humongous > objects handling. We've made some experiments that canceling concurrent > mark at initial mark pause is proved to be effective in the senario that > frequent temporary humongous objects allocation leads to frequent > concurrent > mark and high CPU usage. The sub-test: scimark.fft.large in specjvm2008 is > also the exact case but not GC sensative so there's little difference > in score. 
> > The patch is small and shall we have a bug id for it? > http://cr.openjdk.java.net/~luchsh/g1hum/humongous.webrev/ > > Thanks, > Liang > > > > > > From stefan.johansson at oracle.com Thu Mar 5 20:15:10 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Thu, 5 Mar 2020 21:15:10 +0100 Subject: RFR(XS): 8240589: OtherRegionsTable::_num_occupied not updated correctly during coarsening In-Reply-To: <0c870725-6dd8-fecc-d0fb-ce5bf3f2cbbc@oracle.com> References: <0c870725-6dd8-fecc-d0fb-ce5bf3f2cbbc@oracle.com> Message-ID: <5BE42325-52E6-4B81-BD62-A9939E5D1131@oracle.com> > 5 mars 2020 kl. 12:11 skrev Thomas Schatzl : > > Hi, > > On 05.03.20 11:32, Ivan Walulya wrote: >> Hi all, >> Please review this small change which fixes that OtherRegionsTable::_num_occupied are updated correctly during coarsening. > > Please remove the "during coarsening" part from the CR title; the issue affects regular addition of remembered set entries to the fine prt too. > >> JBS: https://bugs.openjdk.java.net/browse/JDK-8240589 >> Webrev: http://cr.openjdk.java.net/~iwalulya/8240589/00/ >> //Ivan > > looks good. Please backport to 14u too. Looks good to me too, Stefan > > Thanks, > Thomas From kim.barrett at oracle.com Fri Mar 6 00:50:36 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 5 Mar 2020 19:50:36 -0500 Subject: RFR (S) 8240440: Implement get_safepoint_workers() for parallel GC In-Reply-To: References: Message-ID: <54DB78FD-E761-41A1-86C2-15DE18CADABC@oracle.com> > On Mar 5, 2020, at 9:40 AM, Thomas Schatzl wrote: > > Hi Ralf, > > On 05.03.20 14:29, Schmelter, Ralf wrote: >> Hi, >> could you review the small change. It implements get_safepoint_workers() for the ParallelScavengeHeap, so that the worker threads could be used for other tasks. This is already implemented for G1, Z and Shenandoah. Since the parallel GC does used the worker threads only in the collection VM operation it can safely share them. 
>> bugreport: https://bugs.openjdk.java.net/browse/JDK-8240440 >> webrev: http://cr.openjdk.java.net/~rschmelter/webrevs/8240440/webrev.0/ > > looks good to me. Let me run it through testing. > > Thanks, > Thomas Looks good to me too. From kim.barrett at oracle.com Fri Mar 6 00:51:14 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 5 Mar 2020 19:51:14 -0500 Subject: RFR: 8239825: G1: Simplify threshold test for mutator refinement In-Reply-To: References: <81A0AF23-EEA2-42A2-8208-AD36B7B336CC@oracle.com> Message-ID: > On Mar 5, 2020, at 10:13 AM, Thomas Schatzl wrote: > > Hi, > > On 04.03.20 03:16, Kim Barrett wrote: >> Please review this change to the handling of "padding" for the threshold >> used to decide whether a mutator thread should perform concurrent >> refinement. Rather than doing a slightly tricky (because of potential >> overflow) computation every time a mutator thread completes a buffer, >> instead perform that computation once and record the result for repeated >> use. >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8239825 >> Webrev: >> https://cr.openjdk.java.net/~kbarrett/8239825/open.00/ >> Testing: >> mach5 tier1-5 along with changes for JDK-8240133 and JDK-8139652. >> Local (linux-x64) hotspot:tier1 with just this change. > > I think this is good. > > Thomas Thanks. From kim.barrett at oracle.com Fri Mar 6 00:51:46 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 5 Mar 2020 19:51:46 -0500 Subject: RFR[T]: 8240133: G1DirtyCardQueue destructor has useless flush In-Reply-To: References: Message-ID: <7FFE8194-B5E0-489C-9F39-279C8FC081D0@oracle.com> > On Mar 5, 2020, at 4:50 AM, Thomas Schatzl wrote: > > Hi Kim, > > On 04.03.20 03:17, Kim Barrett wrote: >> Please review this trivial change to remove the useless call to flush() from >> the G1DirtyCardQueue destructor. See the CR for more details. This removes >> the need for a non-trivial destructor for that class. 
>> CR: >> https://bugs.openjdk.java.net/browse/JDK-8240133 >> Webrev: >> https://cr.openjdk.java.net/~kbarrett/8240133/open.00/ >> Testing: >> mach5 tier1-5 along with changes for JDK-8239825 and JDK-8139652. >> Local (linux-x64) hotspot:tier1 with this and the proposed JDK-8239825 change. > > looks good to me. > > Thomas Thanks. From kim.barrett at oracle.com Fri Mar 6 01:38:52 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 5 Mar 2020 20:38:52 -0500 Subject: RFR(XS): 8240592: HeapRegionManager::rebuild_free_list logs 0s for the estimated free regions before In-Reply-To: References: Message-ID: > On Mar 5, 2020, at 5:37 AM, Ivan Walulya wrote: > > Hi all, > > Please review a small modification to fix logging during HeapRegionManager::rebuild_free_list. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8240592 > Webrev: http://cr.openjdk.java.net/~iwalulya/8240592/00/ I think I'd prefer the old ordering, but capture num_free_regions() into a variable before the abandon, and use that variable in the logging. But there's also the question of why the log message mentions the number of free regions at all, since the number of pre-existing free regions isn't important because of the abandonment. From sangheon.kim at oracle.com Fri Mar 6 07:40:09 2020 From: sangheon.kim at oracle.com (sangheon.kim at oracle.com) Date: Thu, 5 Mar 2020 23:40:09 -0800 Subject: RFR: 8240239: Replace ConcurrentGCPhaseManager In-Reply-To: <4C14B89F-1550-44DE-B738-0DBEE7A2E167@oracle.com> References: <4C14B89F-1550-44DE-B738-0DBEE7A2E167@oracle.com> Message-ID: Hi Kim, On 2/28/20 1:48 PM, Kim Barrett wrote: > Please review this change which removes the ConcurrentGCPhaseManager > class and replaces it with ConcurrentGCBreakpoints. > > This is joint work with Per Liden. > > This change provides a client API, used by WhiteBox. The usage model > for a client is > > (1) Acquire control of concurrent collection cycles. 
> > (2) Do work that must be performed while the collection cycle is in a > known state. > > (3) Request the concurrent collector run to a named "breakpoint", or > run to completion, and then hold there, waiting for further commands. > > (4) Optionally goto (2). > > (5) Release control of concurrent collection cycles. > > Tests have been updated to use the new WhiteBox API. > > This change provides implementations of the new mechanism for G1 and > ZGC. A Shenandoah implementation is being left to others, but we > don't see any obvious reason for it to be difficult. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8240239 > > Webrev: > https://cr.openjdk.java.net/~kbarrett/8240239/open.03/ Looks good in general. But I have several minor nits. ------------------ src/hotspot/share/gc/g1/g1ConcurrentMarkThread.cpp 215 // Pause Remark. - Pre-existing: this comment should be moved to before line 221. 221 CMRemark cl(_cm); 216 ConcurrentGCBreakpoints::at("BEFORE MARKING COMPLETED"); 217 log_info(gc, marking)("Concurrent Mark (%.3fs, %.3fs) %.3fms", - Do we need to add the time spent by 'at'? If we need the time spent in 'at', it would be better to log it separately. ------------------ src/hotspot/share/gc/shared/concurrentGCBreakpoints.hpp 118 static void at(const char* breakpoint); - Don't we need a more explanatory name? Something like reached_at? To me, 'at' makes me feel like the function would return a non-void type. But this is my preference, so okay as is. ------------------ test/hotspot/jtreg/gc/TestConcurrentGCBreakpoints.java 138 throw new RuntimeException("Expected support"); - A better explanation please, as it is a bit confusing (at least) to me. I feel an affirmative sentence is not a good fit for the message here. Maybe that is because I compared the message with the other case. ------------------ For the record.
I asked Kim about a better alternative to 'const char*' in ConcurrentGCBreakpoints::run_to(const char* breakpoint) and ConcurrentGCBreakpoints::at(const char* breakpoint), something like a static member or an enum type. The reason is that such strings will appear in several places, and there is already a static member in WhiteBox.java. However, the breakpoints may vary among collectors and form an open set. Currently there are only 2 breakpoints, so Kim (and maybe Per) decided not to think hard about it. I am fine with it too. Thanks, Sangheon > > To possibly simplify the review, the open patch is also provided as a > pair of patches, one for removing the old mechanism and a second to > add the new mechanism. > > https://cr.openjdk.java.net/~kbarrett/8240239/remove_phase_control.03/ > Removes ConcurrentGCPhaseManager and its G1 implementation, except > that tests are not modified. > > https://cr.openjdk.java.net/~kbarrett/8240239/control.03/ > Adds ConcurrentGCBreakpoints, with G1 and ZGC implementations, and > updates tests to use it. > > Testing: > mach5 tier1-5, which includes all the updated and new tests. > From ivan.walulya at oracle.com Fri Mar 6 08:38:04 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Fri, 6 Mar 2020 09:38:04 +0100 Subject: RFR(XS): 8240592: HeapRegionManager::rebuild_free_list logs 0s for the estimated free regions before In-Reply-To: References: Message-ID: Thanks Kim! > > But there's also the question of why the log message mentions the > number of free regions at all, since the number of pre-existing free > regions isn't important because of the abandonment. I will remove the number of free regions from the log entry and then restore the previous ordering. > On 6 Mar 2020, at 02:38, Kim Barrett wrote: > >> On Mar 5, 2020, at 5:37 AM, Ivan Walulya wrote: >> >> Hi all, >> >> Please review a small modification to fix logging during HeapRegionManager::rebuild_free_list.
>> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8240592 >> Webrev: http://cr.openjdk.java.net/~iwalulya/8240592/00/ > > I think I'd prefer the old ordering, but capture num_free_regions() > into a variable before the abandon, and use that variable in the logging. > > But there's also the question of why the log message mentions the > number of free regions at all, since the number of pre-existing free > regions isn't important because of the abandonment. > From ivan.walulya at oracle.com Fri Mar 6 08:38:29 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Fri, 6 Mar 2020 09:38:29 +0100 Subject: RFR(XS): 8240589: OtherRegionsTable::_num_occupied not updated correctly during coarsening In-Reply-To: <0c870725-6dd8-fecc-d0fb-ce5bf3f2cbbc@oracle.com> References: <0c870725-6dd8-fecc-d0fb-ce5bf3f2cbbc@oracle.com> Message-ID: <80098CA4-0B79-4A15-A4F8-1F31B6DBF5D6@oracle.com> Thanks Thomas! > On 5 Mar 2020, at 12:11, Thomas Schatzl wrote: > > Hi, > > On 05.03.20 11:32, Ivan Walulya wrote: >> Hi all, >> Please review this small change which fixes that OtherRegionsTable::_num_occupied are updated correctly during coarsening. > > Please remove the "during coarsening" part from the CR title; the issue affects regular addition of remembered set entries to the fine prt too. > >> JBS: https://bugs.openjdk.java.net/browse/JDK-8240589 >> Webrev: http://cr.openjdk.java.net/~iwalulya/8240589/00/ >> //Ivan > > looks good. Please backport to 14u too. > > Thanks, > Thomas From ivan.walulya at oracle.com Fri Mar 6 08:39:08 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Fri, 6 Mar 2020 09:39:08 +0100 Subject: RFR(XS): 8240589: OtherRegionsTable::_num_occupied not updated correctly during coarsening In-Reply-To: <5BE42325-52E6-4B81-BD62-A9939E5D1131@oracle.com> References: <0c870725-6dd8-fecc-d0fb-ce5bf3f2cbbc@oracle.com> <5BE42325-52E6-4B81-BD62-A9939E5D1131@oracle.com> Message-ID: <46606E7A-7A60-4C8F-A7F8-0FE1083E40BF@oracle.com> Thanks Stefan! 
> On 5 Mar 2020, at 21:15, Stefan Johansson wrote: > > > >> 5 mars 2020 kl. 12:11 skrev Thomas Schatzl : >> >> Hi, >> >> On 05.03.20 11:32, Ivan Walulya wrote: >>> Hi all, >>> Please review this small change which fixes that OtherRegionsTable::_num_occupied are updated correctly during coarsening. >> >> Please remove the "during coarsening" part from the CR title; the issue affects regular addition of remembered set entries to the fine prt too. >> >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8240589 >>> Webrev: http://cr.openjdk.java.net/~iwalulya/8240589/00/ >>> //Ivan >> >> looks good. Please backport to 14u too. > Looks good to me too, > Stefan > >> >> Thanks, >> Thomas From stefan.johansson at oracle.com Fri Mar 6 08:51:06 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Fri, 6 Mar 2020 09:51:06 +0100 Subject: RFR(XS): 8240592: HeapRegionManager::rebuild_free_list logs 0s for the estimated free regions before In-Reply-To: References: Message-ID: Hi, On 2020-03-06 09:38, Ivan Walulya wrote: > Thanks kim! >> >> But there's also the question of why the log message mentions the >> number of free regions at all, since the number of pre-existing free >> regions isn't important because of the abandonment. > > I will remove the number of free regions from the log entry and then set back the previous ordering. Sounds good to me as well, reviewed, Stefan > >> On 6 Mar 2020, at 02:38, Kim Barrett wrote: >> >>> On Mar 5, 2020, at 5:37 AM, Ivan Walulya wrote: >>> >>> Hi all, >>> >>> Please review a small modification to fix logging during HeapRegionManager::rebuild_free_list. >>> >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8240592 >>> Webrev: http://cr.openjdk.java.net/~iwalulya/8240592/00/ >> >> I think I'd prefer the old ordering, but capture num_free_regions() >> into a variable before the abandon, and use that variable in the logging. 
>> >> But there's also the question of why the log message mentions the >> number of free regions at all, since the number of pre-existing free >> regions isn't important because of the abandonment. >> > From stefan.johansson at oracle.com Fri Mar 6 09:10:18 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Fri, 6 Mar 2020 10:10:18 +0100 Subject: RFR[T]: 8240133: G1DirtyCardQueue destructor has useless flush In-Reply-To: References: Message-ID: Hi Kim, On 2020-03-04 03:17, Kim Barrett wrote: > Please review this trivial change to remove the useless call to flush() from > the G1DirtyCardQueue destructor. See the CR for more details. This removes > the need for a non-trivial destructor for that class. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8240133 > > Webrev: > https://cr.openjdk.java.net/~kbarrett/8240133/open.00/ Would it make sense to add an assert into the destructor to ensure no entries were added? Or is that problematic for some reason. If you prefer not to, I'm good with this change and you can consider it reviewed. Cheers, Stefan > > Testing: > mach5 tier1-5 along with changes for JDK-8239825 and JDK-8139652. > Local (linux-x64) hotspot:tier1 with this and the proposed JDK-8239825 change. > From stefan.johansson at oracle.com Fri Mar 6 10:59:16 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Fri, 6 Mar 2020 11:59:16 +0100 Subject: RFR (S): 8240556: Abort concurrent mark after effective eager reclamation of humongous objects In-Reply-To: References: Message-ID: Hi Liang, Thanks for picking this up, really nice to see it progressing. It would be nice if we could make the clearing concurrently to avoid prolonging the pause. An alternative to abort like you do now, would be to let the concurrent cycle start, but have it abort it self directly. 
This should be done by calling: G1ConcurrentMark::concurrent_cycle_abort() This would also reuse the abort mechanism already in place and if aborting needs updating in the future there is only one place to change. There might be some things that have to be altered to get this to work and I haven't explored this more than in theory. Would you consider trying this out? I'm thinking this should look something like this in the log: GC(1) Pause Young (Concurrent Start) (G1 Evacuation Pause) 261M->262M(502M) 50.153ms GC(2) Concurrent Cycle GC(2) Concurrent Mark Abort GC(2) Concurrent Cycle 12.345ms We might want to call it something other than "Abort" in the logs to differ it from an abort by a Full GC, but we can discuss the details later on. Thanks, Stefan On 2020-03-05 08:13, Liang Mao wrote: > Hi All, > > Now we have the bug id. I did more test to the patch. There's > a little concern in the patch that when we decide to cancle > the concurrent cycle in initial mark pause we need to clear > the next bitmap which supposes to be cleared concurrently. > In my test with -Xmx20g -Xms20g -XX:ParallelGCThreads=10, > the time spent on clearing next bitmap was consistently less > than 10ms. So I guess it could be acceptable. > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8240556 > Webrev: > http://cr.openjdk.java.net/~luchsh/g1hum/humongous.webrev/ > > Thanks, > Liang > > > > > > ------------------------------------------------------------------ > From:MAO, Liang > Send Time:2020 Mar. 3 (Tue.) 19:14 > To:Thomas Schatzl ; Man Cao ; hotspot-gc-dev > Subject:G1: Abort concurrent at initial mark pause > > Hi All, > > As previous discusion, there're several ideas to improve the humongous > objects handling. We've made some experiments that canceling concurrent > mark at initial mark pause is proved to be effective in the senario that > frequent temporary humongous objects allocation leads to frequent concurrent > mark and high CPU usage. 
The sub-test: scimark.fft.large in specjvm2008 is > also the exact case but not GC sensitive so there's little difference > in score. > > The patch is small and shall we have a bug id for it? > http://cr.openjdk.java.net/~luchsh/g1hum/humongous.webrev/ > > Thanks, > Liang > > > > > From maoliang.ml at alibaba-inc.com Fri Mar 6 11:35:17 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Fri, 06 Mar 2020 19:35:17 +0800 Subject: RFR (S): 8240556: Abort concurrent mark after effective eager reclamation of humongous objects Message-ID: Hi, Thanks for Man's accurate comments; I made the change: http://cr.openjdk.java.net/~luchsh/g1hum/humongous.webrev.1/ Stefan's concern is fairly reasonable, since I have noticed that if there are not enough GC workers, the additional pause time caused by clearing could be considerable. concurrent_cycle_abort might not be easy to reuse because it still clears the bitmap in the pause. I was thinking of letting the concurrent mark thread continue and finish the last step of "_cm->cleanup_for_next_mark()", although it has a chance of delaying the next initial mark. Anyway, I'm glad to give it a try, and you can compare the two approaches and provide comments. Thanks, Liang ------------------------------------------------------------------ From:Stefan Johansson Send Time:2020 Mar. 6 (Fri.) 18:59 To:"MAO, Liang" ; Thomas Schatzl ; Man Cao ; hotspot-gc-dev Subject:Re: RFR (S): 8240556: Abort concurrent mark after effective eager reclamation of humongous objects Hi Liang, Thanks for picking this up, really nice to see it progressing. It would be nice if we could do the clearing concurrently to avoid prolonging the pause. An alternative to aborting like you do now would be to let the concurrent cycle start, but have it abort itself directly.
This should be done by calling: G1ConcurrentMark::concurrent_cycle_abort() This would also reuse the abort mechanism already in place and if aborting needs updating in the future there is only one place to change. There might be some things that have to be altered to get this to work and I haven't explored this more than in theory. Would you consider trying this out? I'm thinking this should look something like this in the log: GC(1) Pause Young (Concurrent Start) (G1 Evacuation Pause) 261M->262M(502M) 50.153ms GC(2) Concurrent Cycle GC(2) Concurrent Mark Abort GC(2) Concurrent Cycle 12.345ms We might want to call it something other than "Abort" in the logs to differ it from an abort by a Full GC, but we can discuss the details later on. Thanks, Stefan On 2020-03-05 08:13, Liang Mao wrote: > Hi All, > > Now we have the bug id. I did more test to the patch. There's > a little concern in the patch that when we decide to cancle > the concurrent cycle in initial mark pause we need to clear > the next bitmap which supposes to be cleared concurrently. > In my test with -Xmx20g -Xms20g -XX:ParallelGCThreads=10, > the time spent on clearing next bitmap was consistently less > than 10ms. So I guess it could be acceptable. > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8240556 > Webrev: > http://cr.openjdk.java.net/~luchsh/g1hum/humongous.webrev/ > > Thanks, > Liang > > > > > > ------------------------------------------------------------------ > From:MAO, Liang > Send Time:2020 Mar. 3 (Tue.) 19:14 > To:Thomas Schatzl ; Man Cao ; hotspot-gc-dev > Subject:G1: Abort concurrent at initial mark pause > > Hi All, > > As previous discusion, there're several ideas to improve the humongous > objects handling. We've made some experiments that canceling concurrent > mark at initial mark pause is proved to be effective in the senario that > frequent temporary humongous objects allocation leads to frequent concurrent > mark and high CPU usage. 
The sub-test: scimark.fft.large in specjvm2008 is > also the exact case but not GC sensative so there's little difference > in score. > > The patch is small and shall we have a bug id for it? > http://cr.openjdk.java.net/~luchsh/g1hum/humongous.webrev/ > > Thanks, > Liang > > > > > From shade at redhat.com Fri Mar 6 12:01:59 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 6 Mar 2020 13:01:59 +0100 Subject: RFR (M) 8240671: Shenandoah: refactor ShenandoahPhaseTimings Message-ID: RFE: https://bugs.openjdk.java.net/browse/JDK-8240671 Webrev: https://cr.openjdk.java.net/~shade/8240671/webrev.01/ Tour of changes: *) SHENANDOAH_GC_PHASE_DO macro now uses the sub-macro root definition block that now guarantees that we list the roots in the same order! Also makes the macro itself much shorter. *) ShenandoahWorkerTimings middle-man is eliminated by inlining straight into ShenandoahPhaseTimings. This removes some redundant jumping around. Plus, eliminates it at every use of ShenandoahWorkerTimingsTracker! *) ShenandoahGCPhase is now responsible for measuring the time, which simplifies _timing_data and ShenandoahPhaseTimings interface. *) shenandoahTimingTracker.* are gone, ShenandoahWorkerTimingsTracker implementation moved to shenandoahPhaseTimings.*, as it does not carry its own weight at this point. shenandoahPhaseTimings.* would be renamed at some point in the future. Testing: hotspot_gc_shenandoah {fastdebug,release} -- Thanks, -Aleksey From thomas.schatzl at oracle.com Fri Mar 6 12:18:29 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 6 Mar 2020 13:18:29 +0100 Subject: RFR (S) 8240440: Implement get_safepoint_workers() for parallel GC In-Reply-To: References: Message-ID: <3eca0545-f206-110a-0aa8-7f95669cd49a@oracle.com> Hi, On 05.03.20 15:40, Thomas Schatzl wrote: > Hi Ralf, > > On 05.03.20 14:29, Schmelter, Ralf wrote: >> Hi, >> >> could you review the small change. 
It implements >> get_safepoint_workers() for the ParallelScavengeHeap, so that the >> worker threads could be used for other tasks. This is already >> implemented for G1, Z and Shenandoah. Since the parallel GC uses >> the worker threads only in the collection VM operation it can safely >> share them. >> >> bugreport: https://bugs.openjdk.java.net/browse/JDK-8240440 >> webrev: http://cr.openjdk.java.net/~rschmelter/webrevs/8240440/webrev.0/ >> > > looks good to me. Let me run it through testing. > hs-tier1-5 look good. Ship it. Thanks, Thomas From ivan.walulya at oracle.com Fri Mar 6 12:35:16 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Fri, 6 Mar 2020 13:35:16 +0100 Subject: RFR: 8240668 : G1 list of all PerRegionTable does not have to be a double linkedlist any more Message-ID: <0650B450-17DB-46D4-ABD6-A1CD1877C53A@oracle.com> Hi all, Please review this modification to change the list of all PerRegionTables from a doubly linked list to a singly linked list. JBS: https://bugs.openjdk.java.net/browse/JDK-8240668 webrev: http://cr.openjdk.java.net/~iwalulya/8240668/00/ Testing: Tier 1 - 3 //Ivan From rkennke at redhat.com Fri Mar 6 12:48:02 2020 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 6 Mar 2020 13:48:02 +0100 Subject: RFR (M) 8240671: Shenandoah: refactor ShenandoahPhaseTimings In-Reply-To: References: Message-ID: Very good! That should make that code less error-prone and more consistent. Change looks good! Thank you, Roman On 3/6/20 1:01 PM, Aleksey Shipilev wrote: > RFE: > https://bugs.openjdk.java.net/browse/JDK-8240671 > > Webrev: > https://cr.openjdk.java.net/~shade/8240671/webrev.01/ > > Tour of changes: > > *) SHENANDOAH_GC_PHASE_DO macro now uses the sub-macro root definition block that now guarantees > that we list the roots in the same order! Also makes the macro itself much shorter. > > *) ShenandoahWorkerTimings middle-man is eliminated by inlining straight into > ShenandoahPhaseTimings.
This removes some redundant jumping around. Plus, eliminates it at every use > of ShenandoahWorkerTimingsTracker! > > *) ShenandoahGCPhase is now responsible for measuring the time, which simplifies _timing_data and > ShenandoahPhaseTimings interface. > > *) shenandoahTimingTracker.* are gone, ShenandoahWorkerTimingsTracker implementation moved to > shenandoahPhaseTimings.*, as it does not carry its own weight at this point. > shenandoahPhaseTimings.* would be renamed at some point in the future. > > Testing: hotspot_gc_shenandoah {fastdebug,release} > From zgu at redhat.com Fri Mar 6 15:17:50 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Fri, 6 Mar 2020 10:17:50 -0500 Subject: RFR (M) 8240671: Shenandoah: refactor ShenandoahPhaseTimings In-Reply-To: References: Message-ID: <99567ddf-ff3d-d7fb-89a9-ff81ccefebf9@redhat.com> Nice cleanup. Looks good to me. Thanks, -Zhengyu On 3/6/20 7:01 AM, Aleksey Shipilev wrote: > RFE: > https://bugs.openjdk.java.net/browse/JDK-8240671 > > Webrev: > https://cr.openjdk.java.net/~shade/8240671/webrev.01/ > > Tour of changes: > > *) SHENANDOAH_GC_PHASE_DO macro now uses the sub-macro root definition block that now guarantees > that we list the roots in the same order! Also makes the macro itself much shorter. > > *) ShenandoahWorkerTimings middle-man is eliminated by inlining straight into > ShenandoahPhaseTimings. This removes some redundant jumping around. Plus, eliminates it at every use > of ShenandoahWorkerTimingsTracker! > > *) ShenandoahGCPhase is now responsible for measuring the time, which simplifies _timing_data and > ShenandoahPhaseTimings interface. > > *) shenandoahTimingTracker.* are gone, ShenandoahWorkerTimingsTracker implementation moved to > shenandoahPhaseTimings.*, as it does not carry its own weight at this point. > shenandoahPhaseTimings.* would be renamed at some point in the future. 
> > Testing: hotspot_gc_shenandoah {fastdebug,release} > From kim.barrett at oracle.com Fri Mar 6 17:50:06 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Fri, 6 Mar 2020 12:50:06 -0500 Subject: RFR[T]: 8240133: G1DirtyCardQueue destructor has useless flush In-Reply-To: References: Message-ID: > On Mar 6, 2020, at 4:10 AM, Stefan Johansson wrote: > > Hi Kim, > > On 2020-03-04 03:17, Kim Barrett wrote: >> Please review this trivial change to remove the useless call to flush() from >> the G1DirtyCardQueue destructor. See the CR for more details. This removes >> the need for a non-trivial destructor for that class. >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8240133 >> Webrev: >> https://cr.openjdk.java.net/~kbarrett/8240133/open.00/ > Would it make sense to add an assert into the destructor to ensure no entries were added? Or is that problematic for some reason. ~PtrQueue() already asserts _buf == NULL. > If you prefer not to, I'm good with this change and you can consider it reviewed. Thanks. From kim.barrett at oracle.com Fri Mar 6 17:52:34 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Fri, 6 Mar 2020 12:52:34 -0500 Subject: RFR(XS): 8240592: HeapRegionManager::rebuild_free_list logs 0s for the estimated free regions before In-Reply-To: References: Message-ID: <40722D8C-4F2F-4E9F-9A27-889A19CF0C79@oracle.com> > On Mar 6, 2020, at 3:38 AM, Ivan Walulya wrote: > > Thanks kim! >> >> But there's also the question of why the log message mentions the >> number of free regions at all, since the number of pre-existing free >> regions isn't important because of the abandonment. > > I will remove the number of free regions from the log entry and then set back the previous ordering. Sounds good. 
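For readers following the 8240592 thread above, here is a minimal sketch of why the log showed 0 and of the "capture before the abandon" ordering Kim suggested. It is only an illustration under stated assumptions: ToyFreeList, log_after_abandon and log_with_captured_count are hypothetical names, not HotSpot's HeapRegionManager or its free list, and the thread ultimately settled on dropping the count from the log message instead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a region free list (NOT the HotSpot class).
struct ToyFreeList {
  std::vector<int> regions;
  std::size_t length() const { return regions.size(); }
  void abandon() { regions.clear(); }  // drops every entry
};

// Problematic ordering: the count is read only after abandon(),
// so the message always reports 0.
std::string log_after_abandon(ToyFreeList& list) {
  list.abandon();
  return "free regions before: " + std::to_string(list.length());
}

// Suggested ordering: capture the count into a variable first,
// abandon the list, then use the captured value in the message.
std::string log_with_captured_count(ToyFreeList& list) {
  const std::size_t before = list.length();
  list.abandon();
  return "free regions before: " + std::to_string(before);
}
```

With a list of three regions, the first function reports 0 while the second reports 3; in both cases the list is empty afterwards.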
From kim.barrett at oracle.com Fri Mar 6 22:30:38 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Fri, 6 Mar 2020 17:30:38 -0500 Subject: RFR: 8240239: Replace ConcurrentGCPhaseManager In-Reply-To: References: <4C14B89F-1550-44DE-B738-0DBEE7A2E167@oracle.com> Message-ID: <96E048D0-72A0-4B2E-B785-2CCAD28EFF30@oracle.com> > On Mar 6, 2020, at 2:40 AM, sangheon.kim at oracle.com wrote: > > Hi Kim, > > On 2/28/20 1:48 PM, Kim Barrett wrote: >> Please review this change which removes the ConcurrentGCPhaseManager >> class and replaces it with ConcurrentGCBreakpoints. >> >> This is joint work with Per Liden. >> >> This change provides a client API, used by WhiteBox. The usage model >> for a client is >> >> (1) Acquire control of concurrent collection cycles. >> >> (2) Do work that must be performed while the collection cycle is in a >> known state. >> >> (3) Request the concurrent collector run to a named "breakpoint", or >> run to completion, and then hold there, waiting for further commands. >> >> (4) Optionally goto (2). >> >> (5) Release control of concurrent collection cycles. >> >> Tests have been updated to use the new WhiteBox API. >> >> This change provides implementations of the new mechanism for G1 and >> ZGC. A Shenandoah implementation is being left to others, but we >> don't see any obvious reason for it to be difficult. >> >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8240239 >> >> Webrev: >> https://cr.openjdk.java.net/~kbarrett/8240239/open.03/ > Looks good in general. Thanks. > But I have several minor nits. > > ------------------ > src/hotspot/share/gc/g1/g1ConcurrentMarkThread.cpp > 215 // Pause Remark. > - Pre-existing: this comment should be moved to before line 221. I don't think the comment should be moved. The intervening stuff is all related to the remark pause, and in particular that it demarcates the completion of concurrent marking. 
> 221 CMRemark cl(_cm); > > 216 ConcurrentGCBreakpoints::at("BEFORE MARKING COMPLETED"); > 217 log_info(gc, marking)("Concurrent Mark (%.3fs, %.3fs) %.3fms", > - Do we need to add time spent by 'at'? If we need time spent on 'at', it would be better to separate the log. I don't think it matters much. Note that we're also including the time spent waiting on MMU. The delay caused by being stopped at a breakpoint obviously affects timing, one way or another. > src/hotspot/share/gc/shared/concurrentGCBreakpoints.hpp > 118 static void at(const char* breakpoint); > - Don't we need more explanatory name? Something like reached_at? To me 'at' make me feel like the function would return none-void type. But this is my preference, so okay as is. I think simply "at" has the desired meaning; paraphrasing from a dictionary "expressing arrival in a particular place or position". "reached_at" would be redundant. The meaning I think you are referring to is from the idiom container.at(element_designator), where "at" is short for something like "reference/value at designated location". > test/hotspot/jtreg/gc/TestConcurrentGCBreakpoints.java > 138 throw new RuntimeException("Expected support"); > - Better explanation please as it is a bit confusing (at least) to me. I feel affirmative sentence seems not good for the message here. Maybe because I compared the message with the other case. Sorry, but I'm not understanding the issue? In this case we expected the current collector to support concurrent GC breakpoints, but it doesn't, so we report a test failure that we expected support. In the other case we expected the current collector to not support breakpoints, but found that it claims that it does, so we report a test failure that we have unexpected support. Both of these indicate a mismatch between the expectations of the test and the capabilities of the collector. 
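The two failure cases Kim describes above can be sketched as a small decision function. This is a toy model in C++, not the code in TestConcurrentGCBreakpoints.java (which is Java): check_breakpoint_support is a hypothetical name, and the "Unexpected support" wording is an assumption, since only the "Expected support" message is quoted in the thread.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: compares what the test expects of the current
// collector with what the collector reports. Returns an empty string
// when they agree, otherwise the failure message for the mismatch.
std::string check_breakpoint_support(bool expected, bool reported) {
  if (expected && !reported) {
    // The test expected breakpoint support, but the collector has none.
    return "Expected support";
  }
  if (!expected && reported) {
    // The test expected no support, but the collector claims it.
    // (Message wording assumed; not quoted in the thread.)
    return "Unexpected support";
  }
  return "";
}
```

Both messages describe the same kind of problem: a mismatch between the expectations of the test and the capabilities of the collector.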
From sangheon.kim at oracle.com Fri Mar 6 23:05:18 2020 From: sangheon.kim at oracle.com (sangheon.kim at oracle.com) Date: Fri, 6 Mar 2020 15:05:18 -0800 Subject: RFR: 8240239: Replace ConcurrentGCPhaseManager In-Reply-To: <96E048D0-72A0-4B2E-B785-2CCAD28EFF30@oracle.com> References: <4C14B89F-1550-44DE-B738-0DBEE7A2E167@oracle.com> <96E048D0-72A0-4B2E-B785-2CCAD28EFF30@oracle.com> Message-ID: <9aa9dd48-76fa-15da-931a-320868c4c629@oracle.com> On 3/6/20 2:30 PM, Kim Barrett wrote: >> On Mar 6, 2020, at 2:40 AM, sangheon.kim at oracle.com wrote: >> >> Hi Kim, >> >> On 2/28/20 1:48 PM, Kim Barrett wrote: >>> Please review this change which removes the ConcurrentGCPhaseManager >>> class and replaces it with ConcurrentGCBreakpoints. >>> >>> This is joint work with Per Liden. >>> >>> This change provides a client API, used by WhiteBox. The usage model >>> for a client is >>> >>> (1) Acquire control of concurrent collection cycles. >>> >>> (2) Do work that must be performed while the collection cycle is in a >>> known state. >>> >>> (3) Request the concurrent collector run to a named "breakpoint", or >>> run to completion, and then hold there, waiting for further commands. >>> >>> (4) Optionally goto (2). >>> >>> (5) Release control of concurrent collection cycles. >>> >>> Tests have been updated to use the new WhiteBox API. >>> >>> This change provides implementations of the new mechanism for G1 and >>> ZGC. A Shenandoah implementation is being left to others, but we >>> don't see any obvious reason for it to be difficult. >>> >>> CR: >>> https://bugs.openjdk.java.net/browse/JDK-8240239 >>> >>> Webrev: >>> https://cr.openjdk.java.net/~kbarrett/8240239/open.03/ >> Looks good in general. > Thanks. > >> But I have several minor nits. >> >> ------------------ >> src/hotspot/share/gc/g1/g1ConcurrentMarkThread.cpp >> 215 // Pause Remark. >> - Pre-existing: this comment should be moved to before line 221. > I don't think the comment should be moved. 
The intervening stuff is > all related to the remark pause, and in particular that it demarcates > the completion of concurrent marking. OK > > >> 221 CMRemark cl(_cm); >> >> 216 ConcurrentGCBreakpoints::at("BEFORE MARKING COMPLETED"); >> 217 log_info(gc, marking)("Concurrent Mark (%.3fs, %.3fs) %.3fms", >> - Do we need to add time spent by 'at'? If we need time spent on 'at', it would be better to separate the log. > I don't think it matters much. Note that we're also including the > time spent waiting on MMU. The delay caused by being stopped at a > breakpoint obviously affects timing, one way or another. OK > >> src/hotspot/share/gc/shared/concurrentGCBreakpoints.hpp >> 118 static void at(const char* breakpoint); >> - Don't we need more explanatory name? Something like reached_at? To me 'at' make me feel like the function would return none-void type. But this is my preference, so okay as is. > I think simply "at" has the desired meaning; paraphrasing from a > dictionary "expressing arrival in a particular place or position". > "reached_at" would be redundant. The meaning I think you are > referring to is from the idiom container.at(element_designator), > where "at" is short for something like "reference/value at designated > location". OK > >> test/hotspot/jtreg/gc/TestConcurrentGCBreakpoints.java >> 138 throw new RuntimeException("Expected support"); >> - Better explanation please as it is a bit confusing (at least) to me. I feel affirmative sentence seems not good for the message here. Maybe because I compared the message with the other case. > Sorry, but I'm not understanding the issue? In this case we expected > the current collector to support concurrent GC breakpoints, but it > doesn't, so we report a test failure that we expected support. In the > other case we expected the current collector to not support > breakpoints, but found that it claims that it does, so we report a > test failure that we have unexpected support. 
Both of these indicate a > mismatch between the expectations of the test and the capabilities of > the collector. > > I think I understand the exception, but I was feeling 'Unexpected un-support blah blah' or more detail saying 'current GC supports BP but WhiteBox check returned un-support blah blah' kind of message seem better explanation. But at least double negative (first example) would make more confused. I had a chat with Kim and now I agree with all of his comments. Sorry for noisy comments. Looks good to me as is. Thanks, Sangheon From kim.barrett at oracle.com Fri Mar 6 23:10:46 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Fri, 6 Mar 2020 18:10:46 -0500 Subject: RFR: 8240239: Replace ConcurrentGCPhaseManager In-Reply-To: <9aa9dd48-76fa-15da-931a-320868c4c629@oracle.com> References: <4C14B89F-1550-44DE-B738-0DBEE7A2E167@oracle.com> <96E048D0-72A0-4B2E-B785-2CCAD28EFF30@oracle.com> <9aa9dd48-76fa-15da-931a-320868c4c629@oracle.com> Message-ID: <3C6C537F-76D8-4900-ADA1-B58DCACFECCB@oracle.com> > On Mar 6, 2020, at 6:05 PM, sangheon.kim at oracle.com wrote: > Looks good to me as is. > > Thanks, > Sangheon Thanks. From kim.barrett at oracle.com Mon Mar 9 07:38:00 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 9 Mar 2020 03:38:00 -0400 Subject: RFR: 8240722: [BACKOUT] G1DirtyCardQueue destructor has useless flush Message-ID: Please review this backout of JDK-8240133, which turns out to have problems and needs a bit of a rethink; see JDK-8240722 for details. CR: [backout] https://bugs.openjdk.java.net/browse/JDK-8240722 [original] https://bugs.openjdk.java.net/browse/JDK-8240133 Webrev: https://cr.openjdk.java.net/~kbarrett/8240722/open.00/ Testing: Local build. 
From stefan.johansson at oracle.com Mon Mar 9 07:59:17 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Mon, 9 Mar 2020 08:59:17 +0100 Subject: RFR: 8240722: [BACKOUT] G1DirtyCardQueue destructor has useless flush In-Reply-To: References: Message-ID: <38795206-9801-29ab-f328-686b7086b800@oracle.com> Looks good, StefanJ On 2020-03-09 08:38, Kim Barrett wrote: > Please review this backout of JDK-8240133, which turns out to have > problems and needs a bit of a rethink; see JDK-8240722 for details. > > CR: > [backout] https://bugs.openjdk.java.net/browse/JDK-8240722 > [original] https://bugs.openjdk.java.net/browse/JDK-8240133 > > Webrev: > https://cr.openjdk.java.net/~kbarrett/8240722/open.00/ > > Testing: > Local build. > > From kim.barrett at oracle.com Mon Mar 9 08:04:21 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 9 Mar 2020 04:04:21 -0400 Subject: RFR: 8240722: [BACKOUT] G1DirtyCardQueue destructor has useless flush In-Reply-To: <38795206-9801-29ab-f328-686b7086b800@oracle.com> References: <38795206-9801-29ab-f328-686b7086b800@oracle.com> Message-ID: > On Mar 9, 2020, at 3:59 AM, Stefan Johansson wrote: > > Looks good, > StefanJ Thanks. > On 2020-03-09 08:38, Kim Barrett wrote: >> Please review this backout of JDK-8240133, which turns out to have >> problems and needs a bit of a rethink; see JDK-8240722 for details. >> CR: >> [backout] https://bugs.openjdk.java.net/browse/JDK-8240722 >> [original] https://bugs.openjdk.java.net/browse/JDK-8240133 >> Webrev: >> https://cr.openjdk.java.net/~kbarrett/8240722/open.00/ >> Testing: >> Local build. 
From magnus.ihse.bursie at oracle.com Mon Mar 9 08:33:20 2020 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Mon, 9 Mar 2020 09:33:20 +0100 Subject: RFR: JDK-8240224 Allow building hotspot without the serial gc Message-ID: <698c6117-f8c0-191e-9efb-41b5fd447961@oracle.com> When reworking the JVM feature handling, I wanted to try to compile Hotspot with various features enabled/disabled. I quickly found out that it's not really possible to build hotspot without the serial gc. While this is not a terribly important use case, I think it's good to be able to select serial freely, just as with the other collectors. With this patch it is possible to build a truly minimal JVM using 'configure --with-jvm-variants=custom --with-jvm-features=g1gc'. Bug: https://bugs.openjdk.java.net/browse/JDK-8240224 WebRev: http://cr.openjdk.java.net/~ihse/JDK-8240224-building-without-serial-gc/webrev.01 /Magnus From david.holmes at oracle.com Mon Mar 9 09:10:57 2020 From: david.holmes at oracle.com (David Holmes) Date: Mon, 9 Mar 2020 19:10:57 +1000 Subject: RFR: JDK-8240224 Allow building hotspot without the serial gc In-Reply-To: <663301c4-aa45-4539-c9b7-d6fe68c531de@ihse.net> References: <663301c4-aa45-4539-c9b7-d6fe68c531de@ihse.net> Message-ID: <484dcf75-4891-2b52-1367-0b86281b7671@oracle.com> Hi Magnus, On 9/03/2020 6:30 pm, Magnus Ihse Bursie wrote: > When reworking the JVM feature handling, I wanted to try to compile > Hotspot with various features enabled/disabled. I quickly found out that > it's not really possible to build hotspot without the serial gc. While > this is not a terribly important use case, I think it's good to be able > to select serial freely, just as with the other collectors. Really not sure this is a worthwhile exercise. > With this patch it is possible to build a truly minimal JVM using > 'configure --with-jvm-variants=custom --with-jvm-features=g1gc'. 
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8240224 > WebRev: > http://cr.openjdk.java.net/~ihse/JDK-8240224-building-without-serial-gc/webrev.01 make/ModuleTools.gmk ! TOOL_ADD_PACKAGES_ATTRIBUTE := $(BUILD_JAVA) $(JAVA_FLAGS_SMALL_BUILDJDK) \ that should be BUILDJDK_JAVA_FLAGS_SMALL. make/RunTestsPrebuiltSpec.gmk make/autoconf/boot-jdk.m4 ! BUILDJDK_JAVA_FLAGS_SMALL := -Xms32M -Xmx512M -XX:TieredStopAtLevel=1 Depending on the default GC those -Xms and -Xmx settings may not be valid/possible. Other changes seem okay but I'll leave it for GC folk to comment on that. Cheers, David > > /Magnus From shade at redhat.com Mon Mar 9 09:20:43 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 9 Mar 2020 10:20:43 +0100 Subject: RFR: JDK-8240224 Allow building hotspot without the serial gc In-Reply-To: <484dcf75-4891-2b52-1367-0b86281b7671@oracle.com> References: <663301c4-aa45-4539-c9b7-d6fe68c531de@ihse.net> <484dcf75-4891-2b52-1367-0b86281b7671@oracle.com> Message-ID: On 3/9/20 10:10 AM, David Holmes wrote: > On 9/03/2020 6:30 pm, Magnus Ihse Bursie wrote: >> When reworking the JVM feature handling, I wanted to try to compile >> Hotspot with various features enabled/disabled. I quickly found out that >> it's not really possible to build hotspot without the serial gc. While >> this is not a terribly important use case, I think it's good to be able >> to select serial freely, just as with the other collectors. > > Really not sure this is a worthwhile exercise. Me neither. I think Serial GC always-present is a good compromise for the rest of the code: it is the very basic GC you can always count on. Nits: *) src/hotspot/share/gc/shared/gcConfig.cpp changes are a bit strange: - Epsilon should not ever be selected by ergonomics - Why ZGC is selected before Shenandoah? 
[Oh, what a can of worms that one is ;)] *) hotspot/gtest/gc/shared/test_collectorPolicy.cpp - I don't think we indent nested #include, #define lines -- Thanks, -Aleksey From thomas.schatzl at oracle.com Mon Mar 9 11:00:24 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 9 Mar 2020 12:00:24 +0100 Subject: RFR: 8240668 : G1 list of all PerRegionTable does not have to be a double linkedlist any more In-Reply-To: <0650B450-17DB-46D4-ABD6-A1CD1877C53A@oracle.com> References: <0650B450-17DB-46D4-ABD6-A1CD1877C53A@oracle.com> Message-ID: <9d4f6c6d-7255-a0b6-3d34-a53fec932ab6@oracle.com> Hi, On 06.03.20 13:35, Ivan Walulya wrote: > Hi all, > > Please review this modification to change the list of all PerRegionTables from a double linkedlist to a linkedlist. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8240668 > webrev: http://cr.openjdk.java.net/~iwalulya/8240668/00/ > > Testing: Tier 1 - 3 > > //Ivan > looks good. Please also remove the paragraph in the comment in heapRegionRemSet.hpp:253. We do not scrub/delete since jdk11 any more. Thanks, Thomas From ivan.walulya at oracle.com Mon Mar 9 11:38:52 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Mon, 9 Mar 2020 04:38:52 -0700 (PDT) Subject: RFR: 8240668 : G1 list of all PerRegionTable does not have to be a double linkedlist any more Message-ID: <28446e64-870f-449d-a7f2-f9f6bce6956c@default> Thanks Thomas! > Please also remove the paragraph in the comment in > heapRegionRemSet.hpp:253. We do not scrub/delete since jdk11 any more. Noted. 
//Ivan ----- Original Message ----- From: thomas.schatzl at oracle.com To: hotspot-gc-dev at openjdk.java.net Sent: Monday, 9 March, 2020 12:00:44 PM GMT +01:00 Amsterdam / Berlin / Bern / Rome / Stockholm / Vienna Subject: Re: RFR: 8240668 : G1 list of all PerRegionTable does not have to be a double linkedlist any more Hi, On 06.03.20 13:35, Ivan Walulya wrote: > Hi all, > > Please review this modification to change the list of all PerRegionTables from a double linkedlist to a linkedlist. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8240668 > webrev: http://cr.openjdk.java.net/~iwalulya/8240668/00/ > > Testing: Tier 1 - 3 > > //Ivan > looks good. Please also remove the paragraph in the comment in heapRegionRemSet.hpp:253. We do not scrub/delete since jdk11 any more. Thanks, Thomas From shade at redhat.com Mon Mar 9 13:18:58 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 9 Mar 2020 14:18:58 +0100 Subject: RFR (S) 8240749: Shenandoah: refactor ShenandoahUtils Message-ID: RFE: https://bugs.openjdk.java.net/browse/JDK-8240749 Webrev: https://cr.openjdk.java.net/~shade/8240749/webrev.01/ It mostly hides naked phase_timings()->record... calls with ShenandoahGCWorkerPhase wrapper. But also cleans up the code a bit. Testing: hotspot_gc_shenandoah -- Thanks, -Aleksey From shade at redhat.com Mon Mar 9 13:20:11 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 9 Mar 2020 14:20:11 +0100 Subject: RFR (S) 8240750: Shenandoah: remove leftover files and mentions of ShenandoahAllocTracker Message-ID: <04a65c11-8550-fc3c-a613-a324cddb5a17@redhat.com> RFE: https://bugs.openjdk.java.net/browse/JDK-8240750 While working on JDK-8240215, I totally forgot to remove these leftovers. 
Webrev: https://cr.openjdk.java.net/~shade/8240750/webrev.01/ Testing: hotspot_gc_shenandoah -- Thanks, -Aleksey From rkennke at redhat.com Mon Mar 9 14:03:08 2020 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 9 Mar 2020 15:03:08 +0100 Subject: RFR (S) 8240749: Shenandoah: refactor ShenandoahUtils In-Reply-To: References: Message-ID: <299fe138-7c85-2443-b485-5d19a4d3b375@redhat.com> Looks good to me. Thanks! Roman On 3/9/20 2:18 PM, Aleksey Shipilev wrote: > RFE: > https://bugs.openjdk.java.net/browse/JDK-8240749 > > Webrev: > https://cr.openjdk.java.net/~shade/8240749/webrev.01/ > > It mostly hides naked phase_timings()->record... calls with ShenandoahGCWorkerPhase wrapper. But > also cleans up the code a bit. > > Testing: hotspot_gc_shenandoah > From rkennke at redhat.com Mon Mar 9 14:03:38 2020 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 9 Mar 2020 15:03:38 +0100 Subject: RFR (S) 8240750: Shenandoah: remove leftover files and mentions of ShenandoahAllocTracker In-Reply-To: <04a65c11-8550-fc3c-a613-a324cddb5a17@redhat.com> References: <04a65c11-8550-fc3c-a613-a324cddb5a17@redhat.com> Message-ID: Yep, looks good! Thanks, Roman > RFE: > https://bugs.openjdk.java.net/browse/JDK-8240750 > > While working on JDK-8240215, I totally forgot to remove these leftovers. 
> > Webrev: > https://cr.openjdk.java.net/~shade/8240750/webrev.01/ > > Testing: hotspot_gc_shenandoah > From per.liden at oracle.com Mon Mar 9 15:03:33 2020 From: per.liden at oracle.com (Per Liden) Date: Mon, 9 Mar 2020 16:03:33 +0100 Subject: RFR: 8003216: Add JFR event indicating explicit System.gc() cal In-Reply-To: <73577b98-c59e-9e80-b966-a11d501952d8@oracle.com> References: <07178c56-dde3-25eb-c95c-32fff443cb55@oracle.com> <73577b98-c59e-9e80-b966-a11d501952d8@oracle.com> Message-ID: <52166bda-a3c7-5c61-92ca-4ba8e05fa4ed@oracle.com> Hi, On 2/27/20 10:13 AM, Stefan Johansson wrote: > Hi Erik, > > On 2020-02-26 18:28, Erik Gahlin wrote: >> Hi Stefan, >> >> GC-id would be nice, but perhaps not possible in all scenarios, i.e. >> -XX:+ExplicitGCInvokesConcurrent and Epsilon GC? > For ExplicitGCInvokesConcurrent it would not be a big problem, that > would start a concurrent cycle and we could use the id for that GC. I > also realized that we can get the GC-id without any problem. For other > events sent before the GC-id is properly setup, we use GCId::peek() > which returns the id that will be used for the next collection. I have to say that I don't think the GC-id is all the important/interesting here. Especially, since that ID can be a bit sketchy depending on the GC and/or configuration. cheers, Per > > For Epsilon, I'm not sure an event should be sent at all since they are > blocked, see: EpsilonHeap::collect(...) > > Thanks, > Stefan > >> >> Thanks >> Erik >> >> On 2020-02-26 14:21, Stefan Johansson wrote: >>> Hi Erik, >>> >>>> 26 feb. 2020 kl. 13:56 skrev Per Liden : >>>> >>>> Hi Erik, >>>> >>>> On 2020-02-26 13:50, Erik Gahlin wrote: >>>>> Hi, >>>>> Could I have a review of a JFR event that is emitted when >>>>> System.gc() is called. >>>>> Purpose is to collect the stack trace. It is not sufficient with >>>>> the cause field that the GarbageCollection event has today. 
>>>>> Bug: >>>>> https://bugs.openjdk.java.net/browse/JDK-8003216 >>>>> Webrev: >>>>> http://cr.openjdk.java.net/~egahlin/8003216/ >>>> 489     EventSystemGC event; >>>> 490     event.commit(); >>>> 491     Universe::heap()->collect(GCCause::_java_lang_system_gc); >>>> >>>> Don't you want the commit() call after the call to collect(), to get >>>> the timing right? >>> I was thinking the same thing, could also be nice to have the GC-id >>> associated with the event to make it easy to match it to GC-logs and >>> other GC-events. Not sure how to easily get the GC-id though, since >>> it's not set at the time we commit the event. >>> >>> I guess if the event has the correct span with timestamps it will be >>> easy to figure out which other events are associated with it, even >>> without the GC-id. >>> >>> Cheers, >>> Stefan >>> >>>> cheers, >>>> Per >>>> >>>>> Testing: >>>>> tier1+tier2+jdk/jdk/jfr >>>>> Thanks >>>>> Erik From erik.joelsson at oracle.com Mon Mar 9 15:28:00 2020 From: erik.joelsson at oracle.com (Erik Joelsson) Date: Mon, 9 Mar 2020 08:28:00 -0700 Subject: RFR: JDK-8240224 Allow building hotspot without the serial gc In-Reply-To: References: <663301c4-aa45-4539-c9b7-d6fe68c531de@ihse.net> <484dcf75-4891-2b52-1367-0b86281b7671@oracle.com> Message-ID: <8794fd61-c5b9-b7d2-ae49-88c27979abb6@oracle.com> On 2020-03-09 02:20, Aleksey Shipilev wrote: > On 3/9/20 10:10 AM, David Holmes wrote: >> On 9/03/2020 6:30 pm, Magnus Ihse Bursie wrote: >>> When reworking the JVM feature handling, I wanted to try to compile >>> Hotspot with various features enabled/disabled. I quickly found out that >>> it's not really possible to build hotspot without the serial gc. While >>> this is not a terribly important use case, I think it's good to be able >>> to select serial freely, just as with the other collectors. >> Really not sure this is a worthwhile exercise. > Me neither.
I think Serial GC always-present is a good compromise for the rest of the code: it is > the very basic GC you can always count on. I'm not a GC developer, but from a build point of view, it makes sense to allow for as free modularity of JVM features as possible. Certainly not all combinations are a good idea, and we are most definitely not going to test all combinations, but I also don't think the build should actively prevent anyone from experimentally exclude certain "features". I would imagine this kind of freedom being useful in certain development scenarios. > Nits: > > *) src/hotspot/share/gc/shared/gcConfig.cpp changes are a bit strange: > - Epsilon should not ever be selected by ergonomics > - Why ZGC is selected before Shenandoah? [Oh, what a can of worms that one is ;)] This fallback list is clearly just meant to allow for any combination of GCs being compiled into the JVM. If the only one you picked was epsilon, then what other default would you expect? It's last in the list so any other GC will still be prioritized before it if present. For the same reason, the order of ZGC and Shenandoah is irrelevant and could just as well be the other way. It will never have any real consequence. This code is only there to keep things from falling apart when a non standard combination of jvm features is picked. 
/Erik > *) hotspot/gtest/gc/shared/test_collectorPolicy.cpp > - I don't think we indent nested #include, #define lines > From kim.barrett at oracle.com Mon Mar 9 20:37:27 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 9 Mar 2020 16:37:27 -0400 Subject: RFR: JDK-8240224 Allow building hotspot without the serial gc In-Reply-To: <663301c4-aa45-4539-c9b7-d6fe68c531de@ihse.net> References: <663301c4-aa45-4539-c9b7-d6fe68c531de@ihse.net> Message-ID: <46E744D7-AB29-43BA-808A-2A79248EAAAD@oracle.com> > On Mar 9, 2020, at 4:30 AM, Magnus Ihse Bursie wrote: > > When reworking the JVM feature handling, I wanted to try to compile Hotspot with various features enabled/disabled. I quickly found out that it's not really possible to build hotspot without the serial gc. While this is not a terribly important use case, I think it's good to be able to select serial freely, just as with the other collectors. > > With this patch it is possible to build a truly minimal JVM using 'configure --with-jvm-variants=custom --with-jvm-features=g1gc'. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8240224 > WebRev: http://cr.openjdk.java.net/~ihse/JDK-8240224-building-without-serial-gc/webrev.01 > > /Magnus I'm inclined to agree with David and Aleksey that this isn't really a worthwhile exercise. Especially not if it involves making some otherwise questionable or controversial changes. In addition to the issues mentioned by David and Aleksey: ------------------------------------------------------------------------------ src/hotspot/share/gc/shared/gcConfig.cpp I would instead suggest there should not be a default at all instead of adding these cases, and the user must explicitly select the GC to be used. Since we're talking about an atypical custom build anyway, the user presumably knows what they are doing. And yeah, that makes the buildjdk stuff elsewhere in this patch harder. 
Really, I think this ought to just be left alone, along with most of the other build-specific changes. [This also responds to / agrees with Aleksey's comment about this part.] ------------------------------------------------------------------------------ src/hotspot/share/gc/shared/genCollectedHeap.cpp 197 #if INCLUDE_SERIALGC 198 MarkSweep::initialize(); 199 #endif This whole file, and several associated files, are *only* used by SerialGC now that CMS has been removed: JDK-8234502. ------------------------------------------------------------------------------ make/hotspot/lib/JvmFeatures.gmk 58 ifeq ($(JVM_VARIANT), custom) 59 JVM_CFLAGS_FEATURES += -DVMTYPE=\"Custom\" 60 endif This change looks unrelated to whether serialgc is present or absent. If so, it doesn't belong in this changeset at all. ------------------------------------------------------------------------------ make/hotspot/lib/JvmFeatures.gmk [removed] 154 # If serial is disabled, we cannot use serial as OldGC in parallel 155 JVM_EXCLUDE_FILES += psMarkSweep.cpp psMarkSweepDecorator.cpp This was missed by JDK-8235860, which removed those files. Good find. ------------------------------------------------------------------------------ test/hotspot/gtest/gc/shared/test_collectorPolicy.cpp As originally written, this test was *only* testing SerialGC. It's not obvious that it is actually GC-agnostic and can use the default GC if that isn't SerialGC. Certainly some of the naming suggests otherwise. Was this tested with all the other configurations? 
------------------------------------------------------------------------------ From kim.barrett at oracle.com Tue Mar 10 01:16:40 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 9 Mar 2020 21:16:40 -0400 Subject: RFR: 8240668 : G1 list of all PerRegionTable does not have to be a double linkedlist any more In-Reply-To: <0650B450-17DB-46D4-ABD6-A1CD1877C53A@oracle.com> References: <0650B450-17DB-46D4-ABD6-A1CD1877C53A@oracle.com> Message-ID: <88D38C7C-105F-4CA0-A8FE-789B4A847234@oracle.com> > On Mar 6, 2020, at 7:35 AM, Ivan Walulya wrote: > > Hi all, > > Please review this modification to change the list of all PerRegionTables from a double linkedlist to a linkedlist. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8240668 > webrev: http://cr.openjdk.java.net/~iwalulya/8240668/00/ > > Testing: Tier 1 - 3 > > //Ivan Looks good. From ivan.walulya at oracle.com Tue Mar 10 08:00:56 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Tue, 10 Mar 2020 09:00:56 +0100 Subject: RFR: 8240668 : G1 list of all PerRegionTable does not have to be a double linkedlist any more In-Reply-To: <88D38C7C-105F-4CA0-A8FE-789B4A847234@oracle.com> References: <0650B450-17DB-46D4-ABD6-A1CD1877C53A@oracle.com> <88D38C7C-105F-4CA0-A8FE-789B4A847234@oracle.com> Message-ID: <7096FE73-1D6A-48F5-B8C7-F83229C01430@oracle.com> Thanks Kim > On 10 Mar 2020, at 02:16, Kim Barrett wrote: > >> On Mar 6, 2020, at 7:35 AM, Ivan Walulya wrote: >> >> Hi all, >> >> Please review this modification to change the list of all PerRegionTables from a double linkedlist to a linkedlist. >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8240668 >> webrev: http://cr.openjdk.java.net/~iwalulya/8240668/00/ >> >> Testing: Tier 1 - 3 >> >> //Ivan > > Looks good. 
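A minimal sketch of the idea behind the 8240668 change, under stated assumptions: a back pointer is mainly needed to unlink an arbitrary node in O(1), and Thomas notes above that tables are no longer scrubbed/deleted individually, so presumably a forward-only list with push-front and whole-list handling is enough. The names `Table` and `TableList` are hypothetical stand-ins, not the actual PerRegionTable code, and the sketch ignores concurrency.

```cpp
#include <cstddef>

// Hypothetical stand-in for a per-region table; not the HotSpot class.
struct Table {
    int    region;          // illustrative payload only
    Table* next = nullptr;  // forward link; no prev pointer is kept

    explicit Table(int r) : region(r) {}
};

// Singly linked "list of all tables": push-front and whole-list operations only.
class TableList {
    Table*      _head = nullptr;
    std::size_t _size = 0;

public:
    // O(1) insertion at the head; there is no back pointer to fix up.
    void push_front(Table* t) {
        t->next = _head;
        _head = t;
        ++_size;
    }

    // Detach the entire list at once and return its former head.
    Table* take_all() {
        Table* old = _head;
        _head = nullptr;
        _size = 0;
        return old;
    }

    std::size_t  size() const { return _size; }
    const Table* head() const { return _head; }
};
```

Pushing tables 1, 2, 3 in that order leaves 3 at the head; removing one node from the middle would require a linear walk, which is acceptable only if that operation is never needed.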
> From thomas.schatzl at oracle.com Tue Mar 10 09:16:06 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 10 Mar 2020 02:16:06 -0700 (PDT) Subject: RFR(XS): 8240591: G1HeapSizingPolicy attempts to compute expansion_amount even when at full capacity In-Reply-To: <197FCCDD-2E5D-4A63-B796-28908B18DB0E@oracle.com> References: <197FCCDD-2E5D-4A63-B796-28908B18DB0E@oracle.com> Message-ID: <2c22164c-b459-af44-c7c1-15f1eacb4d52@oracle.com> Hi, On 05.03.20 11:33, Ivan Walulya wrote: > Hi all, > > Please review a small modification for G1HeapSizingPolicy to return without computing expansion_amount when heap is already at full capacity. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8240591 > Webrev: http://cr.openjdk.java.net/~iwalulya/8240591/00/ > > > //Ivan > some minor (imo) comments to start a discussion: - I (weakly) suggest to remove the unnecessary assert(GCTimeRatio > 0) or put it at the beginning of the method. The initialization already guarantees that it is > 0. Probably best to move to the very top of the method. - I think I understand why the code clears the ratio check data, presumably to avoid the "windup" of the ratio check data being stuck into the maximum, and taking some time to wind down. I think this is a good idea, but I would prefer to just unconditionally clear the data - it does not seem time consuming, and makes the code a bit smaller. - undecided on the log message: while it is informative, now you get a log message even if nothing changed. Maybe make it trace level? Others might chime in here with their opinions. Also given that often people set -Xms==-Xmx, this one seems to be a bit chatty at this level. I can see the point of it though. 
So overall, I am good with the change but asking for opinions :) Thanks, Thomas From ivan.walulya at oracle.com Tue Mar 10 09:26:34 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Tue, 10 Mar 2020 10:26:34 +0100 Subject: RFR(XS): 8240591: G1HeapSizingPolicy attempts to compute expansion_amount even when at full capacity In-Reply-To: <2c22164c-b459-af44-c7c1-15f1eacb4d52@oracle.com> References: <197FCCDD-2E5D-4A63-B796-28908B18DB0E@oracle.com> <2c22164c-b459-af44-c7c1-15f1eacb4d52@oracle.com> Message-ID: > > some minor (imo) comments to start a discussion: > > - I (weakly) suggest to remove the unnecessary assert(GCTimeRatio > 0) or put it at the beginning of the method. The initialization already guarantees that it is > 0. Probably best to move to the very top of the method. > > - I think I understand why the code clears the ratio check data, presumably to avoid the "windup" of the ratio check data being stuck into the maximum, and taking some time to wind down. I think this is a good idea, but I would prefer to just unconditionally clear the data - it does not seem time consuming, and makes the code a bit smaller. Unconditionally clearing the data is better, and yes clearing the ratio data is to avoid expansion based on old data immediately after shrinking. The early exit does not update the windowing data. > > - undecided on the log message: while it is informative, now you get a log message even if nothing changed. Maybe make it trace level? Others might chime in here with their opinions. Also given that often people set -Xms==-Xmx, this one seems to be a bit chatty at this level. I can see the point of it though. > Noted, this will be changed to trace level.
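A rough sketch of the early exit discussed for 8240591, under stated assumptions: when the heap is already at full capacity the method returns 0 without computing anything, and the ratio check data is cleared unconditionally on that path so stale samples cannot trigger an expansion later. Everything here (`SizingPolicySketch`, the threshold of 2, the 20% step) is hypothetical and far simpler than the real G1HeapSizingPolicy.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical, simplified model; not the real G1HeapSizingPolicy.
struct SizingPolicySketch {
    std::size_t capacity;                    // currently committed bytes
    std::size_t max_capacity;                // upper bound, e.g. from -Xmx
    int         ratio_over_threshold_count;  // stand-in for the "ratio check data"

    static const int kSamplesBeforeExpand = 2;  // assumed threshold

    SizingPolicySketch(std::size_t cap, std::size_t max_cap)
        : capacity(cap), max_capacity(max_cap), ratio_over_threshold_count(0) {}

    void clear_ratio_check_data() { ratio_over_threshold_count = 0; }

    // Returns the number of bytes to expand by; 0 means "do not expand".
    std::size_t expansion_amount(bool gc_overhead_too_high) {
        if (capacity == max_capacity) {
            // Full capacity: skip the computation and do not update the
            // windowing data; clear it unconditionally instead.
            clear_ratio_check_data();
            return 0;
        }
        if (!gc_overhead_too_high) {
            return 0;
        }
        if (++ratio_over_threshold_count < kSamplesBeforeExpand) {
            return 0;  // not enough evidence yet
        }
        clear_ratio_check_data();
        // Assumed step: 20% of the committed size, capped by what is left.
        return std::min(capacity / 5, max_capacity - capacity);
    }
};
```

With -Xms equal to -Xmx the first branch is taken on every call, which is why a message logged there would be chatty at anything above trace level.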
> So overall, I am good with the change but asking for opinions :) > > Thanks, > Thomas Thanks Thomas, //Ivan From magnus.ihse.bursie at oracle.com Tue Mar 10 14:51:39 2020 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Tue, 10 Mar 2020 15:51:39 +0100 Subject: RFR: JDK-8240224 Allow building hotspot without the serial gc In-Reply-To: <8794fd61-c5b9-b7d2-ae49-88c27979abb6@oracle.com> References: <663301c4-aa45-4539-c9b7-d6fe68c531de@ihse.net> <484dcf75-4891-2b52-1367-0b86281b7671@oracle.com> <8794fd61-c5b9-b7d2-ae49-88c27979abb6@oracle.com> Message-ID: <4d83a54b-1960-2459-6c55-03e261769a09@oracle.com> On 2020-03-09 16:28, Erik Joelsson wrote: > > On 2020-03-09 02:20, Aleksey Shipilev wrote: >> On 3/9/20 10:10 AM, David Holmes wrote: >>> On 9/03/2020 6:30 pm, Magnus Ihse Bursie wrote: >>>> When reworking the JVM feature handling, I wanted to try to compile >>>> Hotspot with various features enabled/disabled. I quickly found out >>>> that >>>> it's not really possible to build hotspot without the serial gc. While >>>> this is not a terribly important use case, I think it's good to be >>>> able >>>> to select serial freely, just as with the other collectors. >>> Really not sure this is a worthwhile exercise. >> Me neither. I think Serial GC always-present is a good compromise for >> the rest of the code: it is >> the very basic GC you can always count on. > I'm not a GC developer, but from a build point of view, it makes sense > to allow for as free modularity of JVM features as possible. Certainly > not all combinations are a good idea, and we are most definitely not > going to test all combinations, but I also don't think the build > should actively prevent anyone from experimentally exclude certain > "features". I would imagine this kind of freedom being useful in > certain development scenarios. Yes, that's exactly the intention. 
And on the contrary, if the discussion on this patch ends up in the verdict from the hotspot developers that it is not possible to disable serialgc, then the configure script should reflect that, and disallow deselecting it. In fact, it should not really even be a "JVM feature" then, just an always-on GC. And the check that Stefan Karlsson added, that at least one GC is selected, is unnecessary. >> Nits: >> >> *) src/hotspot/share/gc/shared/gcConfig.cpp changes are a bit strange: >>   - Epsilon should not ever be selected by ergonomics >>   - Why ZGC is selected before Shenandoah? [Oh, what a can of worms >> that one is ;)] > > This fallback list is clearly just meant to allow for any combination > of GCs being compiled into the JVM. If the only one you picked was > epsilon, then what other default would you expect? It's last in the > list so any other GC will still be prioritized before it if present. > For the same reason, the order of ZGC and Shenandoah is irrelevant and > could just as well be the other way. It will never have any real > consequence. This code is only there to keep things from falling apart > when a non standard combination of jvm features is picked. Exactly. For good measure, I can surely put Shenandoah before ZGC. :) The idea behind the added defaults as fallback is just to allow the JVM to even start if Serial GC is not present. If you configure with SerialGC (which, as you note, is the typical case), this change will not affect anything. Without this, it is not even possible to complete the build without SerialGC, since jlink cannot run. /Magnus > > /Erik > >> *) hotspot/gtest/gc/shared/test_collectorPolicy.cpp >>   - I don't think we indent nested #include, #define lines Ok, sorry about that. That was the style of choice last time I programmed something seriously in C, so I just defaulted to it. I'll fix it.
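A minimal sketch of the fallback idea described above, assuming a fixed priority list that is walked until a compiled-in collector is found. The function name, the string identifiers, and the position of the first three entries are hypothetical; only "ZGC before Shenandoah" and "Epsilon last" are taken from the thread, and this is not the actual gcConfig.cpp code.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical model of "pick the first GC that was compiled in".
// Returns the empty string if no collector is available at all.
inline std::string select_fallback_gc(const std::set<std::string>& compiled_in) {
    // Assumed priority order; per the thread, ZGC precedes Shenandoah and
    // Epsilon comes last so that any other present GC wins over it.
    static const std::vector<std::string> priority = {
        "serial", "g1", "parallel", "z", "shenandoah", "epsilon"
    };
    for (const std::string& gc : priority) {
        if (compiled_in.count(gc) != 0) {
            return gc;  // first available entry wins
        }
    }
    return std::string();
}
```

In this model a build configured with only epsilon selects epsilon, and swapping the ZGC and Shenandoah entries changes the result only for builds that contain both but none of the earlier collectors.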
/Magnus From magnus.ihse.bursie at oracle.com Tue Mar 10 14:51:54 2020 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Tue, 10 Mar 2020 15:51:54 +0100 Subject: RFR: JDK-8240224 Allow building hotspot without the serial gc In-Reply-To: <484dcf75-4891-2b52-1367-0b86281b7671@oracle.com> References: <663301c4-aa45-4539-c9b7-d6fe68c531de@ihse.net> <484dcf75-4891-2b52-1367-0b86281b7671@oracle.com> Message-ID: On 2020-03-09 10:10, David Holmes wrote: > Hi Magnus, > > On 9/03/2020 6:30 pm, Magnus Ihse Bursie wrote: >> When reworking the JVM feature handling, I wanted to try to compile >> Hotspot with various features enabled/disabled. I quickly found out >> that it's not really possible to build hotspot without the serial gc. >> While this is not a terribly important use case, I think it's good to >> be able to select serial freely, just as with the other collectors. > > Really not sure this is a worthwhile exercise. While I agree that it is not very much important per se to build Hotspot without the Serial GC, I do want to make sure that we uphold the promise that configure fails early if you try to build with invalid options. So it's either not allowing configure to let you to build without the Serial GC, or it's fixing Hotspot so that it can build without it. My judgement was that the fixes required to make this work was minimal, without any impact to scenarios that *do* include Serial GC, and thus it was "worthwile" to fix this in Hotspot, rather than to make a limitation in the configure script. >> With this patch it is possible to build a truly minimal JVM using >> 'configure --with-jvm-variants=custom --with-jvm-features=g1gc'. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8240224 >> WebRev: >> http://cr.openjdk.java.net/~ihse/JDK-8240224-building-without-serial-gc/webrev.01 > > > make/ModuleTools.gmk > > ! TOOL_ADD_PACKAGES_ATTRIBUTE := $(BUILD_JAVA) > $(JAVA_FLAGS_SMALL_BUILDJDK) \ > > that should be BUILDJDK_JAVA_FLAGS_SMALL. Good catch! 
I renamed this at the very last moment, but missed this. :-( > > > make/RunTestsPrebuiltSpec.gmk > make/autoconf/boot-jdk.m4 > > ! BUILDJDK_JAVA_FLAGS_SMALL := -Xms32M -Xmx512M -XX:TieredStopAtLevel=1 > > Depending on the default GC those -Xms and -Xmx settings may not be > valid/possible. Eh, okaaaay, this is not really something new, we're already setting this for the buildjdk. The only difference is that we mis-used the JAVA_FLAGS_SMALL variable, that was technically only valid for the bootjdk. So we have not seen any issues with this in practice. I'm still a bit worried though that you say that this might not work. How can the -Xms/mx values be invalid? > > Other changes seem okay but I'll leave it for GC folk to comment on that. Thanks for the review! /Magnus > > Cheers, > David > > >> >> /Magnus From magnus.ihse.bursie at oracle.com Tue Mar 10 14:53:31 2020 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Tue, 10 Mar 2020 15:53:31 +0100 Subject: RFR: JDK-8240224 Allow building hotspot without the serial gc In-Reply-To: <46E744D7-AB29-43BA-808A-2A79248EAAAD@oracle.com> References: <663301c4-aa45-4539-c9b7-d6fe68c531de@ihse.net> <46E744D7-AB29-43BA-808A-2A79248EAAAD@oracle.com> Message-ID: On 2020-03-09 21:37, Kim Barrett wrote: >> On Mar 9, 2020, at 4:30 AM, Magnus Ihse Bursie wrote: >> >> When reworking the JVM feature handling, I wanted to try to compile Hotspot with various features enabled/disabled. I quickly found out that it's not really possible to build hotspot without the serial gc. While this is not a terribly important use case, I think it's good to be able to select serial freely, just as with the other collectors. >> >> With this patch it is possible to build a truly minimal JVM using 'configure --with-jvm-variants=custom --with-jvm-features=g1gc'. 
>> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8240224 >> WebRev: http://cr.openjdk.java.net/~ihse/JDK-8240224-building-without-serial-gc/webrev.01 >> >> /Magnus > I'm inclined to agree with David and Aleksey that this isn't really a > worthwhile exercise. Especially not if it involves making some > otherwise questionable or controversial changes. As I've said in the previous comments, it's not so much about making Hotspot running without Serial GC as making configure live up to it's promise not to create an un-buildable configuration. I apologize if my changes are questionable or controversial -- my assessment was on the contrary that they were simple and non-obtrusive, to the point of triviality. > > In addition to the issues mentioned by David and Aleksey: > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/shared/gcConfig.cpp > > I would instead suggest there should not be a default at all instead > of adding these cases, and the user must explicitly select the GC to > be used. Since we're talking about an atypical custom build anyway, > the user presumably knows what they are doing. And yeah, that makes > the buildjdk stuff elsewhere in this patch harder. If you build without the Serial GC, it is not even possible to start the JVM without a flag selecting GC. Instead you get a somewhat cryptic (and incorrect) message about missing garbage collectors. Even if the end user would be able to know that you need to pass an additional option just to be able to start java, the build system knows no such thing, so we cannot even finish the build -- as soon as we try to use the newly built JVM (e.g. for running jlink), we will crash and burn. > Really, I think this ought to just be left alone, along with most of > the other build-specific changes. > > [This also responds to / agrees with Aleksey's comment about this part.] 
> > ------------------------------------------------------------------------------ > src/hotspot/share/gc/shared/genCollectedHeap.cpp > 197 #if INCLUDE_SERIALGC > 198 MarkSweep::initialize(); > 199 #endif > > This whole file, and several associated files, are *only* used by > SerialGC now that CMS has been removed: JDK-8234502. Then maybe they should be excluded when serial is not included? Or, if it is determined that Serial GC is essential to hotspot, we should remove the INCLUDE_SERIALGC define and associated framework, since it's just a fake abstraction if it is not actually possible to build without serial GC. > > ------------------------------------------------------------------------------ > make/hotspot/lib/JvmFeatures.gmk > 58 ifeq ($(JVM_VARIANT), custom) > 59 JVM_CFLAGS_FEATURES += -DVMTYPE=\"Custom\" > 60 endif > > This change looks unrelated to whether serialgc is present or absent. > If so, it doesn't belong in this changeset at all. You are correct that this is not strictly about serialgc. When I tested my custom build with only epsilongc, I discovered that jtreg barfed on the version string produced by the custom JVM build. This is a fix that makes sure the VMTYPE always has a value. If you object to me pushing it as part of this fix, I can remove it from here and submit it as a separate issue. (I just didn't think it was worth the hassle.) > > ------------------------------------------------------------------------------ > make/hotspot/lib/JvmFeatures.gmk > [removed] > 154 # If serial is disabled, we cannot use serial as OldGC in parallel > 155 JVM_EXCLUDE_FILES += psMarkSweep.cpp psMarkSweepDecorator.cpp > > This was missed by JDK-8235860, which removed those files. Good find. ... but according to your comment above, that fix also missed to add a bunch of other files that should be excluded..? (If we should keep the ability to disable serial gc, that is...) 
> > ------------------------------------------------------------------------------ > test/hotspot/gtest/gc/shared/test_collectorPolicy.cpp > > As originally written, this test was *only* testing SerialGC. It's not > obvious that it is actually GC-agnostic and can use the default GC if > that isn't SerialGC. Certainly some of the naming suggests otherwise. > Was this tested with all the other configurations? No, I have not tested all other configurations. I verified that I could build with only g1, only zgc and only epsilongc. I also tested to run tier1 testing, and it "mostly" succeeded, but it still failed on several tests. My quick eyeballing of the situation indicated that the absolute majority (and perhaps all) these failures were related to jtreg tests not properly declaring their dependencies on compiler1 or compiler2. (Remember, on this bare-bones JVM I only had the interpreter, and neither c1 nor c2). I *could* of course run a suitable set of testing with say c1 and c2 enabled, and just a single gc enabled, for the set of all gcs != serial gc, but then we're *really* getting into the "not worth it" land. It is not clear to me that the test is only run with Serial GC. As far as I can interpret the test framework, this is run with the default collector, which typically is *not* serialgc on our testing framework. If this is only valid for Serial GC, perhaps the test needs to be amended? /Magnus > > ------------------------------------------------------------------------------ > From per.liden at oracle.com Tue Mar 10 17:19:48 2020 From: per.liden at oracle.com (Per Liden) Date: Tue, 10 Mar 2020 18:19:48 +0100 Subject: RFR: 8240714: ZGC: TestSmallHeap.java failed due to OutOfMemoryError Message-ID: <74b2b7a0-fa16-7476-809a-fc550b4827d0@oracle.com> The gc/z/TestSmallHeap.java test failed once due to OutOfMemoryError. When using a 8M heap, this test is fairly sensitive in the sense that the heap will be very crowded and the heap headroom is small. 
When running as "main/othervm" there are additional jtreg threads running in the VM. These threads can apparently (sometimes?) allocate enough memory to disturb the test itself, pushing it over the edge with OOME as a result. To avoid having these threads running in the same VM as the test itself I've adjusted the test to spawn a new test VM through ProcessTools. Webrev: http://cr.openjdk.java.net/~pliden/8240714/webrev.0 Bug: https://bugs.openjdk.java.net/browse/JDK-8240714 Testing: Manual cheers, Per From sangheon.kim at oracle.com Tue Mar 10 21:17:56 2020 From: sangheon.kim at oracle.com (sangheon.kim at oracle.com) Date: Tue, 10 Mar 2020 14:17:56 -0700 Subject: RFR: 8239825: G1: Simplify threshold test for mutator refinement In-Reply-To: <81A0AF23-EEA2-42A2-8208-AD36B7B336CC@oracle.com> References: <81A0AF23-EEA2-42A2-8208-AD36B7B336CC@oracle.com> Message-ID: <15e5d4d9-053f-fcab-9ea1-969604bfdc3f@oracle.com> Hi Kim, On 3/3/20 6:16 PM, Kim Barrett wrote: > Please review this change to the handling of "padding" for the threshold > used to decide whether a mutator thread should perform concurrent > refinement. Rather than doing a slightly tricky (because of potential > overflow) computation every time a mutator thread completes a buffer, > instead perform that computation once and record the result for repeated > use. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8239825 > > Webrev: > https://cr.openjdk.java.net/~kbarrett/8239825/open.00/ Looks good as is. ------------------- src/hotspot/share/gc/g1/g1DirtyCardQueue.hpp 326 // Artificially increase mutator refinement threshold. 327 void set_max_cards_padding(size_t padding); - This method is not used even 8240133 and 8139652. I'm okay leaving as is if you think it may be used in the future. 330 void discard_max_cards_padding(); - You changed the member name from '_max_cards_padding' to '_padded_max_cards' but it is not reflected on the method name. Is this intended?
I don't need a new webrev for this change if you would like to change it. Thanks, Sangheon > > Testing: > mach5 tier1-5 along with changes for JDK-8240133 and JDK-8139652. > Local (linux-x64) hotspot:tier1 with just this change. > From kim.barrett at oracle.com Tue Mar 10 21:27:50 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 10 Mar 2020 17:27:50 -0400 Subject: RFR: 8139652: Mutator refinement processing should take the oldest dirty card buffer In-Reply-To: References: Message-ID: > On Mar 3, 2020, at 9:32 PM, Kim Barrett wrote: > > Please review this change to the handling of completed buffers by mutator > threads. Previously it would conditionally process and potentially reuse the > buffer, rather than enqueuing it. Now, always enqueue the buffer and > allocate a new one, and conditionally process the next (oldest) dirty buffer > in the DCQS. The benefit of this is that the buffers being processed by the > mutator age for a while in the DCQS (just as is done for concurrent > refinement thread processing), so if the mutator is making repeated writes > to the same or nearby locations, the associated card marking has more > opportunity to be filtered out. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8139652 > > Webrev: > https://cr.openjdk.java.net/~kbarrett/8139652/open.00/ > > Testing > mach5 tier1-5 along with changes for JDK-8239825 and JDK-8139652. The original webrev was based on JDK-8239825 and JDK-8240133. The push and backout of JDK-8240133 has made that webrev no longer apply cleanly. So here's a new, up to date (as of this morning) webrev: https://cr.openjdk.java.net/~kbarrett/8139652/open.01/ Tested with mach5 tier1-5 along with change for JDK-8239825 (which hasn't been pushed yet). I forgot to mention previously that I've also done some performance testing, which didn't find anything interesting from this change. Compared each of before/after this change plus each of default -XX:G1ConcRefinementThreads and that option = 0.
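[Illustrative sketch, not taken from the webrev.] The behaviour described in the 8139652 RFR (always enqueue the completed buffer, then conditionally process the oldest one) might look roughly like the following. This is a single-threaded toy under stated assumptions: CompletedBuffers, Buffer, enqueue_and_maybe_take_oldest and pending_cards are hypothetical names, the real DCQS is a concurrent structure, and the "more than threshold" test is an assumption.

```cpp
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical stand-in for a dirty card buffer: just a list of card indices.
using Buffer = std::vector<int>;

// Hypothetical, single-threaded model of a completed-buffer set.
// The real DCQS is concurrent; this only shows the ordering policy.
class CompletedBuffers {
  std::deque<Buffer> _queue;     // front = oldest, back = newest
  std::size_t _num_cards = 0;    // total cards pending in the queue
  std::size_t _threshold;        // assumed mutator refinement threshold

public:
  explicit CompletedBuffers(std::size_t threshold) : _threshold(threshold) {}

  // Always enqueue the buffer the mutator just filled. If the pending card
  // count then exceeds the threshold, hand back the OLDEST buffer for the
  // mutator to process, so that newer buffers get time to age in the queue.
  bool enqueue_and_maybe_take_oldest(Buffer completed, Buffer& oldest_out) {
    _num_cards += completed.size();
    _queue.push_back(std::move(completed));
    if (_num_cards <= _threshold) {
      return false;              // below threshold: mutator does no refinement
    }
    oldest_out = std::move(_queue.front());
    _queue.pop_front();
    _num_cards -= oldest_out.size();
    return true;
  }

  std::size_t pending_cards() const { return _num_cards; }
};
```

With a threshold of 3, enqueuing a two-card buffer returns false; enqueuing a second two-card buffer returns true and hands back the first buffer, leaving two cards pending.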
From david.holmes at oracle.com Tue Mar 10 23:22:41 2020 From: david.holmes at oracle.com (David Holmes) Date: Wed, 11 Mar 2020 09:22:41 +1000 Subject: RFR: JDK-8240224 Allow building hotspot without the serial gc In-Reply-To: References: <663301c4-aa45-4539-c9b7-d6fe68c531de@ihse.net> <484dcf75-4891-2b52-1367-0b86281b7671@oracle.com> Message-ID: <8410db0f-2ce2-5daa-5eb2-24c786df405e@oracle.com> On 11/03/2020 12:51 am, Magnus Ihse Bursie wrote: > On 2020-03-09 10:10, David Holmes wrote: >> Hi Magnus, >> >> On 9/03/2020 6:30 pm, Magnus Ihse Bursie wrote: >>> When reworking the JVM feature handling, I wanted to try to compile >>> Hotspot with various features enabled/disabled. I quickly found out >>> that it's not really possible to build hotspot without the serial gc. >>> While this is not a terribly important use case, I think it's good to >>> be able to select serial freely, just as with the other collectors. >> >> Really not sure this is a worthwhile exercise. > While I agree that it is not very much important per se to build Hotspot > without the Serial GC, I do want to make sure that we uphold the promise > that configure fails early if you try to build with invalid options. > > So it's either not allowing configure to let you to build without the > Serial GC, or it's fixing Hotspot so that it can build without it. My > judgement was that the fixes required to make this work was minimal, > without any impact to scenarios that *do* include Serial GC, and thus it > was "worthwile" to fix this in Hotspot, rather than to make a limitation > in the configure script. I'm more inclined to say that SerialGC is not a VM feature per-se but rather an always present built-in GC. But I'll leave that to the GC folk. >>> With this patch it is possible to build a truly minimal JVM using >>> 'configure --with-jvm-variants=custom --with-jvm-features=g1gc'. 
>>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8240224 >>> WebRev: >>> http://cr.openjdk.java.net/~ihse/JDK-8240224-building-without-serial-gc/webrev.01 >> >> >> make/ModuleTools.gmk >> >> ! TOOL_ADD_PACKAGES_ATTRIBUTE := $(BUILD_JAVA) >> $(JAVA_FLAGS_SMALL_BUILDJDK) \ >> >> that should be BUILDJDK_JAVA_FLAGS_SMALL. > Good catch! I renamed this at the very last moment, but missed this. :-( > >> >> >> make/RunTestsPrebuiltSpec.gmk >> make/autoconf/boot-jdk.m4 >> >> ! BUILDJDK_JAVA_FLAGS_SMALL := -Xms32M -Xmx512M -XX:TieredStopAtLevel=1 >> >> Depending on the default GC those -Xms and -Xmx settings may not be >> valid/possible. > Eh, okaaaay, this is not really something new, we're already setting > this for the buildjdk. The only difference is that we mis-used the > JAVA_FLAGS_SMALL variable, that was technically only valid for the > bootjdk. So we have not seen any issues with this in practice. I'm still > a bit worried though that you say that this might not work. How can the > -Xms/mx values be invalid? Previously these heap sizes were associated with use of SerialGC: ! JAVA_FLAGS_SMALL := -XX:+UseSerialGC -Xms32M -Xmx512M -XX:TieredStopAtLevel=1 ! BUILDJDK_JAVA_FLAGS_SMALL := -Xms32M -Xmx512M -XX:TieredStopAtLevel=1 now you are setting them independent of a particular GC. It may be possible that with some GC's the specified heap size is not sufficient to allow the build task to complete without getting an OutOfMemoryError. As an extreme case consider if you only have EpsilonGC configured. These values would need to be tested with each GC to see if the build tasks can be done with these settings. Also I'm not at all clear what happens if the only GC configured is one of the experimental GCs for which we would normally have to set -XX:+UnlockExperimentalVMOptions ?? Cheers, David ----- >> >> Other changes seem okay but I'll leave it for GC folk to comment on that. > Thanks for the review! 
> > /Magnus >> >> Cheers, >> David >> >> >>> >>> /Magnus > From kim.barrett at oracle.com Tue Mar 10 23:35:58 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 10 Mar 2020 19:35:58 -0400 Subject: RFR: 8239825: G1: Simplify threshold test for mutator refinement In-Reply-To: <15e5d4d9-053f-fcab-9ea1-969604bfdc3f@oracle.com> References: <81A0AF23-EEA2-42A2-8208-AD36B7B336CC@oracle.com> <15e5d4d9-053f-fcab-9ea1-969604bfdc3f@oracle.com> Message-ID: <6E320DC7-54AB-4193-9505-238D0400309B@oracle.com> > On Mar 10, 2020, at 5:17 PM, sangheon.kim at oracle.com wrote: > > Hi Kim, > > On 3/3/20 6:16 PM, Kim Barrett wrote: >> Please review this change to the handling of "padding" for the threshold >> used to decide whether a mutator thread should perform concurrent >> refinement. Rather than doing a slightly tricky (because of potential >> overflow) computation every time a mutator thread completes a buffer, >> instead perform that computation once and record the result for repeated >> use. >> >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8239825 >> >> Webrev: >> https://cr.openjdk.java.net/~kbarrett/8239825/open.00/ > Looks good as is. Thanks, but see below. > src/hotspot/share/gc/g1/g1DirtyCardQueue.hpp > 326 // Artificially increase mutator refinement threshold. > 327 void set_max_cards_padding(size_t padding); > - This method is not used even 8240133 and 8139652. I'm okay leaving as is if you think it may be used in the future. It's called twice by G1ConcurrentRefine::adjust, at g1ConcurrentRefine.cpp:404/406. Those lines didn't need to be changed, because the functional behavior didn't change, just the underlying implementation; see below. > 330 void discard_max_cards_padding(); > - You changed the member name from '_max_cards_padding' to '_padded_max_cards' but it is not reflected on the method name. > Is this intended? I don't need a new webrev for this change if you would like to change it. 
I didn't change the name of a data member, I removed one and added the other; those two members have completely different semantics. The old _max_cards_padding was the current amount of padding. The effective threshold was _max_cards + _max_cards_padding, being careful to deal with overflow. The new _padded_max_cards is the effective threshold, recomputed when either _max_cards or the padding value is updated (being careful to deal with overflow at that update time, rather than every time the threshold is needed). It has no accessor functions; it is a private implementation detail, only used internally, where it is used directly as a data member. set_max_cards_padding(new_padding) still changes the current padding value. With this change that padding value is no longer directly recorded in a data member. Instead the padded threshold is computed and recorded in the new data member (_padded_max_cards). The ability to make these kinds of implementation changes without changing the external API is kind of the point of using a functional interface rather than exposing data members to clients. I think the name "set_max_cards_padding" doesn't (and shouldn't) imply anything about the existence (or not) of a _max_cards_padding member. I also don't think the public function name should be changed to "set_padded_max_cards" to reflect the new member name, whose very existence is an implementation detail. The name could perhaps be changed to update_max_cards_padding, but I don't think that's really an improvement. What do others think. 
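[Illustrative sketch, not the code from the webrev.] The difference between storing the padding and storing the padded threshold could be pictured as below. The names set_max_cards_padding, discard_max_cards_padding, _max_cards and _padded_max_cards come from the mail above; the class name RefinementThreshold, the query mutator_should_refine, and the strict ">" comparison are assumptions made for the example.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical illustration of caching the effective threshold.
// This is not the G1DirtyCardQueueSet source.
class RefinementThreshold {
  std::size_t _max_cards;         // unpadded threshold
  std::size_t _padded_max_cards;  // effective threshold, cached at update time

public:
  explicit RefinementThreshold(std::size_t max_cards)
    : _max_cards(max_cards), _padded_max_cards(max_cards) {}

  // Artificially increase the threshold. The overflow check runs here,
  // once per update, and clamps to the maximum value instead of wrapping.
  void set_max_cards_padding(std::size_t padding) {
    const std::size_t limit = std::numeric_limits<std::size_t>::max();
    _padded_max_cards =
      (padding > limit - _max_cards) ? limit : _max_cards + padding;
  }

  // Drop the padding; the effective threshold is the plain one again.
  void discard_max_cards_padding() { _padded_max_cards = _max_cards; }

  // Hot path (assumed query): one comparison against the cached value,
  // with no addition and therefore no overflow handling.
  bool mutator_should_refine(std::size_t num_cards) const {
    return num_cards > _padded_max_cards;
  }
};
```

Callers still talk in terms of "padding", while the object only remembers the padded result, which is the interface/implementation split argued for above.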
From sangheon.kim at oracle.com Wed Mar 11 00:53:57 2020 From: sangheon.kim at oracle.com (sangheon.kim at oracle.com) Date: Tue, 10 Mar 2020 17:53:57 -0700 Subject: RFR: 8239825: G1: Simplify threshold test for mutator refinement In-Reply-To: <6E320DC7-54AB-4193-9505-238D0400309B@oracle.com> References: <81A0AF23-EEA2-42A2-8208-AD36B7B336CC@oracle.com> <15e5d4d9-053f-fcab-9ea1-969604bfdc3f@oracle.com> <6E320DC7-54AB-4193-9505-238D0400309B@oracle.com> Message-ID: <874f2180-9024-f9dc-748a-9b51a44fd298@oracle.com> On 3/10/20 4:35 PM, Kim Barrett wrote: >> On Mar 10, 2020, at 5:17 PM, sangheon.kim at oracle.com wrote: >> >> Hi Kim, >> >> On 3/3/20 6:16 PM, Kim Barrett wrote: >>> Please review this change to the handling of "padding" for the threshold >>> used to decide whether a mutator thread should perform concurrent >>> refinement. Rather than doing a slightly tricky (because of potential >>> overflow) computation every time a mutator thread completes a buffer, >>> instead perform that computation once and record the result for repeated >>> use. >>> >>> CR: >>> https://bugs.openjdk.java.net/browse/JDK-8239825 >>> >>> Webrev: >>> https://cr.openjdk.java.net/~kbarrett/8239825/open.00/ >> Looks good as is. > Thanks, but see below. > >> src/hotspot/share/gc/g1/g1DirtyCardQueue.hpp >> 326 // Artificially increase mutator refinement threshold. >> 327 void set_max_cards_padding(size_t padding); >> - This method is not used even 8240133 and 8139652. I'm okay leaving as is if you think it may be used in the future. > It's called twice by G1ConcurrentRefine::adjust, at > g1ConcurrentRefine.cpp:404/406. Those lines didn't need to be changed, > because the functional behavior didn't change, just the underlying > implementation; see below. Right, I thought that method is modified but it isn't. > >> 330 void discard_max_cards_padding(); >> - You changed the member name from '_max_cards_padding' to '_padded_max_cards' but it is not reflected on the method name. 
>> Is this intended? I don't need a new webrev for this change if you would like to change it. > I didn't change the name of a data member, I removed one and added the > other; those two members have completely different semantics. I don't want argue with this, because removing/adding vs. changing its name/semantics seems same to me. :) > > The old _max_cards_padding was the current amount of padding. The > effective threshold was _max_cards + _max_cards_padding, being careful > to deal with overflow. > > The new _padded_max_cards is the effective threshold, recomputed when > either _max_cards or the padding value is updated (being careful to > deal with overflow at that update time, rather than every time the > threshold is needed). It has no accessor functions; it is a private > implementation detail, only used internally, where it is used directly > as a data member. > > set_max_cards_padding(new_padding) still changes the current padding > value. With this change that padding value is no longer directly > recorded in a data member. Instead the padded threshold is computed > and recorded in the new data member (_padded_max_cards). The ability > to make these kinds of implementation changes without changing the > external API is kind of the point of using a functional interface > rather than exposing data members to clients. > > I think the name "set_max_cards_padding" doesn't (and shouldn't) imply > anything about the existence (or not) of a _max_cards_padding member. > I also don't think the public function name should be changed to > "set_padded_max_cards" to reflect the new member name, whose very > existence is an implementation detail. > > The name could perhaps be changed to update_max_cards_padding, but I > don't think that's really an improvement. What do others think. It is straightforward to me that the old method was just setter so reflected the member name. 
However if you intended to name/implement it with the concept of a functional interface, that is totally fine with me. Hopefully I'm the only person who is questionable on the method name. :) Thanks, Sangheon > From kim.barrett at oracle.com Wed Mar 11 01:36:16 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 10 Mar 2020 21:36:16 -0400 Subject: RFR: JDK-8240224 Allow building hotspot without the serial gc In-Reply-To: References: <663301c4-aa45-4539-c9b7-d6fe68c531de@ihse.net> <46E744D7-AB29-43BA-808A-2A79248EAAAD@oracle.com> Message-ID: <96531866-D45A-4797-841D-3E6E26F403D5@oracle.com> > On Mar 10, 2020, at 10:53 AM, Magnus Ihse Bursie wrote: > > On 2020-03-09 21:37, Kim Barrett wrote: >>> On Mar 9, 2020, at 4:30 AM, Magnus Ihse Bursie >>> wrote: >>> >>> When reworking the JVM feature handling, I wanted to try to compile Hotspot with various features enabled/disabled. I quickly found out that it's not really possible to build hotspot without the serial gc. While this is not a terribly important use case, I think it's good to be able to select serial freely, just as with the other collectors. >>> >>> With this patch it is possible to build a truly minimal JVM using 'configure --with-jvm-variants=custom --with-jvm-features=g1gc'. >>> >>> Bug: >>> https://bugs.openjdk.java.net/browse/JDK-8240224 >>> >>> WebRev: >>> http://cr.openjdk.java.net/~ihse/JDK-8240224-building-without-serial-gc/webrev.01 >>> >>> >>> /Magnus >>> >> I'm inclined to agree with David and Aleksey that this isn't really a >> worthwhile exercise. Especially not if it involves making some >> otherwise questionable or controversial changes. >> > > As I've said in the previous comments, it's not so much about making Hotspot running without Serial GC as making configure live up to it's promise not to create an un-buildable configuration. The ability to configure which GCs are present was added for several reasons. 
Some packagers don't want to support some of the collectors that are available in the source tree, so want to completely exclude the (to them) unsupported collectors from their builds. Some packagers want to be able to reduce the VM footprint for certain application areas; the "minimal" variant is an example. In preparation for removal of CMS it was useful to first be able to build with it configured out. And CMS could have ended up in the category of collectors that are excluded as unsupported by some packagers. The implementation of this configurability tried to be reasonably complete. Doing so helped shake out problems and show the intent. But I don't know if it was ever demonstrated to work for all possibilities, and even if it did at one time, bit rot is pretty much inevitable since we don't test most of those possibilities. I don't think we should be spending effort on configurations for which there is no evidence anyone actually wants or needs them. But having the mechanism in the build system to try a configuration provides a starting point if someone finds a need for something oddball, even if it doesn't work out of the box. It would be better if broken configurations failed nicely, but even that can't be ensured for long without ongoing testing that I don't think anyone wants to do. > I apologize if my changes are questionable or controversial -- my assessment was on the contrary that they were simple and non-obtrusive, to the point of triviality. Some of the discussion in this thread has been pointing out places where a reviewer thinks that assessment is mistaken. >> src/hotspot/share/gc/shared/gcConfig.cpp >> >> I would instead suggest there should not be a default at all instead >> of adding these cases, and the user must explicitly select the GC to >> be used. Since we're talking about an atypical custom build anyway, >> the user presumably knows what they are doing. And yeah, that makes >> the buildjdk stuff elsewhere in this patch harder. 
>> > > If you build without the Serial GC, it is not even possible to start the JVM without a flag selecting GC. Instead you get a somewhat cryptic (and incorrect) message about missing garbage collectors. Even if the end user would be able to know that you need to pass an additional option just to be able to start java, the build system knows no such thing, so we cannot even finish the build -- as soon as we try to use the newly built JVM (e.g. for running jlink), we will crash and burn. Right, because the build system isn't dealing with the need to explicitly specify the GC to use in such a configuration. That's what I meant about making the build stuff harder. The build system would need to look at the configuration to decide how to accomplish the build. >> src/hotspot/share/gc/shared/genCollectedHeap.cpp >> 197 #if INCLUDE_SERIALGC >> 198 MarkSweep::initialize(); >> 199 #endif >> >> This whole file, and several associated files, are *only* used by >> SerialGC now that CMS has been removed: JDK-8234502. >> > > Then maybe they should be excluded when serial is not included? That would be part of the work involved in resolving JDK-8234502. > Or, if it is determined that Serial GC is essential to hotspot, we should remove the INCLUDE_SERIALGC define and associated framework, since it's just a fake abstraction if it is not actually possible to build without serial GC. I don't think there is any belief that SerialGC must always be included. That it can't currently be excluded is an artifact of nobody having the need and the resources to make that possible. >> make/hotspot/lib/JvmFeatures.gmk >> 58 ifeq ($(JVM_VARIANT), custom) >> 59 JVM_CFLAGS_FEATURES += -DVMTYPE=\"Custom\" >> 60 endif >> >> This change looks unrelated to whether serialgc is present or absent. >> If so, it doesn't belong in this changeset at all. >> > > You are correct that this is not strictly about serialgc.
When I tested my custom build with only epsilongc, I discovered that jtreg barfed on the version string produced by the custom JVM build. This is a fix that makes sure the VMTYPE always has a value. If you object to me pushing it as part of this fix, I can remove it from here and submit it as a separate issue. (I just didn't think it was worth the hassle.) I understand there is overhead to breaking things into multiple changes, but combining unrelated changes can make archeology and problem or rationale attribution much harder. I looked at this and had no idea what it was for, and it wasn't called out in the RFR or anywhere else. >> make/hotspot/lib/JvmFeatures.gmk >> [removed] >> 154 # If serial is disabled, we cannot use serial as OldGC in parallel >> 155 JVM_EXCLUDE_FILES += psMarkSweep.cpp psMarkSweepDecorator.cpp >> >> This was missed by JDK-8235860, which removed those files. Good find. >> > ... but according to your comment above, that fix also missed to add a bunch of other files that should be excluded..? (If we should keep the ability to disable serial gc, that is...) The comment above was about a different change, the removal of CMS, which is known to be incomplete and have a number of further cleanups and refactorings to do before all vestiges have been removed. This one is about the removal of the Serial-Old variant of ParallelGC, which was thought to be complete, but missed this little snippet. >> test/hotspot/gtest/gc/shared/test_collectorPolicy.cpp >> >> As originally written, this test was *only* testing SerialGC. It's not >> obvious that it is actually GC-agnostic and can use the default GC if >> that isn't SerialGC. Certainly some of the naming suggests otherwise. >> Was this tested with all the other configurations? >> > > No, I have not tested all other configurations. I verified that I could build with only g1, only zgc and only epsilongc. I also tested to run tier1 testing, and it "mostly" succeeded, but it still failed on several tests.
My quick eyeballing of the situation indicated that the absolute majority (and perhaps all) these failures were related to jtreg tests not properly declaring their dependencies on compiler1 or compiler2. (Remember, on this bare-bones JVM I only had the interpreter, and neither c1 nor c2). > > I *could* of course run a suitable set of testing with say c1 and c2 enabled, and just a single gc enabled, for the set of all gcs != serial gc, but then we're *really* getting into the "not worth it" land. > > It is not clear to me that the test is only run with Serial GC. As far as I can interpret the test framework, this is run with the default collector, which typically is *not* serialgc on our testing framework. If this is only valid for Serial GC, perhaps the test needs to be amended? Looking at this some more, I don't know what this test thinks it's doing, but I suspect it's confused. It's using TEST_VM and TEST_OTHER_VM, both of which create the VM before running the test body. The kinds of things it's doing in that context seem pretty questionable. From kim.barrett at oracle.com Wed Mar 11 01:39:56 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 10 Mar 2020 18:39:56 -0700 (PDT) Subject: RFR: JDK-8240224 Allow building hotspot without the serial gc In-Reply-To: <8410db0f-2ce2-5daa-5eb2-24c786df405e@oracle.com> References: <663301c4-aa45-4539-c9b7-d6fe68c531de@ihse.net> <484dcf75-4891-2b52-1367-0b86281b7671@oracle.com> <8410db0f-2ce2-5daa-5eb2-24c786df405e@oracle.com> Message-ID: > On Mar 10, 2020, at 7:22 PM, David Holmes wrote: > Also I'm not at all clear what happens if the only GC configured is one of the experimental GCs for which we would normally have to set -XX:+UnlockExperimentalVMOptions ?? Yes, it seems wrong to ever select an experimental GC by default. 
From kim.barrett at oracle.com Wed Mar 11 01:57:15 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 10 Mar 2020 21:57:15 -0400 Subject: RFR: JDK-8240224 Allow building hotspot without the serial gc In-Reply-To: <4d83a54b-1960-2459-6c55-03e261769a09@oracle.com> References: <663301c4-aa45-4539-c9b7-d6fe68c531de@ihse.net> <484dcf75-4891-2b52-1367-0b86281b7671@oracle.com> <8794fd61-c5b9-b7d2-ae49-88c27979abb6@oracle.com> <4d83a54b-1960-2459-6c55-03e261769a09@oracle.com> Message-ID: > On Mar 10, 2020, at 10:51 AM, Magnus Ihse Bursie wrote: >>> Nits: >>> >>> *) src/hotspot/share/gc/shared/gcConfig.cpp changes are a bit strange: >>> - Epsilon should not ever be selected by ergonomics >>> - Why ZGC is selected before Shenandoah? [Oh, what a can of worms that one is ;)] >> >> This fallback list is clearly just meant to allow for any combination of GCs being compiled into the JVM. If the only one you picked was epsilon, then what other default would you expect? It's last in the list so any other GC will still be prioritized before it if present. For the same reason, the order of ZGC and Shenandoah is irrelevant and could just as well be the other way. It will never have any real consequence. This code is only there to keep things from falling apart when a non standard combination of jvm features is picked. > Exactly. For good measure, I can surely put Shenandoah before ZGC. :) Whichever one is placed first will likely annoy the folks behind the competing second. There's no way to win this one. > The idea behind the added defaults as fallback is just to allow the JVM to even start if Serial GC is not present. If you configure with SerialGC (which, as you note, is the typical case), this change will not affect anything. Without this, it is not even possible to complete the build without SerialGC, since jlink cannot run. The is_server_class_machine() test is intended to filter out collectors that (probably) don't make sense to run on "small" machines.
(Admittedly, it's not so easy to buy a computer that doesn't qualify for is_server_class_machine() anymore, outside of the embedded space, and even there...) But we let one insist by allowing the default to be overridden by an explicit selection. From shade at redhat.com Wed Mar 11 12:23:50 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 11 Mar 2020 13:23:50 +0100 Subject: RFR (S) 8240868: Shenandoah: remove CM-with-UR piggybacking cycles Message-ID: <6c036105-0b9c-e5b8-5990-6c45ba971894@redhat.com> RFR: https://bugs.openjdk.java.net/browse/JDK-8240868 See the rationale in description. Webrev: https://cr.openjdk.java.net/~shade/8240868/webrev.01/ Testing: hotspot_gc_shenandoah {fastdebug,release} -- Thanks, -Aleksey From rkennke at redhat.com Wed Mar 11 12:37:12 2020 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 11 Mar 2020 13:37:12 +0100 Subject: RFR (S) 8240868: Shenandoah: remove CM-with-UR piggybacking cycles In-Reply-To: <6c036105-0b9c-e5b8-5990-6c45ba971894@redhat.com> References: <6c036105-0b9c-e5b8-5990-6c45ba971894@redhat.com> Message-ID: <178a2fec-eedc-3601-b4bd-d4e80e9c6dfb@redhat.com> Hi Aleksey, Very nice! I see you haven't touched conc-mark. While updating-on-mark is still used by full-GC, there should be a couple of paths that are unused now (e.g. in the init-mark parts), do you intend to (carefully) remove them in a follow-up? Also, the block here: http://hg.openjdk.java.net/jdk/jdk/file/e50512f91026/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp#l1467 is not needed anymore, either. It's only there to conclude the GC cycle, in the case where the cycle officially (and awkwardly) ends at final-mark. (We'll probably find more orphaned little blocks related to this in the future.) Other than that, it's good. Roman > RFR: > https://bugs.openjdk.java.net/browse/JDK-8240868 > > See the rationale in description.
> > Webrev: > https://cr.openjdk.java.net/~shade/8240868/webrev.01/ > > Testing: hotspot_gc_shenandoah {fastdebug,release} > From shade at redhat.com Wed Mar 11 12:37:44 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 11 Mar 2020 13:37:44 +0100 Subject: RFR (S) 8240868: Shenandoah: remove CM-with-UR piggybacking cycles In-Reply-To: <178a2fec-eedc-3601-b4bd-d4e80e9c6dfb@redhat.com> References: <6c036105-0b9c-e5b8-5990-6c45ba971894@redhat.com> <178a2fec-eedc-3601-b4bd-d4e80e9c6dfb@redhat.com> Message-ID: <1ddf377b-8101-af3e-b5c2-7f199258fe87@redhat.com> On 3/11/20 1:37 PM, Roman Kennke wrote: > I see you haven't touched conc-mark. While updating-on-mark is still > used by full-GC, there should be a couple of paths that are unused now > (e.g. in the init-mark parts), do you intend to (carefully) remove them > in a follow-up? Yes, that is the plan: go through all uses of has_forwarded_objects in marking code and see if is used somewhere else. If not, those should be removed. > Also, the block here: > > http://hg.openjdk.java.net/jdk/jdk/file/e50512f91026/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp#l1467 > > is not needed anymore, either. It's only there to conclude the GC cycle, > in the case where the cycle officially (and awkwardly) ends at final-mark. Yes, that is one of the follow-ups. -- Thanks, -Aleksey From zgu at redhat.com Wed Mar 11 12:45:13 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 11 Mar 2020 08:45:13 -0400 Subject: RFR (S) 8240868: Shenandoah: remove CM-with-UR piggybacking cycles In-Reply-To: <6c036105-0b9c-e5b8-5990-6c45ba971894@redhat.com> References: <6c036105-0b9c-e5b8-5990-6c45ba971894@redhat.com> Message-ID: <3dbfc339-47e3-d39b-55c5-bbd2073662b2@redhat.com> Yes, I like it. Looks good to me. Thanks, -Zhengyu On 3/11/20 8:23 AM, Aleksey Shipilev wrote: > RFR: > https://bugs.openjdk.java.net/browse/JDK-8240868 > > See the rationale in description. 
> > Webrev: > https://cr.openjdk.java.net/~shade/8240868/webrev.01/ > > Testing: hotspot_gc_shenandoah {fastdebug,release} > From rkennke at redhat.com Wed Mar 11 12:53:19 2020 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 11 Mar 2020 13:53:19 +0100 Subject: RFR (S) 8240868: Shenandoah: remove CM-with-UR piggybacking cycles In-Reply-To: <1ddf377b-8101-af3e-b5c2-7f199258fe87@redhat.com> References: <6c036105-0b9c-e5b8-5990-6c45ba971894@redhat.com> <178a2fec-eedc-3601-b4bd-d4e80e9c6dfb@redhat.com> <1ddf377b-8101-af3e-b5c2-7f199258fe87@redhat.com> Message-ID: On 3/11/20 1:37 PM, Aleksey Shipilev wrote: > On 3/11/20 1:37 PM, Roman Kennke wrote: >> I see you haven't touched conc-mark. While updating-on-mark is still >> used by full-GC, there should be a couple of paths that are unused now >> (e.g. in the init-mark parts), do you intend to (carefully) remove them >> in a follow-up? > > Yes, that is the plan: go through all uses of has_forwarded_objects in marking code and see if is > used somewhere else. If not, those should be removed. > >> Also, the block here: >> >> http://hg.openjdk.java.net/jdk/jdk/file/e50512f91026/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp#l1467 >> >> is not needed anymore, either. It's only there to conclude the GC cycle, >> in the case where the cycle officially (and awkwardly) ends at final-mark. > > Yes, that is one of the follow-ups. Ok, good then. Thanks, Roman From rkennke at redhat.com Wed Mar 11 19:54:17 2020 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 11 Mar 2020 20:54:17 +0100 Subject: RFR: 8240873: Shenandoah: Short-cut arraycopy barriers Message-ID: The strong invariant gives us an opportunity to short-cut arraycopy-barriers: - if the src object is beyond the safe-iteration limit, e.g. has been allocated since evac-start, then it can not have any from-space references and thus does not require updating. - likewise, if the dst object is beyond TAMS, e.g. 
has been allocated since mark-start, then it can only have references that must have been reachable otherwise and thus don't require enqueueing in SATB. Short-cutting on those conditions cuts out 80-90% of arraycopy slowpaths. It also brings in the closing of update-watermark after updating one region is finished, originally proposed in "8240872: Shenandoah: Avoid updating new regions from start of evacuation", but now with a fence to ensure that preceding updates of regions are indeed visible to threads before they see the watermark going down. Bug: https://bugs.openjdk.java.net/browse/JDK-8240873 Webrev: http://cr.openjdk.java.net/~rkennke/JDK-8240873/webrev.00/ Testing: hotspot_gc_shenandoah, specjbb2015, some specjvm workloads Can I please get a review? Thanks, Roman From zgu at redhat.com Wed Mar 11 21:43:16 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 11 Mar 2020 17:43:16 -0400 Subject: [15] RFR 8239926: Shenandoah: Shenandoah needs to mark nmethod's metadata In-Reply-To: References: <75c20855-5234-ba00-f07b-f9da0f7b8047@redhat.com> Message-ID: <612617a7-f08a-2b69-908e-26e146fa0a8f@redhat.com> Revised based on offline discussions. Piggyback on stack code root rescanning to SATB draining task. Webrev: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.02/ Reran tests: hotspot_gc_shenandoah tools/javac Thanks, -Zhengyu On 3/4/20 6:06 PM, Zhengyu Gu wrote: > Traversal GC has the same issue, also need to remark on stack code roots > in final traversal. > > @@ -263,11 +263,12 @@ >      if (!_heap->is_degenerated_gc_in_progress()) { >        ShenandoahTraversalRootsClosure roots_cl(q, rp); >        ShenandoahTraversalSATBThreadsClosure tc(&satb_cl); >        if (unload_classes) { >          ShenandoahRemarkCLDClosure remark_cld_cl(&roots_cl); > -        _rp->strong_roots_do(worker_id, &roots_cl, &remark_cld_cl, > NULL, &tc); > +        MarkingCodeBlobClosure code_cl(&roots_cl, > CodeBlobToOopClosure::FixRelocations); > +        
_rp->strong_roots_do(worker_id, &roots_cl, &remark_cld_cl, > &code_cl, &tc); >        } else { >          CLDToOopClosure cld_cl(&roots_cl, > ClassLoaderData::_claim_strong); >          _rp->roots_do(worker_id, &roots_cl, &cld_cl, NULL, &tc); >        } >      } else { > > > Updated webrev: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.01/ > > Thanks, > > -Zhengyu > > On 2/25/20 12:13 PM, Zhengyu Gu wrote: >> Shenandoah encounters a few test failures with tools/javac. Verifier >> catches unmarked oops in nmethod's metadata during root evacuation in >> final mark phase. >> >> The problem is that, Shenandoah marks on stack nmethods in init mark >> pause, but it does not mark nmethod's metadata during concurrent mark >> phase, when new nmethod is about to be executed. >> >> The solution: >> 1) Use nmethod_entry_barrier to keep nmethod's metadata alive when the >> nmethod is about to be executed, when nmethod entry barrier is supported. >> >> 2) Remark on stack nmethod's metadata at final mark pause. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8239926 >> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.00/ >> >> Test: >>    hotspot_gc_shenandoah (fastdebug and release) >>    tools/javac with ShenandoahCodeRootsStyle = 1 and 2 (fastdebug and >> release) >> >> Thanks, >> >> -Zhengyu From adityam at microsoft.com Thu Mar 12 01:40:32 2020 From: adityam at microsoft.com (Aditya Mandaleeka) Date: Thu, 12 Mar 2020 01:40:32 +0000 Subject: RFR: 8231668: Remove ForceDynamicNumberOfGCThreads Message-ID: RFE: https://bugs.openjdk.java.net/browse/JDK-8231668 Webrev: https://cr.openjdk.java.net/~adityam/8231668/ This removes all the ForceDynamicNumberOfGCThreads-related code and the test cases using it. Note: this is my first patch since getting Author status, so please feel free to let me know if there's anything wrong with how I created the webrev. Other friendly folks have been doing that for me until now :).
Thanks, Aditya From zgu at redhat.com Thu Mar 12 01:44:08 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 11 Mar 2020 21:44:08 -0400 Subject: [15] RFR (T) 8240915: Shenandoah: Remove unused fields in init mark tasks Message-ID: <30a21e14-8807-0708-6442-6776d1ec1a55@redhat.com> Please review this trivial change that removes unused fields in ShenandoahInitTraversalCollectionTask and ShenandoahInitMarkRootsTask. Bug: https://bugs.openjdk.java.net/browse/JDK-8240915 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8240915/webrev.00/ Test: hotspot_gc_shenandoah (fastdebug and release) Thanks, -Zhengyu From shade at redhat.com Thu Mar 12 05:45:31 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 12 Mar 2020 06:45:31 +0100 Subject: [15] RFR (T) 8240915: Shenandoah: Remove unused fields in init mark tasks In-Reply-To: <30a21e14-8807-0708-6442-6776d1ec1a55@redhat.com> References: <30a21e14-8807-0708-6442-6776d1ec1a55@redhat.com> Message-ID: On 3/12/20 2:44 AM, Zhengyu Gu wrote: > Please review this trivial change that removes unused fields in > ShenandoahInitTraversalCollectionTask and ShenandoahInitMarkRootsTask. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8240915 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8240915/webrev.00/ Looks good to me. I wonder (idly) when did they stopped being used. -- Thanks, -Aleksey From shade at redhat.com Thu Mar 12 06:09:28 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 12 Mar 2020 07:09:28 +0100 Subject: RFR: 8231668: Remove ForceDynamicNumberOfGCThreads In-Reply-To: References: Message-ID: <75346f8c-b492-76ea-d32a-0785f57c052a@redhat.com> On 3/12/20 2:40 AM, Aditya Mandaleeka wrote: > RFE: > https://bugs.openjdk.java.net/browse/JDK-8231668 > > Webrev: > https://cr.openjdk.java.net/~adityam/8231668/ This looks good to me. 
Stylistic nits: *) Conditions like this: 865 if (!UseDynamicNumberOfGCThreads || 866 !FLAG_IS_DEFAULT(ConcGCThreads)) { ...can now be written like: if (!UseDynamicNumberOfGCThreads || !FLAG_IS_DEFAULT(ConcGCThreads)) { *) Please update copyrights to mention 2020. For example, in workerPolicy.cpp: Copyright (c) 2018, 2020, Oracle and/or its affiliates. All rights reserved. > Note: this is my first patch since getting Author status, so please feel free to let me know if there's > anything wrong with how I created the webrev. Other friendly folks have been doing that for me > until now :). Many of us use "mq" extension to stash the patches. The upside with webrevs would be that you can add the changeset description (hg qrefresh -e) right to the patch, and then webrev would pick it up. It would also generate the changeset itself, so sponsors would just download it and push on your behalf. Metadata for this change is something like: 8231668: Remove ForceDynamicNumberOfGCThreads Reviewed-by: XXX (list of reviewers from census) Contributed-by: Aditya Mandaleeka -- Thanks, -Aleksey From shade at redhat.com Thu Mar 12 08:31:30 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 12 Mar 2020 09:31:30 +0100 Subject: RFR: 8240873: Shenandoah: Short-cut arraycopy barriers In-Reply-To: References: Message-ID: On 3/11/20 8:54 PM, Roman Kennke wrote: > Bug: > https://bugs.openjdk.java.net/browse/JDK-8240873 > Webrev: > http://cr.openjdk.java.net/~rkennke/JDK-8240873/webrev.00/ *) Wouldn't it make more sense to do acquire/release in (get|set)_update_watermark? 
*) This bit is incorrect, should be set_update_watermark: 2426 r->set_concurrent_iteration_safe_limit(r->bottom()); -- Thanks, -Aleksey From sgehwolf at redhat.com Thu Mar 12 09:30:56 2020 From: sgehwolf at redhat.com (Severin Gehwolf) Date: Thu, 12 Mar 2020 10:30:56 +0100 Subject: RFR: 8231668: Remove ForceDynamicNumberOfGCThreads In-Reply-To: <75346f8c-b492-76ea-d32a-0785f57c052a@redhat.com> References: <75346f8c-b492-76ea-d32a-0785f57c052a@redhat.com> Message-ID: On Thu, 2020-03-12 at 07:09 +0100, Aleksey Shipilev wrote: > Metadata for this change is something like: > > 8231668: Remove ForceDynamicNumberOfGCThreads > Reviewed-by: XXX (list of reviewers from census) > Contributed-by: Aditya Mandaleeka For authors, 'Contributed-by:' line would not be necessary, no? They could just use "hg commit -u ". That's my understanding anyhow. Thanks, Severin From shade at redhat.com Thu Mar 12 09:43:12 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 12 Mar 2020 10:43:12 +0100 Subject: RFR (S) 8240948: Shenandoah: cleanup not-forwarded-objects paths after JDK-8240868 Message-ID: RFE: https://bugs.openjdk.java.net/browse/JDK-8240948 Unfortunately, not much code can be eliminated from conc-mark, because Full GC (and Traversal?) share some of that code. Webrev: https://cr.openjdk.java.net/~shade/8240948/webrev.01/ Testing: hotspot_gc_shenandoah -- Thanks, -Aleksey From rkennke at redhat.com Thu Mar 12 11:04:31 2020 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 12 Mar 2020 12:04:31 +0100 Subject: RFR: 8240873: Shenandoah: Short-cut arraycopy barriers In-Reply-To: References: Message-ID: <58ecb1aa-ff0b-d833-83e2-cbeeeb7582bb@redhat.com> On 3/12/20 9:31 AM, Aleksey Shipilev wrote: > On 3/11/20 8:54 PM, Roman Kennke wrote: >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8240873 >> Webrev: >> http://cr.openjdk.java.net/~rkennke/JDK-8240873/webrev.00/ > > *) Wouldn't it make more sense to do acquire/release in (get|set)_update_watermark? > Hmm ok. 
This requires making the field. I am not sure if the cast in get_update_watermark() is ok? > *) This bit is incorrect, should be set_update_watermark: > > 2426 r->set_concurrent_iteration_safe_limit(r->bottom()); Hoops. Corrected. http://cr.openjdk.java.net/~rkennke/JDK-8240873/webrev.02/ WDYT? From shade at redhat.com Thu Mar 12 11:04:42 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 12 Mar 2020 12:04:42 +0100 Subject: RFR: 8240873: Shenandoah: Short-cut arraycopy barriers In-Reply-To: <58ecb1aa-ff0b-d833-83e2-cbeeeb7582bb@redhat.com> References: <58ecb1aa-ff0b-d833-83e2-cbeeeb7582bb@redhat.com> Message-ID: On 3/12/20 12:04 PM, Roman Kennke wrote: >> *) Wouldn't it make more sense to do acquire/release in (get|set)_update_watermark? > > Hmm ok. This requires making the field. I am not sure if the cast in > get_update_watermark() is ok? I don't quite understand why cast is needed. There are already _critical_pins and _live_data fields that are atomic, why can't we do the same? > http://cr.openjdk.java.net/~rkennke/JDK-8240873/webrev.02/ Looks fine, modulo nit above. -- Thanks, -Aleksey From thomas.schatzl at oracle.com Thu Mar 12 13:00:51 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 12 Mar 2020 06:00:51 -0700 (PDT) Subject: RFR: 8231668: Remove ForceDynamicNumberOfGCThreads In-Reply-To: References: Message-ID: <7d294441-3069-55e7-aa11-b6b20699a24a@oracle.com> Hi, On 12.03.20 02:40, Aditya Mandaleeka wrote: > RFE: > https://bugs.openjdk.java.net/browse/JDK-8231668 > > Webrev: > https://cr.openjdk.java.net/~adityam/8231668/ > > This removes all the ForceDynamicNumberOfGCThreads-related code and the test cases using it. > > Note: this is my first patch since getting Author status, so please feel free to let me know if there's > anything wrong with how I created the webrev. Other friendly folks have been doing that for me > until now :). 
in addition to what Aleksey said: - a comment in TestDynamicNumberOfGCThreads refers to a non-existent option "TraceDynamicGCThreads" - I think that that entire test does not test a lot as it only checks whether that log message is printed, but it does not check whether there is actually a dynamic change in number of gc threads over time. I think it can be removed. Feel free to file a separate CR for improving it or creating a new one, depending on whether you remove it or not. Thanks, Thomas From ivan.walulya at oracle.com Thu Mar 12 13:12:28 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Thu, 12 Mar 2020 14:12:28 +0100 Subject: RFR(XS): 8240591: G1HeapSizingPolicy attempts to compute expansion_amount even when at full capacity In-Reply-To: <2c22164c-b459-af44-c7c1-15f1eacb4d52@oracle.com> References: <197FCCDD-2E5D-4A63-B796-28908B18DB0E@oracle.com> <2c22164c-b459-af44-c7c1-15f1eacb4d52@oracle.com> Message-ID: <9F4A4087-B013-4AAE-96E8-C15C46B25AD7@oracle.com> Please find updated webrev: http://cr.openjdk.java.net/~iwalulya/8240591/01/ > On 10 Mar 2020, at 10:16, Thomas Schatzl wrote: > > Hi, > > On 05.03.20 11:33, Ivan Walulya wrote: >> Hi all, >> Please review a small modification for G1HeapSizingPolicy to return without computing expansion_amount when heap is already at full capacity. >> JBS: https://bugs.openjdk.java.net/browse/JDK-8240591 >> Webrev: http://cr.openjdk.java.net/~iwalulya/8240591/00/ >> //Ivan > > some minor (imo) comments to start a discussion: > > - I (weakly) suggest to remove the unnecessary assert(GCTimeRatio > 0) or put it at the beginning of the method. The initialization already guarantees that it is > 0. Probably best to move to the very top of the method. > > - I think I understand why the code clears the ratio check data, presumably to avoid the "windup" of the ratio check data being stuck into the maximum, and taking some time to wind down. 
I think this is a good idea, but I would prefer to just unconditionally clear the data - it does not seem time consuming, and makes the code a bit smaller. > > - undecided on the log message: while it is informative, now you get a log message even if nothing changed. Maybe make it trace level? Others might chime in here with their opinions. Also given that often people set -Xms==-Xmx, this one seems to be a bit chatty at this level. I can see the point of it though. > > So overall, I am good with the change but asking for opinions :) > > Thanks, > Thomas From zgu at redhat.com Thu Mar 12 13:23:22 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 12 Mar 2020 09:23:22 -0400 Subject: [15] RFR 8240917: Shenandoah: Avoid scanning thread code roots twice in all root scanner Message-ID: Please review this small enhancement, that avoids scanning thread's code roots if we scan all code blobs anyway. Bug: https://bugs.openjdk.java.net/browse/JDK-8240917 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8240917/webrev.00/ Test: hotspot_gc_shenandoah (fastdebug and release) Thanks, -Zhengyu From rkennke at redhat.com Thu Mar 12 14:12:01 2020 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 12 Mar 2020 15:12:01 +0100 Subject: RFR: 8240873: Shenandoah: Short-cut arraycopy barriers In-Reply-To: References: <58ecb1aa-ff0b-d833-83e2-cbeeeb7582bb@redhat.com> Message-ID: <54c9b492-a1b9-4c07-7b1f-a432e989c9e0@redhat.com> On 3/12/20 12:04 PM, Aleksey Shipilev wrote: > On 3/12/20 12:04 PM, Roman Kennke wrote: >>> *) Wouldn't it make more sense to do acquire/release in (get|set)_update_watermark? >> >> Hmm ok. This requires making the field. I am not sure if the cast in >> get_update_watermark() is ok? > > I don't quite understand why cast is needed. There are already _critical_pins and _live_data fields > that are atomic, why can't we do the same? 
> Turns out that: volatile HeapWord* _update_watermark; is not the same as: HeapWord* volatile _update_watermark; The former means 'a pointer to a volatile HeapWord', the latter 'a volatile pointer to a HeapWord'. We need the latter. Testing showed an occasional failure caused by piggy-backing updaterefs on marking: it would skip updating when taking the marking-shortcut. While it is not really relevant anymore, I changed the conditions a bit to not blindly return in the arraycopy-pre-barrier, but do the check for updating independently. Overall a good example why it was a good move to get rid of the piggy-backing. It causes more maintenance for no real benefit. http://cr.openjdk.java.net/~rkennke/JDK-8240873/webrev.03/ Passes all tests in hotspot_gc_shenandoah Good now? Roman From rkennke at redhat.com Thu Mar 12 14:14:02 2020 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 12 Mar 2020 15:14:02 +0100 Subject: [15] RFR 8240917: Shenandoah: Avoid scanning thread code roots twice in all root scanner In-Reply-To: References: Message-ID: <1be5f430-d7ae-0615-744a-60e8b1fea05e@redhat.com> It's doing the same in both branches, or what am I missing? Roman On 3/12/20 2:23 PM, Zhengyu Gu wrote: > Please review this small enhancement, that avoids scanning thread's code > roots if we scan all code blobs anyway. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8240917 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8240917/webrev.00/ > > > Test: > ? 
hotspot_gc_shenandoah (fastdebug and release) > > Thanks, > > -Zhengyu > From rkennke at redhat.com Thu Mar 12 14:20:56 2020 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 12 Mar 2020 15:20:56 +0100 Subject: [15] RFR 8239926: Shenandoah: Shenandoah needs to mark nmethod's metadata In-Reply-To: <612617a7-f08a-2b69-908e-26e146fa0a8f@redhat.com> References: <75c20855-5234-ba00-f07b-f9da0f7b8047@redhat.com> <612617a7-f08a-2b69-908e-26e146fa0a8f@redhat.com> Message-ID: Hi Zhengyu, in src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp: + } else if (ShenandoahConcurrentRoots::can_do_concurrent_class_unloading()) { + // Disarm nmethods that armed for concurrent mark. + // On normal code path (non-empty Cset), it depends on update_roots() to + // disarm nmethods in degenerated GC. + ShenandoahCodeRoots::disarm_nmethods(); beware that the update_roots() is only called at the end of update_refs phase. The same call at end of marking is orphaned since removal of piggy-backed marking. Otherwise looks good. Thanks, Roman On 3/11/20 10:43 PM, Zhengyu Gu wrote: > Revised based on offline discussions. > > Piggyback on stack code root rescanning to SATB draining task. > > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.02/ > > Reran tests: > ? hotspot_gc_shenandoah > ? tools/javac > > Thanks, > > -Zhengyu > > On 3/4/20 6:06 PM, Zhengyu Gu wrote: >> Traversal GC has the same issue, also need to remark on stack code >> roots in final traversal. >> >> @@ -263,11 +263,12 @@ >> ????? if (!_heap->is_degenerated_gc_in_progress()) { >> ??????? ShenandoahTraversalRootsClosure roots_cl(q, rp); >> ??????? ShenandoahTraversalSATBThreadsClosure tc(&satb_cl); >> ??????? if (unload_classes) { >> ????????? ShenandoahRemarkCLDClosure remark_cld_cl(&roots_cl); >> -??????? _rp->strong_roots_do(worker_id, &roots_cl, &remark_cld_cl, >> NULL, &tc); >> +??????? MarkingCodeBlobClosure code_cl(&roots_cl, >> CodeBlobToOopClosure::FixRelocations); >> +??????? 
_rp->strong_roots_do(worker_id, &roots_cl, &remark_cld_cl, >> &code_cl, &tc); >> ??????? } else { >> ????????? CLDToOopClosure cld_cl(&roots_cl, >> ClassLoaderData::_claim_strong); >> ????????? _rp->roots_do(worker_id, &roots_cl, &cld_cl, NULL, &tc); >> ??????? } >> ????? } else { >> >> >> Updated webrev: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.01/ >> >> Thank, >> >> -Zhengyu >> >> On 2/25/20 12:13 PM, Zhengyu Gu wrote: >>> Shenandoah encounters a few test failures with tools/javac. Verifier >>> catches unmarked oops in nmethod's metadata during root evacuation in >>> final mark phase. >>> >>> The problem is that, Shenandoah marks on stack nmethods in init mark >>> pause, but it does not mark nmethod's metadata during concurrent mark >>> phase, when new nmethod is about to be executed. >>> >>> The solution: >>> 1) Use nmethod_entry_barrier to keep nmethod's metadata alive when >>> the nmethod is about to be executed, when nmethod entry barrier is >>> supported. >>> >>> 2) Remark on stack nmethod's metadata at final mark pause. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8239926 >>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.00/ >>> >>> Test: >>> ?? hotspot_gc_shenandoah (fastdebug and release) >>> ?? tools/javac with ShenandoahCodeRootsStyle = 1 and 2 (fastdebug and >>> release) >>> >>> Thanks, >>> >>> -Zhengyu > From rkennke at redhat.com Thu Mar 12 14:24:46 2020 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 12 Mar 2020 15:24:46 +0100 Subject: RFR (S) 8240948: Shenandoah: cleanup not-forwarded-objects paths after JDK-8240868 In-Reply-To: References: Message-ID: <6f5c904e-06c5-8b87-857a-db6d8fd4e3c0@redhat.com> Looks good. Let's do the rest carefully. Full-GC requires updating refs by traversal because we may not be able to parse the heap sequentially. Traversal should be mostly-ok because it has its own set of closures to deal with updating refs. 
Thanks, Roman On 3/12/20 10:43 AM, Aleksey Shipilev wrote: > RFE: > https://bugs.openjdk.java.net/browse/JDK-8240948 > > Unfortunately, not much code can be eliminated from conc-mark, because Full GC (and Traversal?) > share some of that code. > > Webrev: > https://cr.openjdk.java.net/~shade/8240948/webrev.01/ > > Testing: hotspot_gc_shenandoah > From zgu at redhat.com Thu Mar 12 15:01:17 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 12 Mar 2020 11:01:17 -0400 Subject: [15] RFR 8239926: Shenandoah: Shenandoah needs to mark nmethod's metadata In-Reply-To: References: <75c20855-5234-ba00-f07b-f9da0f7b8047@redhat.com> <612617a7-f08a-2b69-908e-26e146fa0a8f@redhat.com> Message-ID: <20afe843-e4bc-96ee-0d54-0c40435183a8@redhat.com> Hi Roman, On 3/12/20 10:20 AM, Roman Kennke wrote: > Hi Zhengyu, > > in src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp: > + } else if > (ShenandoahConcurrentRoots::can_do_concurrent_class_unloading()) { > + // Disarm nmethods that armed for concurrent mark. > + // On normal code path (non-empty Cset), it depends on > update_roots() to > + // disarm nmethods in degenerated GC. > + ShenandoahCodeRoots::disarm_nmethods(); > > beware that the update_roots() is only called at the end of update_refs > phase. The same call at end of marking is orphaned since removal of > piggy-backed marking. I think it is fine, successful degenerated GC cycle should always execute update_refs, no? Thanks, -Zhengyu > > Otherwise looks good. > > Thanks, > Roman > > > On 3/11/20 10:43 PM, Zhengyu Gu wrote: >> Revised based on offline discussions. >> >> Piggyback on stack code root rescanning to SATB draining task. >> >> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.02/ >> >> Reran tests: >> ? hotspot_gc_shenandoah >> ? tools/javac >> >> Thanks, >> >> -Zhengyu >> >> On 3/4/20 6:06 PM, Zhengyu Gu wrote: >>> Traversal GC has the same issue, also need to remark on stack code >>> roots in final traversal. >>> >>> @@ -263,11 +263,12 @@ >>> ????? 
if (!_heap->is_degenerated_gc_in_progress()) { >>> ??????? ShenandoahTraversalRootsClosure roots_cl(q, rp); >>> ??????? ShenandoahTraversalSATBThreadsClosure tc(&satb_cl); >>> ??????? if (unload_classes) { >>> ????????? ShenandoahRemarkCLDClosure remark_cld_cl(&roots_cl); >>> -??????? _rp->strong_roots_do(worker_id, &roots_cl, &remark_cld_cl, >>> NULL, &tc); >>> +??????? MarkingCodeBlobClosure code_cl(&roots_cl, >>> CodeBlobToOopClosure::FixRelocations); >>> +??????? _rp->strong_roots_do(worker_id, &roots_cl, &remark_cld_cl, >>> &code_cl, &tc); >>> ??????? } else { >>> ????????? CLDToOopClosure cld_cl(&roots_cl, >>> ClassLoaderData::_claim_strong); >>> ????????? _rp->roots_do(worker_id, &roots_cl, &cld_cl, NULL, &tc); >>> ??????? } >>> ????? } else { >>> >>> >>> Updated webrev: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.01/ >>> >>> Thank, >>> >>> -Zhengyu >>> >>> On 2/25/20 12:13 PM, Zhengyu Gu wrote: >>>> Shenandoah encounters a few test failures with tools/javac. Verifier >>>> catches unmarked oops in nmethod's metadata during root evacuation in >>>> final mark phase. >>>> >>>> The problem is that, Shenandoah marks on stack nmethods in init mark >>>> pause, but it does not mark nmethod's metadata during concurrent mark >>>> phase, when new nmethod is about to be executed. >>>> >>>> The solution: >>>> 1) Use nmethod_entry_barrier to keep nmethod's metadata alive when >>>> the nmethod is about to be executed, when nmethod entry barrier is >>>> supported. >>>> >>>> 2) Remark on stack nmethod's metadata at final mark pause. >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8239926 >>>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.00/ >>>> >>>> Test: >>>> ?? hotspot_gc_shenandoah (fastdebug and release) >>>> ?? 
tools/javac with ShenandoahCodeRootsStyle = 1 and 2 (fastdebug and >>>> release) >>>> >>>> Thanks, >>>> >>>> -Zhengyu >> > From zgu at redhat.com Thu Mar 12 15:34:21 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 12 Mar 2020 11:34:21 -0400 Subject: [15] RFR 8240917: Shenandoah: Avoid scanning thread code roots twice in all root scanner In-Reply-To: <1be5f430-d7ae-0615-744a-60e8b1fea05e@redhat.com> References: <1be5f430-d7ae-0615-744a-60e8b1fea05e@redhat.com> Message-ID: <5f1056c2-1639-8753-473e-d383b12a9cbd@redhat.com> Oops, copy/paste error, updated: http://cr.openjdk.java.net/~zgu/JDK-8240917/webrev.01/ Reran hotspot_gc_shenandaoh tests Thanks, -Zhengyu On 3/12/20 10:14 AM, Roman Kennke wrote: > It's doing the same in both branches, or what am I missing? > > Roman > > On 3/12/20 2:23 PM, Zhengyu Gu wrote: >> Please review this small enhancement, that avoids scanning thread's code >> roots if we scan all code blobs anyway. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8240917 >> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8240917/webrev.00/ >> >> >> Test: >> ? hotspot_gc_shenandoah (fastdebug and release) >> >> Thanks, >> >> -Zhengyu >> > From shade at redhat.com Thu Mar 12 16:14:04 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 12 Mar 2020 17:14:04 +0100 Subject: RFR: 8240873: Shenandoah: Short-cut arraycopy barriers In-Reply-To: <54c9b492-a1b9-4c07-7b1f-a432e989c9e0@redhat.com> References: <58ecb1aa-ff0b-d833-83e2-cbeeeb7582bb@redhat.com> <54c9b492-a1b9-4c07-7b1f-a432e989c9e0@redhat.com> Message-ID: On 3/12/20 3:12 PM, Roman Kennke wrote: > http://cr.openjdk.java.net/~rkennke/JDK-8240873/webrev.03/ OK, good. 
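For readers following this review thread: the qualifier placement Roman describes for webrev.03 (`volatile HeapWord*` versus `HeapWord* volatile`) and the acquire/release accessors suggested for (get|set)_update_watermark can be illustrated with a small standalone sketch. This is not the code from the webrev. `HeapWordStandIn` and `RegionSketch` are hypothetical names, and `std::atomic` is used here only as a portable stand-in for whatever atomic primitives the actual patch uses inside HotSpot.

```cpp
#include <atomic>
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for HotSpot's HeapWord, for illustration only.
struct HeapWordStandIn {};

// "A pointer to a volatile HeapWord": the pointee is volatile, the pointer is not.
using PointerToVolatile = volatile HeapWordStandIn*;
// "A volatile pointer to a HeapWord": the pointer itself is volatile.
using VolatilePointer = HeapWordStandIn* volatile;

static_assert(!std::is_volatile<PointerToVolatile>::value,
              "the pointer itself is not volatile");
static_assert(std::is_volatile<std::remove_pointer<PointerToVolatile>::type>::value,
              "the pointee is volatile");
static_assert(std::is_volatile<VolatilePointer>::value,
              "the pointer itself is volatile");
static_assert(!std::is_volatile<std::remove_pointer<VolatilePointer>::type>::value,
              "the pointee is not volatile");

// Hypothetical region type: the watermark is published with release
// semantics and read with acquire semantics, so writes made before
// set_update_watermark() are visible to a thread that observes the new value.
class RegionSketch {
  std::atomic<HeapWordStandIn*> _update_watermark{nullptr};

public:
  HeapWordStandIn* get_update_watermark() const {
    return _update_watermark.load(std::memory_order_acquire);
  }

  void set_update_watermark(HeapWordStandIn* w) {
    _update_watermark.store(w, std::memory_order_release);
  }
};

// Single-threaded round trip: stores a watermark and reads it back.
inline bool watermark_round_trips() {
  static HeapWordStandIn words[4];
  RegionSketch region;
  assert(region.get_update_watermark() == nullptr);  // starts unset
  region.set_update_watermark(&words[2]);
  return region.get_update_watermark() == &words[2];
}
```

The static_asserts only check where the qualifier binds; they say nothing about memory ordering, which is why the accessor pair is shown separately.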
-- Thanks, -Aleksey From shade at redhat.com Thu Mar 12 16:27:23 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 12 Mar 2020 17:27:23 +0100 Subject: RFR: 8231668: Remove ForceDynamicNumberOfGCThreads In-Reply-To: References: <75346f8c-b492-76ea-d32a-0785f57c052a@redhat.com> Message-ID: <9b6dc025-1420-6af9-d903-76838ac669eb@redhat.com> On 3/12/20 10:30 AM, Severin Gehwolf wrote: > On Thu, 2020-03-12 at 07:09 +0100, Aleksey Shipilev wrote: >> Metadata for this change is something like: >> >> 8231668: Remove ForceDynamicNumberOfGCThreads >> Reviewed-by: XXX (list of reviewers from census) >> Contributed-by: Aditya Mandaleeka > > For authors, 'Contributed-by:' line would not be necessary, no? They > could just use "hg commit -u ". That's my understanding > anyhow. I believe Contributed-by is cleaner and captures the reality better. If you look in the repo history, there are plenty of Contributed-by lines mentioning those who have author status. -- Thanks, -Aleksey From rkennke at redhat.com Thu Mar 12 16:43:05 2020 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 12 Mar 2020 17:43:05 +0100 Subject: [15] RFR 8239926: Shenandoah: Shenandoah needs to mark nmethod's metadata In-Reply-To: <20afe843-e4bc-96ee-0d54-0c40435183a8@redhat.com> References: <75c20855-5234-ba00-f07b-f9da0f7b8047@redhat.com> <612617a7-f08a-2b69-908e-26e146fa0a8f@redhat.com> <20afe843-e4bc-96ee-0d54-0c40435183a8@redhat.com> Message-ID: <69ad7ddf-0808-1f0e-43e1-9e9a37b8fb04@redhat.com> >> Hi Zhengyu, >> >> in src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp: >> +????? } else if >> (ShenandoahConcurrentRoots::can_do_concurrent_class_unloading()) { >> +??????? // Disarm nmethods that armed for concurrent mark. >> +??????? // On normal code path (non-empty Cset), it depends on >> update_roots() to >> +??????? // disarm nmethods in degenerated GC. >> +??????? ShenandoahCodeRoots::disarm_nmethods(); >> >> beware that the update_roots() is only called at the end of update_refs >> phase. 
The same call at end of marking is orphaned since removal of >> piggy-backed marking. > > I think it is fine, successful degenerated GC cycle should always > execute update_refs, no? Ok. I was only worried because the comment seems to imply it relies on update_roots() at the end of mark. Aleksey's patch is removing that. If update_roots() at the end of update_refs is good too, then fine. Thanks, Roman > Thanks, > > -Zhengyu > >> >> Otherwise looks good. >> >> Thanks, >> Roman >> >> >> On 3/11/20 10:43 PM, Zhengyu Gu wrote: >>> Revised based on offline discussions. >>> >>> Piggyback on stack code root rescanning to SATB draining task. >>> >>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.02/ >>> >>> Reran tests: >>> hotspot_gc_shenandoah >>> tools/javac >>> >>> Thanks, >>> >>> -Zhengyu >>> >>> On 3/4/20 6:06 PM, Zhengyu Gu wrote: >>>> Traversal GC has the same issue, also need to remark on stack code >>>> roots in final traversal. >>>> >>>> @@ -263,11 +263,12 @@ >>>> if (!_heap->is_degenerated_gc_in_progress()) { >>>> ShenandoahTraversalRootsClosure roots_cl(q, rp); >>>> ShenandoahTraversalSATBThreadsClosure tc(&satb_cl); >>>> if (unload_classes) { >>>> ShenandoahRemarkCLDClosure remark_cld_cl(&roots_cl); >>>> - _rp->strong_roots_do(worker_id, &roots_cl, &remark_cld_cl, >>>> NULL, &tc); >>>> + MarkingCodeBlobClosure code_cl(&roots_cl, >>>> CodeBlobToOopClosure::FixRelocations); >>>> + _rp->strong_roots_do(worker_id, &roots_cl, &remark_cld_cl, >>>> &code_cl, &tc); >>>> } else { >>>> CLDToOopClosure cld_cl(&roots_cl, >>>> ClassLoaderData::_claim_strong); >>>> _rp->roots_do(worker_id, &roots_cl, &cld_cl, NULL, &tc); >>>> } >>>>
} else { >>>> >>>> >>>> Updated webrev: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.01/ >>>> >>>> Thanks, >>>> >>>> -Zhengyu >>>> >>>> On 2/25/20 12:13 PM, Zhengyu Gu wrote: >>>>> Shenandoah encounters a few test failures with tools/javac. Verifier >>>>> catches unmarked oops in nmethod's metadata during root evacuation in >>>>> final mark phase. >>>>> >>>>> The problem is that Shenandoah marks on-stack nmethods in init mark >>>>> pause, but it does not mark nmethod's metadata during concurrent mark >>>>> phase, when new nmethod is about to be executed. >>>>> >>>>> The solution: >>>>> 1) Use nmethod_entry_barrier to keep nmethod's metadata alive when >>>>> the nmethod is about to be executed, when nmethod entry barrier is >>>>> supported. >>>>> >>>>> 2) Remark on stack nmethod's metadata at final mark pause. >>>>> >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8239926 >>>>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.00/ >>>>> >>>>> Test: >>>>> hotspot_gc_shenandoah (fastdebug and release) >>>>> tools/javac with ShenandoahCodeRootsStyle = 1 and 2 (fastdebug and >>>>> release) >>>>> >>>>> Thanks, >>>>> >>>>> -Zhengyu >>> >> > From rkennke at redhat.com Thu Mar 12 16:43:59 2020 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 12 Mar 2020 17:43:59 +0100 Subject: [15] RFR 8240917: Shenandoah: Avoid scanning thread code roots twice in all root scanner In-Reply-To: <5f1056c2-1639-8753-473e-d383b12a9cbd@redhat.com> References: <1be5f430-d7ae-0615-744a-60e8b1fea05e@redhat.com> <5f1056c2-1639-8753-473e-d383b12a9cbd@redhat.com> Message-ID: <67950961-4f10-a00e-84b1-6c3ccbe99b31@redhat.com> Ok, makes more sense now. Looks good! Thank you! Roman > Oops, copy/paste error, updated: > > http://cr.openjdk.java.net/~zgu/JDK-8240917/webrev.01/ > > Reran hotspot_gc_shenandoah tests > > Thanks, > > -Zhengyu > > On 3/12/20 10:14 AM, Roman Kennke wrote: >> It's doing the same in both branches, or what am I missing?
>> >> Roman >> >> On 3/12/20 2:23 PM, Zhengyu Gu wrote: >>> Please review this small enhancement, that avoids scanning thread's code >>> roots if we scan all code blobs anyway. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8240917 >>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8240917/webrev.00/ >>> >>> >>> Test: >>> hotspot_gc_shenandoah (fastdebug and release) >>> >>> Thanks, >>> >>> -Zhengyu >>> >> > From stefan.johansson at oracle.com Thu Mar 12 17:58:09 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Thu, 12 Mar 2020 18:58:09 +0100 Subject: RFR(XS): 8240591: G1HeapSizingPolicy attempts to compute expansion_amount even when at full capacity In-Reply-To: <9F4A4087-B013-4AAE-96E8-C15C46B25AD7@oracle.com> References: <197FCCDD-2E5D-4A63-B796-28908B18DB0E@oracle.com> <2c22164c-b459-af44-c7c1-15f1eacb4d52@oracle.com> <9F4A4087-B013-4AAE-96E8-C15C46B25AD7@oracle.com> Message-ID: Hi Ivan, > On 12 March 2020, at 14:12, Ivan Walulya wrote: > > Please find updated webrev: http://cr.openjdk.java.net/~iwalulya/8240591/01/ Looks good, and I agree on trace-level being good, just one minor thing you could fix before pushing: 65 recent_gc_overhead,_g1h->capacity()); Please add a space after the comma. I can do the push if you don't already have a sponsor, given you get the second review. Thanks, Stefan > >> On 10 Mar 2020, at 10:16, Thomas Schatzl wrote: >> >> Hi, >> >> On 05.03.20 11:33, Ivan Walulya wrote: >>> Hi all, >>> Please review a small modification for G1HeapSizingPolicy to return without computing expansion_amount when heap is already at full capacity. >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8240591 >>> Webrev: http://cr.openjdk.java.net/~iwalulya/8240591/00/ >>> //Ivan >> >> some minor (imo) comments to start a discussion: >> >> - I (weakly) suggest to remove the unnecessary assert(GCTimeRatio > 0) or put it at the beginning of the method. The initialization already guarantees that it is > 0.
Probably best to move to the very top of the method. >> >> - I think I understand why the code clears the ratio check data, presumably to avoid the "windup" of the ratio check data being stuck into the maximum, and taking some time to wind down. I think this is a good idea, but I would prefer to just unconditionally clear the data - it does not seem time consuming, and makes the code a bit smaller. >> >> - undecided on the log message: while it is informative, now you get a log message even if nothing changed. Maybe make it trace level? Others might chime in here with their opinions. Also given that often people set -Xms==-Xmx, this one seems to be a bit chatty at this level. I can see the point of it though. >> >> So overall, I am good with the change but asking for opinions :) >> >> Thanks, >> Thomas > From ivan.walulya at oracle.com Thu Mar 12 19:39:09 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Thu, 12 Mar 2020 20:39:09 +0100 Subject: RFR(XS): 8240591: G1HeapSizingPolicy attempts to compute expansion_amount even when at full capacity In-Reply-To: References: <197FCCDD-2E5D-4A63-B796-28908B18DB0E@oracle.com> <2c22164c-b459-af44-c7c1-15f1eacb4d52@oracle.com> <9F4A4087-B013-4AAE-96E8-C15C46B25AD7@oracle.com> Message-ID: <2CA8C80B-AA38-4AB7-A165-47AC7EB6AE37@oracle.com> Thanks Stefan > On 12 Mar 2020, at 18:58, Stefan Johansson wrote: > > Hi Ivan, > >> On 12 March 2020, at 14:12, Ivan Walulya wrote: >> >> Please find updated webrev: http://cr.openjdk.java.net/~iwalulya/8240591/01/ > Looks good, and I agree on trace-level being good, just one minor thing you could fix before pushing: > > 65 recent_gc_overhead,_g1h->capacity()); > Please add a space after the comma. > > I can do the push if you don't already have a sponsor, given you get the second review. Will make the changes and send to you for pushing after getting the second review.
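For readers following the 8240591 thread, the early-return idea under review can be sketched roughly as below. This is only an illustration under stated assumptions: the class name `HeapExpansionSketch`, the method `expansionAmount`, its parameters, and the 20% growth step are all hypothetical and are not taken from G1HeapSizingPolicy.

```java
// Hypothetical sketch; names and the growth rule are illustrative,
// not the HotSpot G1HeapSizingPolicy sources.
final class HeapExpansionSketch {

    /**
     * Returns the number of bytes to expand the heap by, or 0 when no
     * expansion applies. Capacities are in bytes, overheads in percent.
     */
    static long expansionAmount(long capacity, long maxCapacity,
                                double gcOverheadPercent, double thresholdPercent) {
        // Early return: a heap already at full capacity cannot grow,
        // so there is no point in computing an expansion amount.
        if (capacity >= maxCapacity) {
            return 0L;
        }
        // GC overhead is within the acceptable range: no expansion needed.
        if (gcOverheadPercent <= thresholdPercent) {
            return 0L;
        }
        // Assumed rule for illustration: grow by 20% of the current
        // capacity, capped by the remaining headroom.
        long wanted = capacity / 5;
        return Math.min(wanted, maxCapacity - capacity);
    }
}
```

With these assumptions, a configuration where -Xms equals -Xmx always takes the first branch, which is the case the log-level discussion in the thread is concerned with.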
> > Thanks, > Stefan > > >> >>> On 10 Mar 2020, at 10:16, Thomas Schatzl wrote: >>> >>> Hi, >>> >>> On 05.03.20 11:33, Ivan Walulya wrote: >>>> Hi all, >>>> Please review a small modification for G1HeapSizingPolicy to return without computing expansion_amount when heap is already at full capacity. >>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8240591 >>>> Webrev: http://cr.openjdk.java.net/~iwalulya/8240591/00/ >>>> //Ivan >>> >>> some minor (imo) comments to start a discussion: >>> >>> - I (weakly) suggest to remove the unnecessary assert(GCTimeRatio > 0) or put it at the beginning of the method. The initialization already guarantees that it is > 0. Probably best to move to the very top of the method. >>> >>> - I think I understand why the code clears the ratio check data, presumably to avoid the "windup" of the ratio check data being stuck into the maximum, and taking some time to wind down. I think this is a good idea, but I would prefer to just unconditionally clear the data - it does not seem time consuming, and makes the code a bit smaller. >>> >>> - undecided on the log message: while it is informative, now you get a log message even if nothing changed. Maybe make it trace level? Others might chime in here with their opinions. Also given that often people set -Xms==-Xmx, this one seems to be a bit chatty at this level. I can see the point of it though. >>> >>> So overall, I am good with the change but asking for opinions :) >>> >>> Thanks, >>> Thomas >> > From adityam at microsoft.com Thu Mar 12 20:45:41 2020 From: adityam at microsoft.com (Aditya Mandaleeka) Date: Thu, 12 Mar 2020 20:45:41 +0000 Subject: RFR: 8231668: Remove ForceDynamicNumberOfGCThreads In-Reply-To: <9b6dc025-1420-6af9-d903-76838ac669eb@redhat.com> References: <75346f8c-b492-76ea-d32a-0785f57c052a@redhat.com> <9b6dc025-1420-6af9-d903-76838ac669eb@redhat.com> Message-ID: Thanks Aleksey and Thomas for reviewing. I've updated the patch with the feedback. 
I left the TestDynamicNumberofGCThreads test in place but fixed the comment. Seems like it's worth revisiting that test to make it more useful as a separate issue. Aleksey, I am coming from the Git world and still getting familiar with the workflow here. I hadn't heard of the MqExtension until your mail, but I tried it out. To be honest, I was quite confused about how to use it in conjunction with the webrev script even after reading some documentation. I ended up with a webrev which appears to have the right code diff, but I'm not sure if all the metadata is in the form you'd expect. I'd appreciate it if you could verify that. Updated webrev is at: https://cr.openjdk.java.net/~adityam/8231668/webrev.01/ Severin Gehwolf wrote: > For authors, 'Contributed-by:' line would not be necessary, no? They > could just use "hg commit -u ". That's my understanding > anyhow. This matches my understanding as well from reading http://openjdk.java.net/projects/#project-author. That said, I don't really have a strong preference on it, so whatever you all prefer is fine with me. Just let me know what I need to do! Thanks, Aditya From mark.reinhold at oracle.com Thu Mar 12 23:49:01 2020 From: mark.reinhold at oracle.com (mark.reinhold at oracle.com) Date: Fri, 13 Mar 2020 00:49:01 +0100 (CET) Subject: New candidate JEP: 376: ZGC: Concurrent Thread-Stack Processing Message-ID: <20200312234901.4EC2B319DB4@eggemoggin.niobe.net> https://openjdk.java.net/jeps/376 - Mark From shade at redhat.com Fri Mar 13 08:02:36 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 13 Mar 2020 09:02:36 +0100 Subject: RFR: 8231668: Remove ForceDynamicNumberOfGCThreads In-Reply-To: References: <75346f8c-b492-76ea-d32a-0785f57c052a@redhat.com> <9b6dc025-1420-6af9-d903-76838ac669eb@redhat.com> Message-ID: <6f135db2-c47a-25b8-0b49-4200a1e8a645@redhat.com> On 3/12/20 9:45 PM, Aditya Mandaleeka wrote: > Updated webrev is at: https://cr.openjdk.java.net/~adityam/8231668/webrev.01/ Looks fine to me. 
Somebody from G1 should also ack this. Thomas? -- Thanks, -Aleksey From thomas.schatzl at oracle.com Fri Mar 13 09:38:51 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 13 Mar 2020 10:38:51 +0100 Subject: RFR: 8231668: Remove ForceDynamicNumberOfGCThreads In-Reply-To: <6f135db2-c47a-25b8-0b49-4200a1e8a645@redhat.com> References: <75346f8c-b492-76ea-d32a-0785f57c052a@redhat.com> <9b6dc025-1420-6af9-d903-76838ac669eb@redhat.com> <6f135db2-c47a-25b8-0b49-4200a1e8a645@redhat.com> Message-ID: Hi, On 13.03.20 09:02, Aleksey Shipilev wrote: > On 3/12/20 9:45 PM, Aditya Mandaleeka wrote: >> Updated webrev is at: https://cr.openjdk.java.net/~adityam/8231668/webrev.01/ > > Looks fine to me. > > Somebody from G1 should also ack this. Thomas? > looks good. Please fix the copyright dates for the tests as well since you did so for other files too already. No need for a re-review. Aleksey, will you sponsor the patch, or should I do it? Thanks, Thomas From shade at redhat.com Fri Mar 13 09:44:47 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 13 Mar 2020 10:44:47 +0100 Subject: RFR: 8231668: Remove ForceDynamicNumberOfGCThreads In-Reply-To: References: <75346f8c-b492-76ea-d32a-0785f57c052a@redhat.com> <9b6dc025-1420-6af9-d903-76838ac669eb@redhat.com> <6f135db2-c47a-25b8-0b49-4200a1e8a645@redhat.com> Message-ID: On 3/13/20 10:38 AM, Thomas Schatzl wrote: > looks good. Please fix the copyright dates for the tests as well > since you did so for other files too already. No need for a re-review. +1, I would update those before pushing. Aditya, you don't need to do anything else. > Aleksey, will you sponsor the patch, or should I do it? Sure, I will sponsor it. 
-- Thanks, -Aleksey From thomas.schatzl at oracle.com Fri Mar 13 10:28:57 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 13 Mar 2020 11:28:57 +0100 Subject: RFR(XS): 8240591: G1HeapSizingPolicy attempts to compute expansion_amount even when at full capacity In-Reply-To: <9F4A4087-B013-4AAE-96E8-C15C46B25AD7@oracle.com> References: <197FCCDD-2E5D-4A63-B796-28908B18DB0E@oracle.com> <2c22164c-b459-af44-c7c1-15f1eacb4d52@oracle.com> <9F4A4087-B013-4AAE-96E8-C15C46B25AD7@oracle.com> Message-ID: Hi, On 12.03.20 14:12, Ivan Walulya wrote: > Please find updated webrev: http://cr.openjdk.java.net/~iwalulya/8240591/01/ looks good sans the space Stefan mentioned, for which I do not need a re-review. Thomas From ivan.walulya at oracle.com Fri Mar 13 10:30:00 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Fri, 13 Mar 2020 11:30:00 +0100 Subject: RFR(XS): 8240591: G1HeapSizingPolicy attempts to compute expansion_amount even when at full capacity In-Reply-To: References: <197FCCDD-2E5D-4A63-B796-28908B18DB0E@oracle.com> <2c22164c-b459-af44-c7c1-15f1eacb4d52@oracle.com> <9F4A4087-B013-4AAE-96E8-C15C46B25AD7@oracle.com> Message-ID: <9095AC0C-6327-409C-B2BD-8F030F6DA124@oracle.com> Thanks Thomas! > On 13 Mar 2020, at 11:28, Thomas Schatzl wrote: > > Hi, > > On 12.03.20 14:12, Ivan Walulya wrote: >> Please find updated webrev: http://cr.openjdk.java.net/~iwalulya/8240591/01/ > > looks good sans the space Stefan mentioned, for which I do not need a re-review. 
> > Thomas From shade at redhat.com Fri Mar 13 12:32:28 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 13 Mar 2020 13:32:28 +0100 Subject: RFR: 8231668: Remove ForceDynamicNumberOfGCThreads In-Reply-To: References: <75346f8c-b492-76ea-d32a-0785f57c052a@redhat.com> <9b6dc025-1420-6af9-d903-76838ac669eb@redhat.com> <6f135db2-c47a-25b8-0b49-4200a1e8a645@redhat.com> Message-ID: <65966e2e-51c1-f8af-bc8c-650ea7cd27bf@redhat.com> On 3/13/20 10:44 AM, Aleksey Shipilev wrote: > On 3/13/20 10:38 AM, Thomas Schatzl wrote: >> looks good. Please fix the copyright dates for the tests as well >> since you did so for other files too already. No need for a re-review. > > +1, I would update those before pushing. Aditya, you don't need to do anything else. > >> Aleksey, will you sponsor the patch, or should I do it? > > Sure, I will sponsor it. Copyright lines updated; jdk-submit is clean, hotspot_gc_shenandoah is clean. Pushed: https://hg.openjdk.java.net/jdk/jdk/rev/367b1f73904c jcheck initially rejected the push, because: remote: [jcheck b8f3bac16fc8 2019-03-12 09:27:17 -0700] remote: remote: > Changeset: 58392:8ac108cd32af remote: > Author: shade remote: > Date: 2020-03-13 13:22 remote: > remote: > 8231668: Remove ForceDynamicNumberOfGCThreads remote: > Reviewed-by: shade, tschatzl remote: > Contributed-by: Aditya Mandaleeka remote: remote: Invalid contributor attribution The email should have been proper one, with @. 
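The rejection quoted above comes down to the shape of the Contributed-by line. A minimal sketch of such a check follows, assuming the expected form is `Contributed-by: Name <user@host>` with optional comma-separated additional entries. The class `ContributorLineCheck` and its regular expression are hypothetical and are not jcheck's actual rule.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; this is not jcheck's implementation.
final class ContributorLineCheck {

    // Assumed entry shape: a display name, a space, then <local@domain>.
    private static final Pattern ENTRY =
            Pattern.compile("[^<>,@]+ <[^<>@\\s]+@[^<>@\\s]+>");

    /** Returns true if the line looks like a well-formed Contributed-by line. */
    static boolean isValid(String line) {
        String prefix = "Contributed-by: ";
        if (!line.startsWith(prefix)) {
            return false;
        }
        // Each comma-separated entry must match the assumed shape.
        for (String entry : line.substring(prefix.length()).split(",\\s*")) {
            if (!ENTRY.matcher(entry.trim()).matches()) {
                return false;
            }
        }
        return true;
    }
}
```

Under this assumption a line carrying only a name, or an address written with " at " instead of "@", is rejected, which matches the symptom described in the message.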
-- Thanks, -Aleksey From zgu at redhat.com Fri Mar 13 12:56:10 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Fri, 13 Mar 2020 08:56:10 -0400 Subject: [15] RFR 8239926: Shenandoah: Shenandoah needs to mark nmethod's metadata In-Reply-To: <69ad7ddf-0808-1f0e-43e1-9e9a37b8fb04@redhat.com> References: <75c20855-5234-ba00-f07b-f9da0f7b8047@redhat.com> <612617a7-f08a-2b69-908e-26e146fa0a8f@redhat.com> <20afe843-e4bc-96ee-0d54-0c40435183a8@redhat.com> <69ad7ddf-0808-1f0e-43e1-9e9a37b8fb04@redhat.com> Message-ID: Overnight tests showed problems with this patch. So, I would like to withdraw this RFR. Thanks, -Zhengyu On 3/12/20 12:43 PM, Roman Kennke wrote: >>> Hi Zhengyu, >>> >>> in src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp: >>> + } else if >>> (ShenandoahConcurrentRoots::can_do_concurrent_class_unloading()) { >>> + // Disarm nmethods that armed for concurrent mark. >>> + // On normal code path (non-empty Cset), it depends on >>> update_roots() to >>> + // disarm nmethods in degenerated GC. >>> + ShenandoahCodeRoots::disarm_nmethods(); >>> >>> beware that the update_roots() is only called at the end of update_refs >>> phase. The same call at end of marking is orphaned since removal of >>> piggy-backed marking. >> >> I think it is fine, successful degenerated GC cycle should always >> execute update_refs, no? > > Ok. I was only worried because the comment seems to imply it relies on > update_roots() at the end of mark. Aleksey's patch is removing that. If > update_roots() at the end of update_refs is good too, then fine. > > Thanks, > Roman > >> Thanks, >> >> -Zhengyu >> >>> >>> Otherwise looks good. >>> >>> Thanks, >>> Roman >>> >>> >>> On 3/11/20 10:43 PM, Zhengyu Gu wrote: >>>> Revised based on offline discussions. >>>> >>>> Piggyback on stack code root rescanning to SATB draining task. >>>> >>>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.02/ >>>> >>>> Reran tests: >>>> hotspot_gc_shenandoah >>>>
tools/javac >>>> >>>> Thanks, >>>> >>>> -Zhengyu >>>> >>>> On 3/4/20 6:06 PM, Zhengyu Gu wrote: >>>>> Traversal GC has the same issue, also need to remark on stack code >>>>> roots in final traversal. >>>>> >>>>> @@ -263,11 +263,12 @@ >>>>> if (!_heap->is_degenerated_gc_in_progress()) { >>>>> ShenandoahTraversalRootsClosure roots_cl(q, rp); >>>>> ShenandoahTraversalSATBThreadsClosure tc(&satb_cl); >>>>> if (unload_classes) { >>>>> ShenandoahRemarkCLDClosure remark_cld_cl(&roots_cl); >>>>> - _rp->strong_roots_do(worker_id, &roots_cl, &remark_cld_cl, >>>>> NULL, &tc); >>>>> + MarkingCodeBlobClosure code_cl(&roots_cl, >>>>> CodeBlobToOopClosure::FixRelocations); >>>>> + _rp->strong_roots_do(worker_id, &roots_cl, &remark_cld_cl, >>>>> &code_cl, &tc); >>>>> } else { >>>>> CLDToOopClosure cld_cl(&roots_cl, >>>>> ClassLoaderData::_claim_strong); >>>>> _rp->roots_do(worker_id, &roots_cl, &cld_cl, NULL, &tc); >>>>> } >>>>> } else { >>>>> >>>>> >>>>> Updated webrev: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.01/ >>>>> >>>>> Thanks, >>>>> >>>>> -Zhengyu >>>>> >>>>> On 2/25/20 12:13 PM, Zhengyu Gu wrote: >>>>>> Shenandoah encounters a few test failures with tools/javac. Verifier >>>>>> catches unmarked oops in nmethod's metadata during root evacuation in >>>>>> final mark phase. >>>>>> >>>>>> The problem is that Shenandoah marks on-stack nmethods in init mark >>>>>> pause, but it does not mark nmethod's metadata during concurrent mark >>>>>> phase, when new nmethod is about to be executed. >>>>>> >>>>>> The solution: >>>>>> 1) Use nmethod_entry_barrier to keep nmethod's metadata alive when >>>>>> the nmethod is about to be executed, when nmethod entry barrier is >>>>>> supported. >>>>>> >>>>>> 2) Remark on stack nmethod's metadata at final mark pause.
>>>>>> >>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8239926 >>>>>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.00/ >>>>>> >>>>>> Test: >>>>>> hotspot_gc_shenandoah (fastdebug and release) >>>>>> tools/javac with ShenandoahCodeRootsStyle = 1 and 2 (fastdebug and >>>>>> release) >>>>>> >>>>>> Thanks, >>>>>> >>>>>> -Zhengyu >>>> >>> >> > From erik.osterlund at oracle.com Fri Mar 13 13:07:22 2020 From: erik.osterlund at oracle.com (=?utf-8?Q?Erik_=C3=96sterlund?=) Date: Fri, 13 Mar 2020 14:07:22 +0100 Subject: RFR: 8240714: ZGC: TestSmallHeap.java failed due to OutOfMemoryError In-Reply-To: <74b2b7a0-fa16-7476-809a-fc550b4827d0@oracle.com> References: <74b2b7a0-fa16-7476-809a-fc550b4827d0@oracle.com> Message-ID: <049E2DF9-F1FA-46A4-AA03-5C0DBE20CFBB@oracle.com> Hi Per, Looks good. Thanks, /Erik > On 10 Mar 2020, at 18:20, Per Liden wrote: > > The gc/z/TestSmallHeap.java test failed once due to OutOfMemoryError. When using an 8M heap, this test is fairly sensitive in the sense that the heap will be very crowded and the heap headroom is small. When running as "main/othervm" there are additional jtreg threads running in the VM. These threads can apparently (sometimes?) allocate enough memory to disturb the test itself, pushing it over the edge with OOME as a result. To avoid having these threads running in the same VM as the test itself I've adjusted the test to spawn a new test VM through ProcessTools.
> > Webrev: http://cr.openjdk.java.net/~pliden/8240714/webrev.0 > Bug: https://bugs.openjdk.java.net/browse/JDK-8240714 > > Testing: Manual > > cheers, > Per > From per.liden at oracle.com Fri Mar 13 14:03:00 2020 From: per.liden at oracle.com (Per Liden) Date: Fri, 13 Mar 2020 15:03:00 +0100 Subject: RFR: 8240714: ZGC: TestSmallHeap.java failed due to OutOfMemoryError In-Reply-To: <049E2DF9-F1FA-46A4-AA03-5C0DBE20CFBB@oracle.com> References: <74b2b7a0-fa16-7476-809a-fc550b4827d0@oracle.com> <049E2DF9-F1FA-46A4-AA03-5C0DBE20CFBB@oracle.com> Message-ID: Thanks Erik! /Per On 3/13/20 2:07 PM, Erik Österlund wrote: > Hi Per, > > Looks good. > > Thanks, > /Erik > >> On 10 Mar 2020, at 18:20, Per Liden wrote: >> >> The gc/z/TestSmallHeap.java test failed once due to OutOfMemoryError. When using an 8M heap, this test is fairly sensitive in the sense that the heap will be very crowded and the heap headroom is small. When running as "main/othervm" there are additional jtreg threads running in the VM. These threads can apparently (sometimes?) allocate enough memory to disturb the test itself, pushing it over the edge with OOME as a result. To avoid having these threads running in the same VM as the test itself I've adjusted the test to spawn a new test VM through ProcessTools.
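The approach described above, running the heap-sensitive part in a freshly spawned VM so that harness threads cannot allocate in it, can be sketched with the standard `ProcessBuilder` API. This is a minimal illustration only: the actual test uses the jtreg `ProcessTools` helper, and the class `ChildJvmLauncher` and its methods below are hypothetical names.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch using only the standard library; the real test
// relies on the jtreg ProcessTools helper instead.
final class ChildJvmLauncher {

    /** Builds the command line for a fresh JVM that runs mainClass with the given VM options. */
    static List<String> buildCommand(String mainClass, List<String> vmOptions) {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.addAll(vmOptions);          // e.g. a small -Xmx for the child only
        cmd.add("-cp");
        cmd.add(System.getProperty("java.class.path"));
        cmd.add(mainClass);
        return cmd;
    }

    /** Starts the child VM, waits for it to finish, and returns its exit code. */
    static int run(String mainClass, List<String> vmOptions) throws Exception {
        Process child = new ProcessBuilder(buildCommand(mainClass, vmOptions))
                .inheritIO()            // forward the child's stdout/stderr
                .start();
        return child.waitFor();
    }
}
```

A caller would then do something like `ChildJvmLauncher.run("Test", Arrays.asList("-Xmx8M"))` and check the exit code, where `Test` stands for whatever main class the child should run; the GC selection flags are omitted here because they depend on the JDK build.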
>> >> Webrev: http://cr.openjdk.java.net/~pliden/8240714/webrev.0 >> Bug: https://bugs.openjdk.java.net/browse/JDK-8240714 >> >> Testing: Manual >> >> cheers, >> Per >> > From adityam at microsoft.com Fri Mar 13 19:41:28 2020 From: adityam at microsoft.com (Aditya Mandaleeka) Date: Fri, 13 Mar 2020 19:41:28 +0000 Subject: RFR: 8231668: Remove ForceDynamicNumberOfGCThreads In-Reply-To: <65966e2e-51c1-f8af-bc8c-650ea7cd27bf@redhat.com> References: <75346f8c-b492-76ea-d32a-0785f57c052a@redhat.com> <9b6dc025-1420-6af9-d903-76838ac669eb@redhat.com> <6f135db2-c47a-25b8-0b49-4200a1e8a645@redhat.com> <65966e2e-51c1-f8af-bc8c-650ea7cd27bf@redhat.com> Message-ID: Thank you both for the reviews, and thanks for sponsoring, Aleksey! -Aditya -----Original Message----- From: Aleksey Shipilev Sent: Friday, March 13, 2020 5:32 AM To: Thomas Schatzl ; Aditya Mandaleeka ; Severin Gehwolf ; hotspot-gc-dev at openjdk.java.net; shenandoah-dev Subject: Re: RFR: 8231668: Remove ForceDynamicNumberOfGCThreads On 3/13/20 10:44 AM, Aleksey Shipilev wrote: > On 3/13/20 10:38 AM, Thomas Schatzl wrote: >> looks good. Please fix the copyright dates for the tests as well >> since you did so for other files too already. No need for a re-review. > > +1, I would update those before pushing. Aditya, you don't need to do anything else. > >> Aleksey, will you sponsor the patch, or should I do it? > > Sure, I will sponsor it. Copyright lines updated; jdk-submit is clean, hotspot_gc_shenandoah is clean. Pushed: https://hg.openjdk.java.net/jdk/jdk/rev/367b1f73904c jcheck initially rejected the push, because: remote: [jcheck b8f3bac16fc8 2019-03-12 09:27:17 -0700] remote: remote: > Changeset: 58392:8ac108cd32af remote: > Author: shade remote: > Date: 2020-03-13 13:22 remote: > remote: > 8231668: Remove ForceDynamicNumberOfGCThreads remote: > Reviewed-by: shade, tschatzl remote: > Contributed-by: Aditya Mandaleeka remote: remote: Invalid contributor attribution The email should have been proper one, with @. 
-- Thanks, -Aleksey From linzang at tencent.com Mon Mar 16 09:18:18 2020 From: linzang at tencent.com (=?utf-8?B?bGluemFuZyjoh6fnkLMp?=) Date: Mon, 16 Mar 2020 09:18:18 +0000 Subject: RFR(L): 8215624: add parallel heap inspection support for jmap histo(G1) Message-ID: I have just uploaded a new patch; my preliminary measurements show about a 3.5x speedup of jmap histo on a nearly full 4GB G1 heap (8-core platform with the parallel thread number set to 4). webrev: http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_02/ bug: https://bugs.openjdk.java.net/browse/JDK-8215624 CSR: https://bugs.openjdk.java.net/browse/JDK-8239290 BRs, Lin > On 2020/3/2, 9:56 PM, "linzang(臧琳)" wrote: > > Dear all, > Let me try to ease the reviewing work with some explanation :P > The patch's goal is to speed up heap iteration for jmap -histo; in my experience this is necessary when investigating large heaps. E.g. in a big-data scenario I ran jmap -histo against a 180GB heap, and it took quite a while. > And if my understanding is correct, even jmap -histo without the "live" option does heap inspection with the heap lock acquired, so it is very likely to block mutator threads in allocation-sensitive scenarios. I would say the faster the heap inspection runs, the shorter the mutators are blocked. This is why parallel iteration for jmap is necessary. > I think parallel heap inspection should be applied to all kinds of heap. However, since the heap layouts differ between GCs, much time is required to understand all of them in order to make the whole change. IMO, it is not wise to have a huge patch for the whole solution at once, and it would be even harder to review. So I plan to implement it incrementally. The first patch (this one) is meant to settle the implementation details of how jmap accepts the new option and passes it to the attachListener of the JVM process, and then how to make the parallel inspection closure generic enough to be easily extended to different heap layouts.
And also how to implement the heap inspection in a specific GC's heap. This patch uses G1's heap as the beginning. > This patch actually does several things: > 1. Add an option "parallelThreadNum=" to jmap -histo. The default behavior is to set N to 0, which means letting the JVM decide how many threads to use for heap inspection. Setting this option to 1 disables parallel heap inspection. (more details in CSR: https://bugs.openjdk.java.net/browse/JDK-8239290) > 2. Change how JMap passes arguments, see http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_01/src/jdk.jcmd/share/classes/sun/tools/jmap/JMap.java.udiff.html. Originally it passed options as separate arguments to attachListener; this patch composes all options into a single string. So arg_count_max in attachListener.hpp does not need to be changed, which avoids the compatibility issue discussed at https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-March/027334.html > 3. Add an abstract class ParHeapInspectTask in heapInspection.hpp / heapInspection.cpp. Its work(uint worker_id) method prepares the data structure (KlassInfoTable) needed for every parallel worker thread, and then calls do_object_iterate_parallel(), which is the heap-specific implementation. I also added some mechanisms in KlassInfoTable to support parallel iteration, such as merge(). > 4. In the specific heap (G1 in this patch), create a subclass of ParHeapInspectTask and implement do_object_iterate_parallel() for parallel heap inspection. For G1, it simply invokes g1CollectedHeap's object_iterate_parallel(). > 5. Add related tests. > 6. It should be easy to extend this patch to other kinds of heap by creating a subclass of ParHeapInspectTask and implementing do_object_iterate_parallel(). > > I hope this info helps with the code review and initiates the discussion :-) > Thanks! > > BRs, > Lin > >On 2020/2/19, 9:40 AM, "linzang(臧琳)" wrote:.
> > > > Re-post this RFR with correct enhancement number to make it trackable. > > Please ignore the previous wrong post. Sorry for the trouble. > > > > webrev: http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_01/ > > Hi bug: https://bugs.openjdk.java.net/browse/JDK-8215624 > > CSR: https://bugs.openjdk.java.net/browse/JDK-8239290 > > -------------- > > Lin > > >Hi Lin, > > > > > >Could you, please, re-post your RFR with the right enhancement number in > > >the message subject? > > >It will be more trackable this way. > > > > > >Thanks, > > >Serguei > > > > > > > > >On 2/17/20 10:29 PM, linzang(臧琳) wrote: > > >> Dear David, > > >> Thanks a lot! > > >> I have updated the refined code to http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_01/. > > >> IMHO the parallel heap inspection can be extended to all kinds of heap as long as the heap layout can support parallel iteration. > > >> Maybe we can firstly use this webrev to discuss how to implement it, because I am not sure my current implementation is an appropriate way to communicate with collectedHeap, then we can extend the solution to other kinds of heap. > > >> > > >> Thanks, > > >> -------------- > > >> Lin > > >>> Hi Lin, > > >>> > > >>> Adding in hotspot-gc-dev as they need to see how this interacts with GC > > >>> worker threads, and whether it needs to be extended beyond G1. > > >>> > > >>> I happened to spot one nit when browsing: > > >>> > > >>> src/hotspot/share/gc/shared/collectedHeap.hpp > > >>> > > >>> + virtual bool run_par_heap_inspect_task(KlassInfoTable* cit, > > >>> + BoolObjectClosure* filter, > > >>> + size_t* missed_count, > > >>> + size_t thread_num) { > > >>> + return NULL; > > >>> > > >>> s/NULL/false/ > > >>> > > >>> Cheers, > > >>> David > > >>> > > >>> On 18/02/2020 2:15 pm, linzang(臧琳)
wrote: > > >>>> Dear All, > > >>>> May I ask your help to review the follow changes: > > >>>> webrev: > > >>>> http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_00/ > > >>>> bug: https://bugs.openjdk.java.net/browse/JDK-8215624 > > >>>> related CSR: https://bugs.openjdk.java.net/browse/JDK-8239290 > > >>>> This patch enable parallel heap inspection of G1 for jmap histo. > > >>>> my simple test shown it can speed up 2x of jmap -histo with > > >>>> parallelThreadNum set to 2 for heap at ~500M on 4-core platform. > > >>>> > > >>>> ------------------------------------------------------------------------ > > >>>> BRs, > > >>>> Lin > > >> > > > > From magnus.ihse.bursie at oracle.com Mon Mar 16 13:37:42 2020 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Mon, 16 Mar 2020 14:37:42 +0100 Subject: RFR: JDK-8240224 Allow building hotspot without the serial gc In-Reply-To: <96531866-D45A-4797-841D-3E6E26F403D5@oracle.com> References: <663301c4-aa45-4539-c9b7-d6fe68c531de@ihse.net> <46E744D7-AB29-43BA-808A-2A79248EAAAD@oracle.com> <96531866-D45A-4797-841D-3E6E26F403D5@oracle.com> Message-ID: <3ba36708-8176-670a-2a21-658bfe1342c1@oracle.com> Ok, let's take a step back in this, and split up this into separate issues. Issue 1: Compilation of hotspot fails if serialgc is disabled Issue 2: How can the build system cope with building when the product does not have serial gc? If we can just agree how to fix 1 I can make sure the issues from 2 does not affect hotspot. So let's see what this means: * I skip the changes in gcConfig.cpp * I skip the changes in genCollectedHeap.cpp. This will be fixed by JDK-8234502. * java.cpp is still missing an include. (I hope this part of the fix is non-problematic...) * test_collectorPolicy.cpp needs some kind of treatment. Either some code change to make it compile (like the one I suggested, or making the entire test dependent on SERIALGC). 
Or perhaps the entire test needs to be evaluated, since you describe it as "confused" and "questionable". In any case, I can create a separate issue to handle test_collectorPolicy.cpp; either by ifdefs/defines, or by re-evaluating the entire test. Does it sound like a good approach to you that, retarget the current issue (JDK-8240224) to be just about the build issues, and set it as depending on JDK-8234502 and a new test_collectorPolicy.cpp issue? That would leave only the missing include, which is trivial and shouldn't be any cause for concern. /Magnus On 2020-03-11 02:36, Kim Barrett wrote: >> On Mar 10, 2020, at 10:53 AM, Magnus Ihse Bursie wrote: >> >> On 2020-03-09 21:37, Kim Barrett wrote: >>>> On Mar 9, 2020, at 4:30 AM, Magnus Ihse Bursie >>>> wrote: >>>> >>>> When reworking the JVM feature handling, I wanted to try to compile Hotspot with various features enabled/disabled. I quickly found out that it's not really possible to build hotspot without the serial gc. While this is not a terribly important use case, I think it's good to be able to select serial freely, just as with the other collectors. >>>> >>>> With this patch it is possible to build a truly minimal JVM using 'configure --with-jvm-variants=custom --with-jvm-features=g1gc'. >>>> >>>> Bug: >>>> https://bugs.openjdk.java.net/browse/JDK-8240224 >>>> >>>> WebRev: >>>> http://cr.openjdk.java.net/~ihse/JDK-8240224-building-without-serial-gc/webrev.01 >>>> >>>> >>>> /Magnus >>>> >>> I'm inclined to agree with David and Aleksey that this isn't really a >>> worthwhile exercise. Especially not if it involves making some >>> otherwise questionable or controversial changes. >>> >> As I've said in the previous comments, it's not so much about making Hotspot running without Serial GC as making configure live up to it's promise not to create an un-buildable configuration. > The ability to configure which GCs are present was added for several > reasons. 
> > Some packagers don't want to support some of the collectors that are > available in the source tree, so want to completely exclude the (to > them) unsupported collectors from their builds. > > Some packagers want to be able to reduce the VM footprint for certain > application areas; the "minimal" variant is an example. > > In preparation for removal of CMS it was useful to first be able to > build with it configured out. And CMS could have ended up in the > category of collectors that are excluded as unsupported by some > packagers. > > The implementation of this configurability tried to be reasonably > complete. Doing so helped shake out problems and show the intent. > But I don't know if it was ever demonstrated to work for all > possibilities, and even if it did at one time, bit rot is pretty much > inevitable since we don't test most of those possibilities. > > I don't think we should be spending effort on configurations for which > there is no evidence anyone actually wants or needs them. But having > the mechanism in the build system to try a configuration provides a > starting point if someone finds a need for something oddball, even if > it doesn't work out of the box. It would be better if broken > configurations failed nicely, but even that can't be ensured for long > without ongoing testing that I don't think anyone wants to do. > >> I apologize if my changes are questionable or controversial -- my assessment was on the contrary that they were simple and non-obtrusive, to the point of triviality. > Some of the discussion in this thread has been pointing out places > where a reviewer thinks that assessment is mistaken. > >>> src/hotspot/share/gc/shared/gcConfig.cpp >>> >>> I would instead suggest there should not be a default at all instead >>> of adding these cases, and the user must explicitly select the GC to >>> be used. Since we're talking about an atypical custom build anyway, >>> the user presumably knows what they are doing. 
And yeah, that makes >>> the buildjdk stuff elsewhere in this patch harder. >>> >> If you build without the Serial GC, it is not even possible to start the JVM without a flag selecting GC. Instead you get a somewhat cryptic (and incorrect) message about missing garbage collectors. Even if the end user would be able to know that you need to pass an additional option just to be able to start java, the build system knows no such thing, so we cannot even finish the build -- as soon as we try to use the newly built JVM (e.g. for running jlink), we will crash and burn. > Right, because the build system isn't dealing with the need to > explicitly specify the GC to use in such a configuration. That's what > I meant about making the build stuff harder. The build system would > need to look at the configuration to decide how to accomplish the build. > >>> src/hotspot/share/gc/shared/genCollectedHeap.cpp >>> 197 #if INCLUDE_SERIALGC >>> 198 MarkSweep::initialize(); >>> 199 #endif >>> >>> This whole file, and several associated files, are *only* used by >>> SerialGC now that CMS has been removed: JDK-8234502. >>> >> Then maybe they should be excluded when serial is not included? > That would be part of the work involved in resolving JDK-8234502. > >> Or, if it is determined that Serial GC is essential to hotspot, we should remove the INCLUDE_SERIALGC define and associated framework, since it's just a fake abstraction if it is not actually possible to build without serial GC. > I don't think there is any belief that SerialGC must always be included. That it can't > currently be excluded is an artifact of nobody having the need and the resources to > make that possible. > >>> make/hotspot/lib/JvmFeatures.gmk >>> 58 ifeq ($(JVM_VARIANT), custom) >>> 59 JVM_CFLAGS_FEATURES += -DVMTYPE=\"Custom\" >>> 60 endif >>> >>> This change looks unrelated to whether serialgc is present or absent. >>> If so, it doesn't belong in this changeset at all. 
>>> >> You are correct that this is not strictly about serialgc. When I tested my custom build with only epsilongc, I discovered that jtreg barfed on the version string produced by the custom JVM build. This is a fix that makes sure the VMTYPE always has a value. If you object to me pushing it as part of this fix, I can remove it from here and submit it as a separate issue. (I just didn't think it was worth the hassle.) > I understand there is overhead to breaking things into multiple > changes, but combining unrelated changes can make archeology and > problem or rationale attribution much harder. I looked at this and > had no idea what it was for, and it wasn't called out in the RFR or > anywhere else. > >>> make/hotspot/lib/JvmFeatures.gmk >>> [removed] >>> 154 # If serial is disabled, we cannot use serial as OldGC in parallel >>> 155 JVM_EXCLUDE_FILES += psMarkSweep.cpp psMarkSweepDecorator.cpp >>> >>> This was missed by JDK-8235860, which removed those files. Good find. >>> >> ... but according to your comment above, that fix also missed to add a bunch of other files that should be excluded..? (If we should keep the ability to disable serial gc, that is?) > The comment above was about a different change, the removal of CMS, > which is known to be incomplete and have a number of further cleanups > and refactorings to do before all vestiges have been removed. > > This one is about the removal of the Serial-Old variant of ParallelGC, > which was thought to be complete, but missed this little snippet. > >>> test/hotspot/gtest/gc/shared/test_collectorPolicy.cpp >>> >>> As originally written, this test was *only* testing SerialGC. It's not >>> obvious that it is actually GC-agnostic and can use the default GC if >>> that isn't SerialGC. Certainly some of the naming suggests otherwise. >>> Was this tested with all the other configurations? >>> >> No, I have not tested all other configurations. I verified that I could build with only g1, only zgc and only epsilongc. 
I also ran tier1 testing, and it "mostly" succeeded, but it still failed on several tests. My quick eyeballing of the situation indicated that the absolute majority (and perhaps all) of these failures were related to jtreg tests not properly declaring their dependencies on compiler1 or compiler2. (Remember, on this bare-bones JVM I only had the interpreter, and neither c1 nor c2). >> >> I *could* of course run a suitable set of testing with say c1 and c2 enabled, and just a single gc enabled, for the set of all gcs != serial gc, but then we're *really* getting into the "not worth it" land. >> >> It is not clear to me that the test is only run with Serial GC. As far as I can interpret the test framework, this is run with the default collector, which typically is *not* serialgc on our testing framework. If this is only valid for Serial GC, perhaps the test needs to be amended? > Looking at this some more, I don't know what this test thinks it's > doing, but I suspect it's confused. It's using TEST_VM and > TEST_OTHER_VM, both of which create the VM before running the test > body. The kinds of things it's doing in that context seem pretty > questionable. > From zgu at redhat.com Mon Mar 16 16:26:49 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 16 Mar 2020 12:26:49 -0400 Subject: [15] RFR 8239926: Shenandoah: Shenandoah needs to mark nmethod's metadata In-Reply-To: References: <75c20855-5234-ba00-f07b-f9da0f7b8047@redhat.com> <612617a7-f08a-2b69-908e-26e146fa0a8f@redhat.com> <20afe843-e4bc-96ee-0d54-0c40435183a8@redhat.com> <69ad7ddf-0808-1f0e-43e1-9e9a37b8fb04@redhat.com> Message-ID: <1908c392-c888-b6c2-0a3c-f3860e7da2cf@redhat.com> There are two issues in the earlier patch. 1) It keeps unloading nmethod's metadata alive 2) There are chances that GC is cancelled when it arrives at the final mark pause, which results in bypassing evacuation/concurrent roots and triggers an assertion in the ShenandoahNMethod::heal_nmethod() method. 
Now, I moved all nmethod metadata marking and evacuating inside ShenandoahNMethod::heal_nmethod(), which is cleaner. Also, incorporated Aleksey's suggestion for handling nmethod disarming during degenerated GC, to follow regular phase transition. Updated: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.03/ Test: tier1 (fastdebug and release) with ShenandoahGC Thanks, -Zhengyu On 3/13/20 8:56 AM, Zhengyu Gu wrote: > Overnight tests showed problems with this patch. So, I would like to > withdraw this RFR. > > Thanks, > > -Zhengyu > > On 3/12/20 12:43 PM, Roman Kennke wrote: >>>> Hi Zhengyu, >>>> >>>> in src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp: >>>> + } else if >>>> (ShenandoahConcurrentRoots::can_do_concurrent_class_unloading()) { >>>> + // Disarm nmethods that armed for concurrent mark. >>>> + // On normal code path (non-empty Cset), it depends on >>>> update_roots() to >>>> + // disarm nmethods in degenerated GC. >>>> + ShenandoahCodeRoots::disarm_nmethods(); >>>> >>>> beware that the update_roots() is only called at the end of update_refs >>>> phase. The same call at end of marking is orphaned since removal of >>>> piggy-backed marking. >>> >>> I think it is fine, successful degenerated GC cycle should always >>> execute update_refs, no? >> >> Ok. I was only worried because the comment seems to imply it relies to >> update_roots() at the end of mark. Aleksey's patch is removing that. If >> update_roots() at the end of update_refs is good too, then fine. >> >> Thanks, >> Roman >> >>> Thanks, >>> >>> -Zhengyu >>> >>>> >>>> Otherwise looks good. >>>> >>>> Thanks, >>>> Roman >>>> >>>> >>>> On 3/11/20 10:43 PM, Zhengyu Gu wrote: >>>>> Revised based on offline discussions. >>>>> >>>>> Piggyback on stack code root rescanning to SATB draining task. >>>>> >>>>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.02/ >>>>> >>>>> Reran tests: >>>>> hotspot_gc_shenandoah >>>>> 
tools/javac >>>>> >>>>> Thanks, >>>>> >>>>> -Zhengyu >>>>> >>>>> On 3/4/20 6:06 PM, Zhengyu Gu wrote: >>>>>> Traversal GC has the same issue, also need to remark on stack code >>>>>> roots in final traversal. >>>>>> >>>>>> @@ -263,11 +263,12 @@ >>>>>> if (!_heap->is_degenerated_gc_in_progress()) { >>>>>> ShenandoahTraversalRootsClosure roots_cl(q, rp); >>>>>> ShenandoahTraversalSATBThreadsClosure tc(&satb_cl); >>>>>> if (unload_classes) { >>>>>> ShenandoahRemarkCLDClosure remark_cld_cl(&roots_cl); >>>>>> - _rp->strong_roots_do(worker_id, &roots_cl, &remark_cld_cl, >>>>>> NULL, &tc); >>>>>> + MarkingCodeBlobClosure code_cl(&roots_cl, >>>>>> CodeBlobToOopClosure::FixRelocations); >>>>>> + _rp->strong_roots_do(worker_id, &roots_cl, &remark_cld_cl, >>>>>> &code_cl, &tc); >>>>>> } else { >>>>>> CLDToOopClosure cld_cl(&roots_cl, >>>>>> ClassLoaderData::_claim_strong); >>>>>> _rp->roots_do(worker_id, &roots_cl, &cld_cl, NULL, &tc); >>>>>> } >>>>>> } else { >>>>>> >>>>>> >>>>>> Updated webrev: >>>>>> http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.01/ >>>>>> >>>>>> Thanks, >>>>>> >>>>>> -Zhengyu >>>>>> >>>>>> On 2/25/20 12:13 PM, Zhengyu Gu wrote: >>>>>>> Shenandoah encounters a few test failures with tools/javac. Verifier >>>>>>> catches unmarked oops in nmethod's metadata during root >>>>>>> evacuation in >>>>>>> final mark phase. >>>>>>> >>>>>>> The problem is that, Shenandoah marks on stack nmethods in init mark >>>>>>> pause, but it does not mark nmethod's metadata during concurrent >>>>>>> mark >>>>>>> phase, when new nmethod is about to be executed. >>>>>>> >>>>>>> The solution: >>>>>>> 1) Use nmethod_entry_barrier to keep nmethod's metadata alive when >>>>>>> the nmethod is about to be executed, when nmethod entry barrier is >>>>>>> supported. >>>>>>> >>>>>>> 2) Remark on stack nmethod's metadata at final mark pause. 
>>>>>>> >>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8239926 >>>>>>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.00/ >>>>>>> >>>>>>> Test: >>>>>>> hotspot_gc_shenandoah (fastdebug and release) >>>>>>> tools/javac with ShenandoahCodeRootsStyle = 1 and 2 >>>>>>> (fastdebug and >>>>>>> release) >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> -Zhengyu >>>>> >>>> >>> >> From shade at redhat.com Mon Mar 16 17:00:29 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 16 Mar 2020 18:00:29 +0100 Subject: [15] RFR 8239926: Shenandoah: Shenandoah needs to mark nmethod's metadata In-Reply-To: <1908c392-c888-b6c2-0a3c-f3860e7da2cf@redhat.com> References: <75c20855-5234-ba00-f07b-f9da0f7b8047@redhat.com> <612617a7-f08a-2b69-908e-26e146fa0a8f@redhat.com> <20afe843-e4bc-96ee-0d54-0c40435183a8@redhat.com> <69ad7ddf-0808-1f0e-43e1-9e9a37b8fb04@redhat.com> <1908c392-c888-b6c2-0a3c-f3860e7da2cf@redhat.com> Message-ID: <97f71b2d-2237-be84-ea46-fb48d31805ae@redhat.com> On 3/16/20 5:26 PM, Zhengyu Gu wrote: > Updated: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.03/ Very good. Mostly stylistic and tidy-up comments: *) Can we split the has_forwarded_object path in shenandoahConcurrentMark.cpp here? It would avoid instantiating both closures at the expense of some code duplication: 247 ShenandoahMarkResolveRefsClosure resolve_mark_cl(q, rp); 248 ShenandoahMarkRefsClosure mark_cl(q, rp); 249 OopClosure* oops = ShenandoahHeap::heap()->has_forwarded_objects() ? 250 static_cast<OopClosure*>(&resolve_mark_cl) : 251 static_cast<OopClosure*>(&mark_cl); 252 MarkingCodeBlobClosure blobsCl(oops, !CodeBlobToOopClosure::FixRelocations); 253 ShenandoahSATBAndRemarkCodeRootsThreadsClosure tc(&cl, &blobsCl); 254 Threads::threads_do(&tc); *) Please reformat the comment here in shenandoahHeap.cpp: 1424 // Arm nmethods for concurrent marking. 1425 // When a nmethod is about to be executed, we need to make sure that all its 1426 // metadata are marked. 
1427 // The alternative is to remark thread roots at final mark pause, but it can 1428 // be potential latency killer. to: // Arm nmethods for concurrent marking. When a nmethod is about to be executed, // we need to make sure that all its metadata are marked. alternative is to remark // thread roots at final mark pause, but it can be potential latency killer. *) In shenandoahNMethod.cpp, I do wonder if you want to specialize ShenandoahKeepNMethodMetadataAliveClosure with template HAS_FWD. And then dispatch to proper closure in ShenandoahNMethod::heal_nmethod. Keeps one branch out on the hot path for nmethod with many oops? *) In shenandoahRootProcessor.cpp, CLD roots are code roots now? Does it make sense? -- Thanks, -Aleksey From zgu at redhat.com Mon Mar 16 18:32:43 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 16 Mar 2020 14:32:43 -0400 Subject: [15] RFR 8239926: Shenandoah: Shenandoah needs to mark nmethod's metadata In-Reply-To: <97f71b2d-2237-be84-ea46-fb48d31805ae@redhat.com> References: <75c20855-5234-ba00-f07b-f9da0f7b8047@redhat.com> <612617a7-f08a-2b69-908e-26e146fa0a8f@redhat.com> <20afe843-e4bc-96ee-0d54-0c40435183a8@redhat.com> <69ad7ddf-0808-1f0e-43e1-9e9a37b8fb04@redhat.com> <1908c392-c888-b6c2-0a3c-f3860e7da2cf@redhat.com> <97f71b2d-2237-be84-ea46-fb48d31805ae@redhat.com> Message-ID: Hi Aleksey, Please see comments inline. On 3/16/20 1:00 PM, Aleksey Shipilev wrote: > On 3/16/20 5:26 PM, Zhengyu Gu wrote: >> Updated: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.03/ > > Very good. Mostly stylistic, and tidy-ups comments: > > *) Can we split the has_forwarded_object path in shenandoahConcurrentMark.cpp here? It would avoid > instantiating both closures at expense of some code duplication: > > 247 ShenandoahMarkResolveRefsClosure resolve_mark_cl(q, rp); > 248 ShenandoahMarkRefsClosure mark_cl(q, rp); > 249 OopClosure* oops = ShenandoahHeap::heap()->has_forwarded_objects() ? 
> 250 static_cast<OopClosure*>(&resolve_mark_cl) : > 251 static_cast<OopClosure*>(&mark_cl); > 252 MarkingCodeBlobClosure blobsCl(oops, !CodeBlobToOopClosure::FixRelocations); > 253 ShenandoahSATBAndRemarkCodeRootsThreadsClosure tc(&cl, &blobsCl); > 254 Threads::threads_do(&tc); > Sure. > *) Please reformat the comment here in shenandoahHeap.cpp: > > 1424 // Arm nmethods for concurrent marking. > 1425 // When a nmethod is about to be executed, we need to make sure that all its > 1426 // metadata are marked. > 1427 // The alternative is to remark thread roots at final mark pause, but it can > 1428 // be potential latency killer. > > to: > > // Arm nmethods for concurrent marking. When a nmethod is about to be executed, > // we need to make sure that all its metadata are marked. alternative is to remark > // thread roots at final mark pause, but it can be potential latency killer. > Fixed. > *) In shenandoahNMethod.cpp, I do wonder if you want to specialize > ShenandoahKeepNMethodMetadataAliveClosure with template HAS_FWD. And then dispatch to proper closure > in ShenandoahNMethod::heal_nmethod. Keeps one branch out on the hot path for nmethod with many oops? > Makes sense. > *) In shenandoahRootProcessor.cpp, CLD roots are code roots now? Does it make sense? > Bad naming, changed: _include_concurrent_roots => _stw_roots_processing _include_concurrent_code_roots => _stw_class_unloading Updated: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.04/index.html Test: In progress. Okay? 
Thanks, -Zhengyu > From shade at redhat.com Mon Mar 16 18:36:23 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 16 Mar 2020 19:36:23 +0100 Subject: [15] RFR 8239926: Shenandoah: Shenandoah needs to mark nmethod's metadata In-Reply-To: References: <75c20855-5234-ba00-f07b-f9da0f7b8047@redhat.com> <612617a7-f08a-2b69-908e-26e146fa0a8f@redhat.com> <20afe843-e4bc-96ee-0d54-0c40435183a8@redhat.com> <69ad7ddf-0808-1f0e-43e1-9e9a37b8fb04@redhat.com> <1908c392-c888-b6c2-0a3c-f3860e7da2cf@redhat.com> <97f71b2d-2237-be84-ea46-fb48d31805ae@redhat.com> Message-ID: On 3/16/20 7:32 PM, Zhengyu Gu wrote: > Updated: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.04/index.html Looks fine to me. Thank you. -- -Aleksey From mark.reinhold at oracle.com Mon Mar 16 20:08:17 2020 From: mark.reinhold at oracle.com (mark.reinhold at oracle.com) Date: Mon, 16 Mar 2020 21:08:17 +0100 (CET) Subject: New candidate JEP: 377: ZGC: A Scalable Low-Latency Garbage Collector (Production) Message-ID: <20200316200817.DB91F31A1FD@eggemoggin.niobe.net> https://openjdk.java.net/jeps/377 - Mark From tprintezis at twitter.com Mon Mar 16 21:02:15 2020 From: tprintezis at twitter.com (Tony Printezis) Date: Mon, 16 Mar 2020 18:02:15 -0300 Subject: high StringTable scanning overhead during young GCs Message-ID: Hi all, We have seen the following issue a few times in our data centers: A service is interning strings at a steady rate. The interned strings live long enough to be promoted to the old generation. They are reclaimed during the following Full GC / concurrent cycle. However, until that happens, young GC times are monotonically increasing as young GCs have to scan the entire StringTable, which keeps growing in size. I can reproduce this with the latest 15, with both G1 and ParallelGC. FWIW, we use mostly CMS in 8 and we addressed this by forcing more frequent CMS cycles (we have a flag that starts a CMS cycle every N secs). 
This helps to keep young GC times mostly in check for our services that suffer from this issue. However, this overhead could be avoided if the StringTable was split into two parts, one for entries that could potentially point into the young gen, the other for entries that definitely do not point into the young gen. Each young GC will only have to scan the former (similar to what was done for nmethods). Has anyone looked into this? Thanks, Tony Tony Printezis | @TonyPrintezis | tprintezis at twitter.com From kim.barrett at oracle.com Tue Mar 17 03:12:53 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 16 Mar 2020 23:12:53 -0400 Subject: high StringTable scanning overhead during young GCs In-Reply-To: References: Message-ID: <0FF82BEF-2500-4263-8BC4-01388C8C4000@oracle.com> > On Mar 16, 2020, at 5:02 PM, Tony Printezis wrote: > > Hi all, > > We have seen the following issue a few times in our data centers: A service > is interning strings at a steady rate. The interned strings live long > enough to be promoted to the old generation. They are reclaimed during the > following Full GC / concurrent cycle. However, until that happens, young GC > times are monotonically increasing as young GCs have to scan the entire > StringTable, which keeps growing in size. > > I can reproduce this with the latest 15, with both G1 and ParallelGC. > > FWIW, we use mostly CMS in 8 and we addressed this by forcing more frequent > CMS cycles (we have a flag that starts a CMS cycle every N secs). This > helps to keep young GC times mostly in check for our services that suffer > from this issue. > > However, this overhead could be avoided if the StringTable was split into > two parts, one for entries that could potentially point into the young gen, > the other for entries that definitely do not point into the young gen. Each > young GC will only have to scan the former (similar to what was done for > nmethods). > > Has anyone looked into this? 
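[Editorial illustration, not part of the original message.] The split proposed in the quoted text can be sketched roughly as below. This is a toy model only: `SplitTable`, `Entry`, and the `still_young` predicate are hypothetical names invented for the sketch, none of them are HotSpot APIs, and the sketch ignores clearing of dead weak entries and concurrent access entirely.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a table entry; a real entry would hold a weak oop.
using Entry = int;

// Toy model of a table split into "may point into young gen" and "old only".
class SplitTable {
  std::vector<Entry> _maybe_young;  // entries whose referent may still be young
  std::vector<Entry> _old_only;     // entries known not to point into young gen

public:
  // New entries start in the maybe-young part (assumption: interned
  // strings are allocated young).
  void add(Entry e) { _maybe_young.push_back(e); }

  // Models the work of one young collection: only the maybe-young part is
  // visited. Entries whose referent is no longer young afterwards (i.e. it
  // was promoted) migrate to the old-only part.
  // Returns the number of entries visited.
  std::size_t young_gc_scan(const std::function<bool(Entry)>& still_young) {
    std::size_t visited = _maybe_young.size();
    std::vector<Entry> keep;
    for (Entry e : _maybe_young) {
      if (still_young(e)) {
        keep.push_back(e);
      } else {
        _old_only.push_back(e);
      }
    }
    _maybe_young.swap(keep);
    return visited;
  }

  std::size_t maybe_young_size() const { return _maybe_young.size(); }
  std::size_t old_only_size() const { return _old_only.size(); }
};
```

In this model the per-young-GC cost tracks the number of recently interned entries rather than the total table size, which is the effect the proposal is after; whether that carries over to the real StringTable is exactly the open question in this thread.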
All young collections treat the StringTable as weak. (I think only Parallel used to, but that was changed a while ago.) Young collection processing of a weak entry is relatively cheap when the referent is old (determine that it's not in the collection set, so do nothing). (since JDK 12?) G1 parallelizes StringTable processing (in conjunction with several other kinds of off-heap weak references, such as JNI weak references). So one would think it would require a pretty large number of old StringTable entries to have a significant effect. ParallelGC currently processes the StringTable single threaded; there's an RFE to parallelize it (JDK-8210100). Can you provide some numbers / log data? If there is a performance problem there, maybe there's just a bug somewhere that needs to be fixed. I think splitting the StringTable that way isn't entirely trivial, and would need pretty strong justification. From stefan.johansson at oracle.com Tue Mar 17 09:31:12 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Tue, 17 Mar 2020 10:31:12 +0100 Subject: high StringTable scanning overhead during young GCs In-Reply-To: References: Message-ID: Hi Tony, > On 16 March 2020 at 22:02, Tony Printezis wrote: > > ... > > FWIW, we use mostly CMS in 8 and we addressed this by forcing more frequent > CMS cycles (we have a flag that starts a CMS cycle every N secs). This > helps to keep young GC times mostly in check for our services that suffer > from this issue. > If you want to try a similar workaround for G1: in JDK 12, G1PeriodicGCInterval was added as part of JEP 346: Promptly Return Unused Committed Memory from G1. This flag allows you to trigger GCs based on that interval, and by default the triggered GC starts a concurrent cycle. This feature was mostly designed for idle applications and the periodic request will be skipped if any GC occurred in the interval, so depending on your workload disabling adaptive IHOP and setting a lower fixed IHOP might be a better approach. 
Thanks, Stefan > ... > > Tony Printezis | @TonyPrintezis | tprintezis at twitter.com From rkennke at redhat.com Tue Mar 17 10:55:47 2020 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 17 Mar 2020 11:55:47 +0100 Subject: RFR: 8241081: Shenandoah: Do not modify update-watermark concurrently Message-ID: <28ab3dea-d88e-eae2-c7da-3c38f8cb94d2@redhat.com> JDK-8240873 introduced short-cutting arraycopy-barriers on objects beyond the update-watermark. Concurrently updating the update-watermark after a region has been completely updated proves problematic: we see various different crashes related to this. Strengthening the fences around it doesn't fully solve the problem; apparently the problem is deeper. It doesn't seem worthwhile to keep doing this for very little gain. Bug: https://bugs.openjdk.java.net/browse/JDK-8241081 Webrev: http://cr.openjdk.java.net/~rkennke/JDK-8241081/webrev.02/ Testing: hotspot_gc_shenandoah and the failing TestStringDedupStress, both in a loop for some hours Thanks, Roman From shade at redhat.com Tue Mar 17 11:01:15 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 17 Mar 2020 12:01:15 +0100 Subject: RFR: 8241081: Shenandoah: Do not modify update-watermark concurrently In-Reply-To: <28ab3dea-d88e-eae2-c7da-3c38f8cb94d2@redhat.com> References: <28ab3dea-d88e-eae2-c7da-3c38f8cb94d2@redhat.com> Message-ID: <35f24a61-cd45-f5ce-411d-e831a7bb6d97@redhat.com> On 3/17/20 11:55 AM, Roman Kennke wrote: > Webrev: > http://cr.openjdk.java.net/~rkennke/JDK-8241081/webrev.02/ Looks good. One tiny thing: can you please read _update_watermark into local field, assert that value and return the same one? Instead of doing effectively three memory accesses, which might disagree. 
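[Editorial illustration, not part of the original message.] The "read once, assert, return the same value" pattern requested above might look roughly like the sketch below. `Region`, its fields, and the use of `std::atomic` with acquire/release ordering are assumptions made for the sketch; this is not the actual ShenandoahHeapRegion code and does not use HotSpot's own Atomic API.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a heap region; addresses are modeled as integers.
class Region {
  const uintptr_t _bottom;
  const uintptr_t _top;
  std::atomic<uintptr_t> _update_watermark;

public:
  Region(uintptr_t bottom, uintptr_t top)
    : _bottom(bottom), _top(top), _update_watermark(bottom) {}

  void set_update_watermark(uintptr_t w) {
    assert(_bottom <= w && w <= _top && "watermark must stay within the region");
    _update_watermark.store(w, std::memory_order_release);
  }

  uintptr_t get_update_watermark() const {
    // Single load into a local: the asserted value and the returned value
    // are guaranteed to be the same, even if another thread stores in between.
    uintptr_t w = _update_watermark.load(std::memory_order_acquire);
    assert(_bottom <= w && w <= _top && "watermark must stay within the region");
    return w;
  }
};
```

Reading the field separately for each bound check and again for the return would be three independent loads, so a concurrent store could make the checked value differ from the returned one; that is the disagreement the review comment points at.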
-- Thanks, -Aleksey From shade at redhat.com Tue Mar 17 11:25:06 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 17 Mar 2020 12:25:06 +0100 Subject: RFR (S) 8241093: Shenandoah: editorial changes in flag descriptions Message-ID: <36e0c34e-a34b-320e-da3e-e47245c39a3a@redhat.com> RFE: https://bugs.openjdk.java.net/browse/JDK-8241093 Webrev: https://cr.openjdk.java.net/~shade/8241093/webrev.01/ This is a good time to polish any phrasing. Testing: fastdebug build -- Thanks, -Aleksey From rkennke at redhat.com Tue Mar 17 11:33:18 2020 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 17 Mar 2020 12:33:18 +0100 Subject: RFR: 8241081: Shenandoah: Do not modify update-watermark concurrently In-Reply-To: <35f24a61-cd45-f5ce-411d-e831a7bb6d97@redhat.com> References: <28ab3dea-d88e-eae2-c7da-3c38f8cb94d2@redhat.com> <35f24a61-cd45-f5ce-411d-e831a7bb6d97@redhat.com> Message-ID: <297b2a4d-59ce-6e87-808c-636d8cb06cc9@redhat.com> >> Webrev: >> http://cr.openjdk.java.net/~rkennke/JDK-8241081/webrev.02/ > > Looks good. > > One tiny thing: can you please read _update_watermark into local field, assert that value and return > the same one? Instead of doing effectively three memory accesses, which might disagree. Ok like this? http://cr.openjdk.java.net/~rkennke/JDK-8241081/webrev.03/ From shade at redhat.com Tue Mar 17 11:37:36 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 17 Mar 2020 12:37:36 +0100 Subject: RFR: 8241081: Shenandoah: Do not modify update-watermark concurrently In-Reply-To: <297b2a4d-59ce-6e87-808c-636d8cb06cc9@redhat.com> References: <28ab3dea-d88e-eae2-c7da-3c38f8cb94d2@redhat.com> <35f24a61-cd45-f5ce-411d-e831a7bb6d97@redhat.com> <297b2a4d-59ce-6e87-808c-636d8cb06cc9@redhat.com> Message-ID: <3730dda8-8cb4-16b1-72a0-c3c2510ff006@redhat.com> On 3/17/20 12:33 PM, Roman Kennke wrote: >>> Webrev: >>> http://cr.openjdk.java.net/~rkennke/JDK-8241081/webrev.02/ >> >> Looks good. 
>> >> One tiny thing: can you please read _update_watermark into local field, assert that value and return >> the same one? Instead of doing effectively three memory accesses, which might disagree. > > Ok like this? > > http://cr.openjdk.java.net/~rkennke/JDK-8241081/webrev.03/ Yes. -- Thanks, -Aleksey From rkennke at redhat.com Tue Mar 17 11:49:51 2020 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 17 Mar 2020 12:49:51 +0100 Subject: RFR (S) 8241093: Shenandoah: editorial changes in flag descriptions In-Reply-To: <36e0c34e-a34b-320e-da3e-e47245c39a3a@redhat.com> References: <36e0c34e-a34b-320e-da3e-e47245c39a3a@redhat.com> Message-ID: Looks good, thanks for doing it! Roman > RFE: > https://bugs.openjdk.java.net/browse/JDK-8241093 > > Webrev: > https://cr.openjdk.java.net/~shade/8241093/webrev.01/ > > This is a good time to polish any phrasing. > > Testing: fastdebug build > From tprintezis at twitter.com Tue Mar 17 20:01:53 2020 From: tprintezis at twitter.com (Tony Printezis) Date: Tue, 17 Mar 2020 13:01:53 -0700 Subject: high StringTable scanning overhead during young GCs In-Reply-To: <0FF82BEF-2500-4263-8BC4-01388C8C4000@oracle.com> References: <0FF82BEF-2500-4263-8BC4-01388C8C4000@oracle.com> Message-ID: Hi Kim, Unfortunately, I'm not sure I can share exact numbers / logs from our production servers (would a GC times chart without units help so you can see the trend?). But let me give you a quick overview. These numbers are from when a service used ParallelGC. We have detailed timers for certain GC phases, so I know exactly how long each phase took (and let me know if you'd like to know more about said timers :-) ): * Immediately after a JVM started, StringTable scanning would take around 2ms per young GC. * 3 days later, StringTable scanning would take around 45x longer than the object copying time (this translates to around 97.5% of the total young GC time). 
So, let's say for the sake of argument that object copying took 10ms (this number might or might not be close to the actual one), then StringTable scanning took around 450ms. * Immediately after a Full GC, StringTable scanning would drop back down to around 2ms. All this tells me that: * I don't think there's a bug. I think the service just piles up a lot of entries in the StringTable (unfortunately, I don't know the actual number; is there a flag that prints it in 8?). * Parallelizing StringTable scanning would just postpone the problem. Let's say we use 4 threads. In the above example, StringTable scanning would still take more than 10x longer than the object copying time after 3 days. And, yes, this might be surprising but we do have services that do no Full GCs / concurrent cycles over several days. Most of our services are stateless and put little pressure on the old gen. And, yes, I know that this is generally uncommon. Tony Tony Printezis | @TonyPrintezis | tprintezis at twitter.com On March 16, 2020 at 11:12:59 PM, Kim Barrett (kim.barrett at oracle.com) wrote: > On Mar 16, 2020, at 5:02 PM, Tony Printezis wrote: > > Hi all, > > We have seen the following issue a few times in our data centers: A service > is interning strings at a steady rate. The interned strings live long > enough to be promoted to the old generation. They are reclaimed during the > following Full GC / concurrent cycle. However, until that happens, young GC > times are monotonically increasing as young GCs have to scan the entire > StringTable, which keeps growing in size. > > I can reproduce this with the latest 15, with both G1 and ParallelGC. > > FWIW, we use mostly CMS in 8 and we addressed this by forcing more frequent > CMS cycles (we have a flag that starts a CMS cycle every N secs). This > helps to keep young GC times mostly in check for our services that suffer > from this issue. 
> > However, this overhead could be avoided if the StringTable was split into > two parts, one for entries that could potentially point into the young gen, > the other for entries that definitely do not point into the young gen. Each > young GC will only have to scan the former (similar to what was done for > nmethods). > > Has anyone looked into this? All young collections treat the StringTable as weak. (I think only Parallel used to, but that was changed a while ago.) Young collection processing of a weak entry is relatively cheap when the referent is old (determine that it's not in the collection set, so do nothing). (since JDK 12?) G1 parallelizes StringTable processing (in conjunction with several other kinds of off-heap weak references, such as JNI weak references). So one would think it would require a pretty large number of old StringTable entries to have a significant effect. ParallelGC currently processes the StringTable single threaded; there's an RFE to parallelize it (JDK-8210100). Can you provide some numbers / log data? If there is a performance problem there, maybe there's just a bug somewhere that needs to be fixed. I think splitting the StringTable that way isn't entirely trivial, and would need pretty strong justification. From tprintezis at twitter.com Tue Mar 17 20:23:03 2020 From: tprintezis at twitter.com (Tony Printezis) Date: Tue, 17 Mar 2020 13:23:03 -0700 Subject: high StringTable scanning overhead during young GCs In-Reply-To: References: Message-ID: Hi Stefan, Thanks for mentioning the flag. I'll take a look. I don't think we'll be on 12 or later any time soon, but I can always backport the change if needed. Quick question: "the periodic request will be skipped if any GC occurred in the interval": So, if a young GC happens during the interval, a periodic concurrent cycle won't? Tony 
Tony Printezis | @TonyPrintezis | tprintezis at twitter.com On March 17, 2020 at 5:31:24 AM, Stefan Johansson (stefan.johansson at oracle.com) wrote: Hi Tony, > On 16 March 2020, at 22:02, Tony Printezis wrote: > > ... > > FWIW, we use mostly CMS in 8 and we addressed this by forcing more frequent > CMS cycles (we have a flag that starts a CMS cycle every N secs). This > helps to keep young GC times mostly in check for our services that suffer > from this issue. > If you want to try a similar workaround for G1: in JDK 12, G1PeriodicGCInterval was added as part of JEP 346: Promptly Return Unused Committed Memory from G1. This flag allows you to trigger GCs based on that interval, and by default the GC triggered starts a concurrent cycle. This feature was mostly designed for idle applications and the periodic request will be skipped if any GC occurred in the interval, so depending on your workload disabling adaptive IHOP and setting a lower fixed IHOP might be a better approach. Thanks, Stefan > ... > Tony Printezis | @TonyPrintezis | tprintezis at twitter.com From stefan.johansson at oracle.com Wed Mar 18 07:55:03 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Wed, 18 Mar 2020 08:55:03 +0100 Subject: high StringTable scanning overhead during young GCs In-Reply-To: References: Message-ID: Hi Tony, Yes, if a young GC happens during the interval it will prevent a periodic concurrent start GC from happening. I see how this might not play well with your use case, unless your service sometimes has an idle period. Stefan > On 17 March 2020, at 21:23, Tony Printezis wrote: > > Hi Stefan, > > Thanks for mentioning the flag. I'll take a look. I don't think we'll be on 12 or later any time soon, but I can always backport the change if needed. Quick question: "the periodic request will be skipped if any GC occurred in the interval": So, if a young GC happens during the interval, a periodic concurrent cycle won't? > > Tony > > > 
> Tony Printezis | @TonyPrintezis | tprintezis at twitter.com > > > On March 17, 2020 at 5:31:24 AM, Stefan Johansson (stefan.johansson at oracle.com) wrote: > >> Hi Tony, >> >> > 16 mars 2020 kl. 22:02 skrev Tony Printezis : >> > >> > ... >> > >> > FWIW, we use mostly CMS in 8 and we addressed this by forcing more frequent >> > CMS cycles (we have a flag that starts a CMS cycle every N secs). This >> > helps to keep young GC times mostly in check for our services that suffer >> > from this issue. >> > >> If you want to try a similar workaround for G1. In JDK 12, G1PeriodicGCInterval was added as part of JEP 346: Promptly Return Unused Committed Memory from G1. This flag allows you to trigger GCs based on that interval, and by default the GC triggered start a concurrent cycle. This feature was mostly designed for idle applications and the periodic request will be skipped if any GC occurred in the interval, so depending on your workload disabling adaptive IHOP and setting a lower fixed IHOP might be better approach. >> >> Thanks, >> Stefan >> >> > ... >> > ????? >> > Tony Printezis | @TonyPrintezis | tprintezis at twitter.com From thomas.schatzl at oracle.com Wed Mar 18 11:05:39 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 18 Mar 2020 12:05:39 +0100 Subject: RFR: 8139652: Mutator refinement processing should take the oldest dirty card buffer In-Reply-To: References: Message-ID: <5135da92-2302-a793-15d7-9db4070104c3@oracle.com> Hi Kim, first, sorry for the late reply. On 10.03.20 22:27, Kim Barrett wrote: >> On Mar 3, 2020, at 9:32 PM, Kim Barrett wrote: >> >> Please review this change to the handling of completed buffers by mutator >> threads. Previously it would conditionally process and potentially reuse the >> buffer, rather than enqueuing it. Now, always enqueue the buffer and >> allocate a new one, and conditionally process the next (oldest) dirty buffer >> in the DCQS. 
The benefit of this is that the buffers being processed by the >> mutator age for a while in the DCQS (just as is done for concurrent >> refinement thread processing), so if the mutator is making repeated writes >> to the same or nearby locations, the associated card marking has more >> opportunity to be filtered out. >> >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8139652 >> >> Webrev: >> https://cr.openjdk.java.net/~kbarrett/8139652/open.00/ >> >> Testing >> mach5 tier1-5 along with changes for JDK-8239825 and JDK-8139652. > > The original webrev was based on JDK-8239825 and JDK-8240133. The > push and backout of JDK-8240133 has made that webrev no longer apply > cleanly. So here's a new, up to date (as of this morning) webrev: > > https://cr.openjdk.java.net/~kbarrett/8139652/open.01/ > > Tested with mach5 tier1-5 along with change for JDK-8239825 (which > hasn't been pushed yet). - g1DirtyCardQueue.cpp:544: indentation of "fully_processed" parameter - I suggest undoing that line break in the assert at line 547; the resulting string is about 83 chars. Looks good otherwise. I do not need a re-review for the above tiny changes. Thanks, Thomas From thomas.schatzl at oracle.com Wed Mar 18 11:18:17 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 18 Mar 2020 12:18:17 +0100 Subject: RFR (S): 8240590: Add MemRegion::destroy_array to complement introduced create_array Message-ID: <8fcf72be-0696-88fc-3f02-aca75b2c96e2@oracle.com> Hi all, can I have reviews for this small change that introduces a MemRegion::destroy_array() method to complement the recently introduced MemRegion::create_array(). 
CR: https://bugs.openjdk.java.net/browse/JDK-8240590 Webrev: http://cr.openjdk.java.net/~tschatzl/8240590/webrev/ Testing: hs-tier1-5 Thanks, Thomas From stefan.johansson at oracle.com Wed Mar 18 13:59:34 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Wed, 18 Mar 2020 14:59:34 +0100 Subject: RFR (S): 8240590: Add MemRegion::destroy_array to complement introduced create_array In-Reply-To: <8fcf72be-0696-88fc-3f02-aca75b2c96e2@oracle.com> References: <8fcf72be-0696-88fc-3f02-aca75b2c96e2@oracle.com> Message-ID: <14723860-4954-4CF5-A370-90839F4181F7@oracle.com> Hi Thomas, > 18 mars 2020 kl. 12:18 skrev Thomas Schatzl : > > Hi all, > > can I have reviews for this small change that introduces a MemRegion::destroy_array() method to complement the recently introduced MemRegion::create_array(). > > CR: > https://bugs.openjdk.java.net/browse/JDK-8240590 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8240590/webrev/ Looks good, Stefan > Testing: > hs-tier1-5 > > Thanks, > Thomas From leo.korinth at oracle.com Wed Mar 18 14:54:47 2020 From: leo.korinth at oracle.com (Leo Korinth) Date: Wed, 18 Mar 2020 15:54:47 +0100 Subject: RFR (S): 8240590: Add MemRegion::destroy_array to complement introduced create_array In-Reply-To: <8fcf72be-0696-88fc-3f02-aca75b2c96e2@oracle.com> References: <8fcf72be-0696-88fc-3f02-aca75b2c96e2@oracle.com> Message-ID: On 18/03/2020 12:18, Thomas Schatzl wrote: > Hi all, > > ? can I have reviews for this small change that introduces a > MemRegion::destroy_array() method to complement the recently introduced > MemRegion::create_array(). > > CR: > https://bugs.openjdk.java.net/browse/JDK-8240590 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8240590/webrev/ Looks good to me, thanks. /Leo > Testing: > hs-tier1-5 > > Thanks, > ? 
Thomas From zgu at redhat.com Wed Mar 18 15:19:52 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 18 Mar 2020 11:19:52 -0400 Subject: [15] RFR 8241155: Shenandoah: Traversal GC should mark strong CLD roots when class unloading is enabled Message-ID: <22c82093-321b-e888-d845-fbedfea99323@redhat.com> Current traversal GC does not mark strong CLD roots, it seems wrong. Bug: https://bugs.openjdk.java.net/browse/JDK-8241155 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8241155/webrev.00/ Test: hotspot_gc_shenandoah (fastdebug and release) tools/javac Thanks, -Zhengyu From rkennke at redhat.com Wed Mar 18 15:43:51 2020 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 18 Mar 2020 16:43:51 +0100 Subject: [15] RFR 8241155: Shenandoah: Traversal GC should mark strong CLD roots when class unloading is enabled In-Reply-To: <22c82093-321b-e888-d845-fbedfea99323@redhat.com> References: <22c82093-321b-e888-d845-fbedfea99323@redhat.com> Message-ID: <3ee832ec-e184-7214-d4cf-a38fd7bd9b20@redhat.com> Are you sure that it needs to scan (all) code-roots, and thus keep everything alive that is referenced from them? Roman > Current traversal GC does not mark strong CLD roots, it seems wrong. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8241155 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8241155/webrev.00/ > > Test: > ? hotspot_gc_shenandoah (fastdebug and release) > ? 
tools/javac > > Thanks, > > -Zhengyu > From zgu at redhat.com Wed Mar 18 15:46:40 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 18 Mar 2020 11:46:40 -0400 Subject: [15] RFR 8241155: Shenandoah: Traversal GC should mark strong CLD roots when class unloading is enabled In-Reply-To: <3ee832ec-e184-7214-d4cf-a38fd7bd9b20@redhat.com> References: <22c82093-321b-e888-d845-fbedfea99323@redhat.com> <3ee832ec-e184-7214-d4cf-a38fd7bd9b20@redhat.com> Message-ID: On 3/18/20 11:43 AM, Roman Kennke wrote: > Are you sure that it needs to scan (all) code-roots, and thus keep > everything alive that is referenced from them? No, I am not sure. It has scanned CSset code roots since the beginning (I believe). If it is an issue, it should be addressed in a separate CR. -Zhengyu > > Roman > >> Current traversal GC does not mark strong CLD roots, it seems wrong. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8241155 >> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8241155/webrev.00/ >> >> Test: >> hotspot_gc_shenandoah (fastdebug and release) >> tools/javac >> >> Thanks, >> >> -Zhengyu >> > From leonid.mesnik at oracle.com Wed Mar 18 19:37:17 2020 From: leonid.mesnik at oracle.com (Leonid Mesnik) Date: Wed, 18 Mar 2020 12:37:17 -0700 Subject: RFR: 8241123: Refactor vmTestbase stress framework to use j.u.c and make creation of threads more flexible Message-ID: <0d73d306-2eff-375c-65e1-67142b2c6c59@oracle.com> Hi Could you please review the following fix, which slightly refactors the vmTestbase stress test harness. This refactoring helps to add virtual threads testing support. Wicket uses the plain sync/wait/notify mechanism, which causes carrier thread starvation and should not be used in virtual threads. ManagedThread is a subclass of Thread, so it cannot be a virtual thread. The fix changes Wicket to use locks/conditions so that it does not pin a vthread to its carrier thread while starting testing. ManagedThread is changed to keep the execution thread in a thread variable and isolate its creation. 
Test vmTestbase/nsk/jdi/ObjectReference/referringObjects/referringObjects003/referringObjects003a.java was updated to no longer use Wicket. (The lock holds a reference to a thread, which affects the test.) Wicket "finished" in class ThreadsRunner was changed to atomicInt/sleep to avoid an OOME in j.u.c.l.Condition::await(), which might happen in stress GC tests. webrev: http://cr.openjdk.java.net/~lmesnik/8241123/webrev.00/ bug: https://bugs.openjdk.java.net/browse/JDK-8241123 Leonid From igor.ignatyev at oracle.com Wed Mar 18 19:48:50 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 18 Mar 2020 12:48:50 -0700 Subject: RFR: 8241123: Refactor vmTestbase stress framework to use j.u.c and make creation of threads more flexible In-Reply-To: <0d73d306-2eff-375c-65e1-67142b2c6c59@oracle.com> References: <0d73d306-2eff-375c-65e1-67142b2c6c59@oracle.com> Message-ID: Hi Leonid, I've started looking at your webrev, and so far have a couple of questions: > Test vmTestbase/nsk/jdi/ObjectReference/referringObjects/referringObjects003/referringObjects003a.java was updated to don't use Wicket. (The lock has a reference to thread which affects test.) Can't you just use a volatile boolean field? > Wicket "finished" in class ThreadsRunner was changed to atomicInt/sleep to avoid OOME in j.u.c.l.Condition::await() which might happened in stress GC tests. Wouldn't j.u.c.CountDownLatch be a more appropriate and cleaner solution here? I need more time to get a grasp of Wicket and your changes in it; I will come back to you after I understand them. -- Igor > On Mar 18, 2020, at 12:37 PM, Leonid Mesnik wrote: > > Hi > > Could you please review following fix which slightly refactor vmTestbase stress test harness. This refactoring helps to add virtual threads testing support. > > The Wicket uses plain sync/wait/notify mechanism which cause carrier thread starvation and should not be used in virtual threads. The ManagedThread is a subclass of Thread so it couldn't be virtual thread. 
> > > Following fix changes Wicket to use locks/conditions to don't pin vthread to carrier thread while starting testing. > > ManagedThread is fixed to keep execution thread as the thread variable and isolate it's creation. > > Test vmTestbase/nsk/jdi/ObjectReference/referringObjects/referringObjects003/referringObjects003a.java was updated to don't use Wicket. (The lock has a reference to thread which affects test.) > > Wicket "finished" in class ThreadsRunner was changed to atomicInt/sleep to avoid OOME in j.u.c.l.Condition::await() which might happened in stress GC tests. > > webrev: http://cr.openjdk.java.net/~lmesnik/8241123/webrev.00/ > > bug: https://bugs.openjdk.java.net/browse/JDK-8241123 > > > Leonid > From leonid.mesnik at oracle.com Wed Mar 18 20:29:22 2020 From: leonid.mesnik at oracle.com (Leonid Mesnik) Date: Wed, 18 Mar 2020 13:29:22 -0700 Subject: RFR: 8241123: Refactor vmTestbase stress framework to use j.u.c and make creation of threads more flexible In-Reply-To: References: <0d73d306-2eff-375c-65e1-67142b2c6c59@oracle.com> Message-ID: <5344eb3a-b17a-09c1-0f1f-8c1462899fe3@oracle.com> On 3/18/20 12:48 PM, Igor Ignatyev wrote: > Hi Leonid, > > I've started looking at your webrev, and so far have a couple questions: > >> Test vmTestbase/nsk/jdi/ObjectReference/referringObjects/referringObjects003/referringObjects003a.java was updated to don't use Wicket. (The lock has a reference to thread which affects test.) > can't you use just a volatile boolean field? I can, but I don't see any benefit in using volatile fields instead of atomics. I prefer to use Atomic* everywhere because of its clearer semantics: explicit get/set and other similar accessors. > >> Wicket "finished" in class ThreadsRunner was changed to atomicInt/sleep to avoid OOME in j.u.c.l.Condition::await() which might happened in stress GC tests. > won't j.u.c.CountDownLatch be more appropriate and cleaner solution here? Unfortunately no. 
CountDownLatch would be a nice solution, but it is possible to get an OOME with it in the gc/lock tests (and possibly others). I replaced Wicket for the same reason. Updating the AtomicInteger doesn't allocate any memory and doesn't cause an OOME. Leonid > > I need more time to get grasp of Wicket and your changes in it; will come back to you after I understand them. > > -- Igor > >> On Mar 18, 2020, at 12:37 PM, Leonid Mesnik wrote: >> >> Hi >> >> Could you please review following fix which slightly refactor vmTestbase stress test harness. This refactoring helps to add virtual threads testing support. >> >> The Wicket uses plain sync/wait/notify mechanism which cause carrier thread starvation and should not be used in virtual threads. The ManagedThread is a subclass of Thread so it couldn't be virtual thread. >> >> >> Following fix changes Wicket to use locks/conditions to don't pin vthread to carrier thread while starting testing. >> >> ManagedThread is fixed to keep execution thread as the thread variable and isolate it's creation. >> >> Test vmTestbase/nsk/jdi/ObjectReference/referringObjects/referringObjects003/referringObjects003a.java was updated to don't use Wicket. (The lock has a reference to thread which affects test.) >> >> Wicket "finished" in class ThreadsRunner was changed to atomicInt/sleep to avoid OOME in j.u.c.l.Condition::await() which might happened in stress GC tests. 
>> >> webrev: http://cr.openjdk.java.net/~lmesnik/8241123/webrev.00/ >> >> bug: https://bugs.openjdk.java.net/browse/JDK-8241123 >> >> >> Leonid >> From igor.ignatyev at oracle.com Wed Mar 18 21:15:14 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 18 Mar 2020 14:15:14 -0700 Subject: RFR: 8241123: Refactor vmTestbase stress framework to use j.u.c and make creation of threads more flexible In-Reply-To: <5344eb3a-b17a-09c1-0f1f-8c1462899fe3@oracle.com> References: <0d73d306-2eff-375c-65e1-67142b2c6c59@oracle.com> <5344eb3a-b17a-09c1-0f1f-8c1462899fe3@oracle.com> Message-ID: <50110939-1AEF-40B5-969C-C5313633B1F9@oracle.com> > On Mar 18, 2020, at 1:29 PM, Leonid Mesnik wrote: > > > On 3/18/20 12:48 PM, Igor Ignatyev wrote: >> Hi Leonid, >> >> I've started looking at your webrev, and so far have a couple questions: >> >>> Test vmTestbase/nsk/jdi/ObjectReference/referringObjects/referringObjects003/referringObjects003a.java was updated to don't use Wicket. (The lock has a reference to thread which affects test.) >> can't you use just a volatile boolean field? > I can, but I don't see any benefits to use volatile fields instead of atomics. I prefer to use Atomic* anywhere because of it's clearer semantics. Using of explicit get/set and other similar accessors. You aren't using any accessors other than plain get/set, which are semantically equal to setting/getting a volatile field, so I'm not sure how it's clearer. As for the benefits of a volatile field, the code is shorter (and arguably cleaner) and you save some heap space. Anyhow, I don't insist on using volatile boolean over AtomicBoolean. >> >>> Wicket "finished" in class ThreadsRunner was changed to atomicInt/sleep to avoid OOME in j.u.c.l.Condition::await() which might happened in stress GC tests. >> won't j.u.c.CountDownLatch be more appropriate and cleaner solution here? > > Unfortunately no. 
The CountDownLatch would be a nice solution but it is possible to get OOME in gc/lock (might be other) tests. I replaced Wicked by the same reason. Updating the AtomicInteger doesn't allocate any memory and don't cause OOME. I see. > > Leonid > >> >> I need more time to get grasp of Wicket and your changes in it; will come back to you after I understand them. >> >> -- Igor >> >>> On Mar 18, 2020, at 12:37 PM, Leonid Mesnik wrote: >>> >>> Hi >>> >>> Could you please review following fix which slightly refactor vmTestbase stress test harness. This refactoring helps to add virtual threads testing support. >>> >>> The Wicket uses plain sync/wait/notify mechanism which cause carrier thread starvation and should not be used in virtual threads. The ManagedThread is a subclass of Thread so it couldn't be virtual thread. >>> >>> >>> Following fix changes Wicket to use locks/conditions to don't pin vthread to carrier thread while starting testing. >>> >>> ManagedThread is fixed to keep execution thread as the thread variable and isolate it's creation. >>> >>> Test vmTestbase/nsk/jdi/ObjectReference/referringObjects/referringObjects003/referringObjects003a.java was updated to don't use Wicket. (The lock has a reference to thread which affects test.) >>> >>> Wicket "finished" in class ThreadsRunner was changed to atomicInt/sleep to avoid OOME in j.u.c.l.Condition::await() which might happened in stress GC tests. 
>>> >>> webrev: http://cr.openjdk.java.net/~lmesnik/8241123/webrev.00/ >>> >>> bug: https://bugs.openjdk.java.net/browse/JDK-8241123 >>> >>> >>> Leonid >>> From igor.ignatyev at oracle.com Wed Mar 18 21:30:41 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 18 Mar 2020 14:30:41 -0700 Subject: RFR: 8241123: Refactor vmTestbase stress framework to use j.u.c and make creation of threads more flexible In-Reply-To: References: <0d73d306-2eff-375c-65e1-67142b2c6c59@oracle.com> Message-ID: <4E0F364A-47F3-428D-9C08-6B1ADFCB9D24@oracle.com> > I need more time to get grasp of Wicket and your changes in it; will come back to you after I understand them. ok, now when I believe that I have enough understanding of Wicket, I have a few comments: 1. > 68 private Lock lock = new ReentrantLock(); > 69 private Condition condition = lock.newCondition(); it's better to make these fields final. 2. as all writes and reads of Wicket::count are guarded by lock.lock, there is no need for it to be atomic. 3. adding lock to getWaiters will also remove need for Wicket::waiters to be atomic. the rest looks good to me. Thanks, -- Igor > On Mar 18, 2020, at 12:48 PM, Igor Ignatyev wrote: > > Hi Leonid, > > I've started looking at your webrev, and so far have a couple questions: > >> Test vmTestbase/nsk/jdi/ObjectReference/referringObjects/referringObjects003/referringObjects003a.java was updated to don't use Wicket. (The lock has a reference to thread which affects test.) > can't you use just a volatile boolean field? > >> Wicket "finished" in class ThreadsRunner was changed to atomicInt/sleep to avoid OOME in j.u.c.l.Condition::await() which might happened in stress GC tests. > won't j.u.c.CountDownLatch be more appropriate and cleaner solution here? > > I need more time to get grasp of Wicket and your changes in it; will come back to you after I understand them. 
> > -- Igor > >> On Mar 18, 2020, at 12:37 PM, Leonid Mesnik wrote: >> >> Hi >> >> Could you please review following fix which slightly refactor vmTestbase stress test harness. This refactoring helps to add virtual threads testing support. >> >> The Wicket uses plain sync/wait/notify mechanism which cause carrier thread starvation and should not be used in virtual threads. The ManagedThread is a subclass of Thread so it couldn't be virtual thread. >> >> >> Following fix changes Wicket to use locks/conditions to don't pin vthread to carrier thread while starting testing. >> >> ManagedThread is fixed to keep execution thread as the thread variable and isolate it's creation. >> >> Test vmTestbase/nsk/jdi/ObjectReference/referringObjects/referringObjects003/referringObjects003a.java was updated to don't use Wicket. (The lock has a reference to thread which affects test.) >> >> Wicket "finished" in class ThreadsRunner was changed to atomicInt/sleep to avoid OOME in j.u.c.l.Condition::await() which might happened in stress GC tests. >> >> webrev: http://cr.openjdk.java.net/~lmesnik/8241123/webrev.00/ >> >> bug: https://bugs.openjdk.java.net/browse/JDK-8241123 >> >> >> Leonid >> > From leonid.mesnik at oracle.com Wed Mar 18 22:18:43 2020 From: leonid.mesnik at oracle.com (Leonid Mesnik) Date: Wed, 18 Mar 2020 15:18:43 -0700 Subject: RFR: 8241123: Refactor vmTestbase stress framework to use j.u.c and make creation of threads more flexible In-Reply-To: <4E0F364A-47F3-428D-9C08-6B1ADFCB9D24@oracle.com> References: <0d73d306-2eff-375c-65e1-67142b2c6c59@oracle.com> <4E0F364A-47F3-428D-9C08-6B1ADFCB9D24@oracle.com> Message-ID: <504b0902-9fd1-ea8c-399a-185a4ceaa9e0@oracle.com> On 3/18/20 2:30 PM, Igor Ignatyev wrote: >> I need more time to get grasp of Wicket and your changes in it; will >> come back to you after I understand them. > ok, now when I believe that I have enough understanding of Wicket, I > have a few comments: > 1. 
>> 68 private Lock lock = new ReentrantLock(); >> 69 private Condition condition = lock.newCondition(); > it's better to make these fields final. > > 2. as all writes and reads of Wicket::count are guarded by lock.lock, > there is no need for it to be atomic. > 3. adding lock to?getWaiters will also remove need for Wicket::waiters > to be atomic. All 3 are fixed. Thanks for your suggestions. Updated version: http://cr.openjdk.java.net/~lmesnik/8241123/webrev.01/ Leonid > > the rest looks good to me. > > Thanks, > -- Igor > > > >> On Mar 18, 2020, at 12:48 PM, Igor Ignatyev > > wrote: >> >> Hi Leonid, >> >> I've started looking at your webrev, and so far have a couple questions: >> >>> Test >>> vmTestbase/nsk/jdi/ObjectReference/referringObjects/referringObjects003/referringObjects003a.java >>> was updated to don't use Wicket. (The lock has a reference to thread >>> which affects test.) >> can't you use just a volatile boolean field? >> >>> Wicket "finished" in class ThreadsRunner was changed to >>> atomicInt/sleep to avoid OOME in j.u.c.l.Condition::await() which >>> might happened in stress GC tests. >> won't j.u.c.CountDownLatch be more appropriate and cleaner solution here? >> >> I need more time to get grasp of Wicket and your changes in it; will >> come back to you after I understand them. >> >> -- Igor >> >>> On Mar 18, 2020, at 12:37 PM, Leonid Mesnik >>> > wrote: >>> >>> Hi >>> >>> Could you please review following fix which slightly refactor >>> vmTestbase stress test harness. This refactoring helps to add >>> virtual threads testing support. >>> >>> The Wicket uses plain sync/wait/notify mechanism which cause carrier >>> thread starvation and should not be used in virtual threads. The >>> ManagedThread is a subclass of Thread so it couldn't be virtual thread. >>> >>> >>> Following fix changes Wicket to use locks/conditions to don't pin >>> vthread to carrier thread while starting testing. 
>>> >>> ManagedThread is fixed to keep execution thread as the thread >>> variable and isolate it's creation. >>> >>> Test >>> vmTestbase/nsk/jdi/ObjectReference/referringObjects/referringObjects003/referringObjects003a.java >>> was updated to don't use Wicket. (The lock has a reference to thread >>> which affects test.) >>> >>> Wicket "finished" in class ThreadsRunner was changed to >>> atomicInt/sleep to avoid OOME in j.u.c.l.Condition::await() which >>> might happened in stress GC tests. >>> >>> webrev: http://cr.openjdk.java.net/~lmesnik/8241123/webrev.00/ >>> >>> bug: https://bugs.openjdk.java.net/browse/JDK-8241123 >>> >>> >>> Leonid >>> >> > From igor.ignatyev at oracle.com Wed Mar 18 22:22:56 2020 From: igor.ignatyev at oracle.com (Igor Ignatev) Date: Wed, 18 Mar 2020 15:22:56 -0700 Subject: RFR: 8241123: Refactor vmTestbase stress framework to use j.u.c and make creation of threads more flexible In-Reply-To: <504b0902-9fd1-ea8c-399a-185a4ceaa9e0@oracle.com> References: <504b0902-9fd1-ea8c-399a-185a4ceaa9e0@oracle.com> Message-ID: Reviewed. ? Igor > On Mar 18, 2020, at 3:18 PM, Leonid Mesnik wrote: > > ? > > > On 3/18/20 2:30 PM, Igor Ignatyev wrote: >>> I need more time to get grasp of Wicket and your changes in it; will come back to you after I understand them. >> ok, now when I believe that I have enough understanding of Wicket, I have a few comments: >> 1. >>> 68 private Lock lock = new ReentrantLock(); >>> 69 private Condition condition = lock.newCondition(); >> it's better to make these fields final. >> >> 2. as all writes and reads of Wicket::count are guarded by lock.lock, there is no need for it to be atomic. >> 3. adding lock to getWaiters will also remove need for Wicket::waiters to be atomic. > All 3 are fixed. Thanks for your suggestions. > > Updated version: > > http://cr.openjdk.java.net/~lmesnik/8241123/webrev.01/ > > Leonid > >> >> the rest looks good to me. 
>> >> Thanks, >> -- Igor >> >> >> >>> On Mar 18, 2020, at 12:48 PM, Igor Ignatyev wrote: >>> >>> Hi Leonid, >>> >>> I've started looking at your webrev, and so far have a couple questions: >>> >>>> Test vmTestbase/nsk/jdi/ObjectReference/referringObjects/referringObjects003/referringObjects003a.java was updated to don't use Wicket. (The lock has a reference to thread which affects test.) >>> can't you use just a volatile boolean field? >>> >>>> Wicket "finished" in class ThreadsRunner was changed to atomicInt/sleep to avoid OOME in j.u.c.l.Condition::await() which might happened in stress GC tests. >>> won't j.u.c.CountDownLatch be more appropriate and cleaner solution here? >>> >>> I need more time to get grasp of Wicket and your changes in it; will come back to you after I understand them. >>> >>> -- Igor >>> >>>> On Mar 18, 2020, at 12:37 PM, Leonid Mesnik wrote: >>>> >>>> Hi >>>> >>>> Could you please review following fix which slightly refactor vmTestbase stress test harness. This refactoring helps to add virtual threads testing support. >>>> >>>> The Wicket uses plain sync/wait/notify mechanism which cause carrier thread starvation and should not be used in virtual threads. The ManagedThread is a subclass of Thread so it couldn't be virtual thread. >>>> >>>> >>>> Following fix changes Wicket to use locks/conditions to don't pin vthread to carrier thread while starting testing. >>>> >>>> ManagedThread is fixed to keep execution thread as the thread variable and isolate it's creation. >>>> >>>> Test vmTestbase/nsk/jdi/ObjectReference/referringObjects/referringObjects003/referringObjects003a.java was updated to don't use Wicket. (The lock has a reference to thread which affects test.) >>>> >>>> Wicket "finished" in class ThreadsRunner was changed to atomicInt/sleep to avoid OOME in j.u.c.l.Condition::await() which might happened in stress GC tests. 
>>>> >>>> webrev: http://cr.openjdk.java.net/~lmesnik/8241123/webrev.00/ >>>> >>>> bug: https://bugs.openjdk.java.net/browse/JDK-8241123 >>>> >>>> >>>> Leonid >>>> >>> >> From leonid.mesnik at oracle.com Wed Mar 18 22:51:15 2020 From: leonid.mesnik at oracle.com (Leonid Mesnik) Date: Wed, 18 Mar 2020 15:51:15 -0700 Subject: RFR: 8241123: Refactor vmTestbase stress framework to use j.u.c and make creation of threads more flexible In-Reply-To: References: <504b0902-9fd1-ea8c-399a-185a4ceaa9e0@oracle.com> Message-ID: Thank you for review and? feedback. Leonid On 3/18/20 3:22 PM, Igor Ignatev wrote: > Reviewed. > > ? Igor > >> On Mar 18, 2020, at 3:18 PM, Leonid Mesnik >> wrote: >> >> ? >> >> >> On 3/18/20 2:30 PM, Igor Ignatyev wrote: >>>> I need more time to get grasp of Wicket and your changes in it; >>>> will come back to you after I understand them. >>> ok, now when I believe that I have enough understanding of Wicket, I >>> have a few comments: >>> 1. >>>> 68 private Lock lock = new ReentrantLock(); >>>> 69 private Condition condition = lock.newCondition(); >>> it's better to make these fields final. >>> >>> 2. as all writes and reads of Wicket::count are guarded by >>> lock.lock, there is no need for it to be atomic. >>> 3. adding lock to?getWaiters will also remove need for >>> Wicket::waiters to be atomic. >> >> All 3 are fixed. Thanks for your suggestions. >> >> Updated version: >> >> http://cr.openjdk.java.net/~lmesnik/8241123/webrev.01/ >> >> Leonid >> >>> >>> the rest looks good to me. >>> >>> Thanks, >>> -- Igor >>> >>> >>> >>>> On Mar 18, 2020, at 12:48 PM, Igor Ignatyev >>>> > wrote: >>>> >>>> Hi Leonid, >>>> >>>> I've started looking at your webrev, and so far have a couple >>>> questions: >>>> >>>>> Test >>>>> vmTestbase/nsk/jdi/ObjectReference/referringObjects/referringObjects003/referringObjects003a.java >>>>> was updated to don't use Wicket. (The lock has a reference to >>>>> thread which affects test.) 
>>>> can't you use just a volatile boolean field? >>>> >>>>> Wicket "finished" in class ThreadsRunner was changed to >>>>> atomicInt/sleep to avoid OOME in j.u.c.l.Condition::await() which >>>>> might happened in stress GC tests. >>>> won't j.u.c.CountDownLatch be more appropriate and cleaner solution >>>> here? >>>> >>>> I need more time to get grasp of Wicket and your changes in it; >>>> will come back to you after I understand them. >>>> >>>> -- Igor >>>> >>>>> On Mar 18, 2020, at 12:37 PM, Leonid Mesnik >>>>> > wrote: >>>>> >>>>> Hi >>>>> >>>>> Could you please review following fix which slightly refactor >>>>> vmTestbase stress test harness. This refactoring helps to add >>>>> virtual threads testing support. >>>>> >>>>> The Wicket uses plain sync/wait/notify mechanism which cause >>>>> carrier thread starvation and should not be used in virtual >>>>> threads. The ManagedThread is a subclass of Thread so it couldn't >>>>> be virtual thread. >>>>> >>>>> >>>>> Following fix changes Wicket to use locks/conditions to don't pin >>>>> vthread to carrier thread while starting testing. >>>>> >>>>> ManagedThread is fixed to keep execution thread as the thread >>>>> variable and isolate it's creation. >>>>> >>>>> Test >>>>> vmTestbase/nsk/jdi/ObjectReference/referringObjects/referringObjects003/referringObjects003a.java >>>>> was updated to don't use Wicket. (The lock has a reference to >>>>> thread which affects test.) >>>>> >>>>> Wicket "finished" in class ThreadsRunner was changed to >>>>> atomicInt/sleep to avoid OOME in j.u.c.l.Condition::await() which >>>>> might happened in stress GC tests. 
>>>>> >>>>> webrev: http://cr.openjdk.java.net/~lmesnik/8241123/webrev.00/ >>>>> >>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8241123 >>>>> >>>>> >>>>> Leonid >>>>> >>>> >>> From kim.barrett at oracle.com Thu Mar 19 02:28:51 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 18 Mar 2020 22:28:51 -0400 Subject: RFR: 8139652: Mutator refinement processing should take the oldest dirty card buffer In-Reply-To: <5135da92-2302-a793-15d7-9db4070104c3@oracle.com> References: <5135da92-2302-a793-15d7-9db4070104c3@oracle.com> Message-ID: > On Mar 18, 2020, at 7:05 AM, Thomas Schatzl wrote: >>> On Mar 3, 2020, at 9:32 PM, Kim Barrett wrote: >>> >>> Please review this change to the handling of completed buffers by mutator >>> threads. [...] >>> >>> CR: >>> https://bugs.openjdk.java.net/browse/JDK-8139652 >>> >>> Webrev: >>> https://cr.openjdk.java.net/~kbarrett/8139652/open.00/ >>> >>> Testing >>> mach5 tier1-5 along with changes for JDK-8239825 and JDK-8139652. >> The original webrev was based on JDK-8239825 and JDK-8240133. The >> push and backout of JDK-8240133 has made that webrev no longer apply >> cleanly. So here's a new, up to date (as of this morning) webrev: >> https://cr.openjdk.java.net/~kbarrett/8139652/open.01/ >> Tested with mach5 tier1-5 along with change for JDK-8239825 (which >> hasn't been pushed yet). > > - g1DirtyCardQueue.cpp:544: indentation of "fully_processed" parameter > > - I suggest to undo that line break in the assert 547 - the resulting string is like 83 chars. > > Looks good otherwise. I do not need a re-review for above tiny changes. Thanks. I've made those changes. New webrev (which no longer needs to be applied on top of separate change for JDK-8239825, which has been pushed). https://cr.openjdk.java.net/~kbarrett/8139652/open.02/ I didn't bother with an incremental, since the changes are so minor.
From kim.barrett at oracle.com Thu Mar 19 04:41:32 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 19 Mar 2020 00:41:32 -0400 Subject: RFR: 8241001: Improve logging in the ConcurrentGCBreakpoint mechanism Message-ID: <379DA3E8-F79A-4332-8579-83D7AB09ACA1@oracle.com> Please review this change to the logging output produced by the ConcurrentGCBreakpoint facility. Most of its log messages provide low-level information for someone debugging the breakpoint mechanism or a test using it, with relatively cryptic messages unless that is what one is doing. As such, putting these messages at the debug level generates too much uninformative clutter for the fairly common case of gc*=debug logging; those messages have been demoted to the trace level. There were logging messages for the concurrent cycle start (idle->active) and end (active->idle) transitions, which were duplicative of other logging messages from the normal concurrent cycle operation. Those logging messages have been removed. However, the message for the active->idle transition when there is an active run-to request has been retained, but the message text has been improved. gc+breakpoint logging has also been configured to produce the normal "GC(n)" prefix, for those log messages produced by the collector. User requests such as run-to don't (and can't) have that prefix, as they don't occur within the context of the collection but instead come from Java threads outside the collector. CR: https://bugs.openjdk.java.net/browse/JDK-8241001 Webrev: https://cr.openjdk.java.net/~kbarrett/8241001/open.00/ Testing: Locally (linux-x64) ran TestConcurrentGCBreakpoints.java with various logging options and verified the results were as described. 
From per.liden at oracle.com Thu Mar 19 07:53:24 2020 From: per.liden at oracle.com (Per Liden) Date: Thu, 19 Mar 2020 08:53:24 +0100 Subject: RFR: 8241001: Improve logging in the ConcurrentGCBreakpoint mechanism In-Reply-To: <379DA3E8-F79A-4332-8579-83D7AB09ACA1@oracle.com> References: <379DA3E8-F79A-4332-8579-83D7AB09ACA1@oracle.com> Message-ID: <65b834d2-b715-0639-14d3-94d15801d13a@oracle.com> On 3/19/20 5:41 AM, Kim Barrett wrote: > Please review this change to the logging output produced by the > ConcurrentGCBreakpoint facility. > > Most of its log messages provide low-level information for someone > debugging the breakpoint mechanism or a test using it, with relatively > cryptic messages unless that is what one is doing. As such, putting > these messages at the debug level generates too much uninformative > clutter for the fairly common case of gc*=debug logging; those > messages have been demoted to the trace level. > > There were logging messages for the concurrent cycle start > (idle->active) and end (active->idle) transitions, which were > duplicative of other logging messages from the normal concurrent cycle > operation. Those logging messages have been removed. However, the > message for the active->idle transition when there is an active run-to > request has been retained, but the message text has been improved. > > gc+breakpoint logging has also been configured to produce the normal > "GC(n)" prefix, for those log messages produced by the collector. > User requests such as run-to don't (and can't) have that prefix, as > they don't occur within the context of the collection but instead come > from Java threads outside the collector. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8241001 > > Webrev: > https://cr.openjdk.java.net/~kbarrett/8241001/open.00/ Looks good! /Per > > Testing: > Locally (linux-x64) ran TestConcurrentGCBreakpoints.java with various > logging options and verified the results were as described. 
> > From stefan.johansson at oracle.com Thu Mar 19 08:00:21 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Thu, 19 Mar 2020 09:00:21 +0100 Subject: RFR: 8241001: Improve logging in the ConcurrentGCBreakpoint mechanism In-Reply-To: <65b834d2-b715-0639-14d3-94d15801d13a@oracle.com> References: <379DA3E8-F79A-4332-8579-83D7AB09ACA1@oracle.com> <65b834d2-b715-0639-14d3-94d15801d13a@oracle.com> Message-ID: <75C71436-9AB8-4FCA-B40C-D9638AFB0C75@oracle.com> > 19 mars 2020 kl. 08:53 skrev Per Liden : > > On 3/19/20 5:41 AM, Kim Barrett wrote: >> Please review this change to the logging output produced by the >> ConcurrentGCBreakpoint facility. >> Most of its log messages provide low-level information for someone >> debugging the breakpoint mechanism or a test using it, with relatively >> cryptic messages unless that is what one is doing. As such, putting >> these messages at the debug level generates too much uninformative >> clutter for the fairly common case of gc*=debug logging; those >> messages have been demoted to the trace level. >> There were logging messages for the concurrent cycle start >> (idle->active) and end (active->idle) transitions, which were >> duplicative of other logging messages from the normal concurrent cycle >> operation. Those logging messages have been removed. However, the >> message for the active->idle transition when there is an active run-to >> request has been retained, but the message text has been improved. >> gc+breakpoint logging has also been configured to produce the normal >> "GC(n)" prefix, for those log messages produced by the collector. >> User requests such as run-to don't (and can't) have that prefix, as >> they don't occur within the context of the collection but instead come >> from Java threads outside the collector. >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8241001 >> Webrev: >> https://cr.openjdk.java.net/~kbarrett/8241001/open.00/ > > Looks good! 
Looks good to me too, Stefan > > /Per > >> Testing: >> Locally (linux-x64) ran TestConcurrentGCBreakpoints.java with various >> logging options and verified the results were as described. From stefan.karlsson at oracle.com Thu Mar 19 09:44:39 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Thu, 19 Mar 2020 10:44:39 +0100 Subject: RFR: 8241160: Concurrent class unloading reports GCTraceTime events as JFR pause sub-phase events Message-ID: Hi all, Please review this patch to rewrite the GCTimer, and associated classes, to not allow nested phases of different types (pause or concurrent). https://cr.openjdk.java.net/~stefank/8241160/webrev.01/ https://bugs.openjdk.java.net/browse/JDK-8241160 A bug was found when I was looking at JFR events from ZGC. A GCPhasePauseLevel1 event was nested within a GCPhaseConcurrent. The only valid parent is a GCPhasePause event. The reason why this happened was that we use a GCTraceTime class inside the class unloading code. Previously, we only used GCTraceTimes inside pauses, but ever since class unloading was moved out to a concurrent phase, this isn't true anymore. GCTraceTime used GCTimer::register_gc_phase_start(name, Ticks, phase? = ), and therefore always reported pauses and pause sub-phases. With this patch, I suggest that we become stricter in our usages of the GCTimer. The effects of the patch are: 1) When a top-level pause (or concurrent) phase is created, the code must be explicit about what type of phase is created. The code will now assert if this is abused. Most places were already explicit, but I had to change two places: a) Shenandoah type-erased ConcurrentGCTimer and therefore didn't have access to register_gc_pause_start. I made that function public, instead of protected, so that we didn't have to deal with that problem. b) G1 used GCTraceTime to note the Remark/Cleanup pauses (in VM_G1Concurrent). This is the only place that uses GCTraceTime to start a pause.
All other places use GCTraceTime to create sub-phases. I could have copy-n-pasted the entire GCTraceTime/GCTraceTimeWrapper/GCTraceTimeWrapper implementation and created a version that calls register_gc_pause_start instead of register_gc_phase_start. Instead of doing that I opted for creating a system where the code can register a set of callbacks to be called when the start and end time is registered. This is used in the backend of GCTraceTime, but then also used by G1 to allow us to not have to copy-n-paste a lot of the code. I would have liked to make GCTraceTimeImpl/GCTraceTimeWrapper agnostic to the default callbacks (unified logging and GCTimer) but couldn't find a nice way to express that, because of the way we macro-expand the UL tags. Maybe something we can consider for a future investigation. 2) sub-phases now inherit the type from the parent phase, and there's no possibility to incorrectly nest phases anymore. This also removed the need for ConcurrentGCTimer::_is_concurrent_phase_active. 3) This allows (and encourages concurrent sub-phases). When the JFR events were ported to HotSpot, only pauses got sub-phases, because there wasn't a big need for concurrent sub-phases. In this patch I added level of sub-phases to JFR. Maybe it would be better to add more right away? (I'm not a fan of having the explicit sub-phase level events, instead of a counter in *the* phase event, but the JMC team at that time needed it to be logged as separate events. Maybe something that could be reconsidered some time) 4) The different consumers of the timestamps are separated into their own classes. 5) Shenandoah devs need to consider what to do about this change: - unloading_occurred = SystemDictionary::do_unloading(heap->gc_timer()); + // FIXME: This turns off the previously broken JFR events. If we want to keep reporting them, + // but with the correct type (Concurrent) then a top-level concurrent phase is required.
+ unloading_occurred = SystemDictionary::do_unloading(NULL /* gc_timer */); Where this code caused GCPhasePauseLevel1 events for ZGC, this used to create GCPhasePause events for Shenandoah. It uses GCTraceTime to log sub-phases, but the current Shenandoah code hasn't registered a top-level phase at this point. Either we keep this code with the removal of the gc_timer argument, or we add a top-level phase somewhere. If we want the latter, then I need suggestions on where to add them. Or maybe push the current code, and fix it as a follow-up patch? What do you think? An alternative is to (continue?) completely forbid concurrent sub-phases, and remove the gc_timers passed to GCTraceTimes during concurrent phases. Even if we decide to do that, I think there's some merit to the stricter GCTimer code, and the slight separation of concern in GCTraceTime. Tested tier1-3 Thanks, StefanK From erik.osterlund at oracle.com Thu Mar 19 11:37:00 2020 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Thu, 19 Mar 2020 12:37:00 +0100 Subject: RFR: 8241160: Concurrent class unloading reports GCTraceTime events as JFR pause sub-phase events In-Reply-To: References: Message-ID: <1f6f55ee-5285-3dab-2938-47ae081b0a53@oracle.com> Hi Stefan, Nice! I like how you can now catch incorrect use of the API, without making it hard to use. Looks good. Thanks for sorting this out. /Erik On 2020-03-19 10:44, Stefan Karlsson wrote: > Hi all, > > Please review this patch to rewrite the GCTimer, and associated > classes, to not allow nested phases of different types (pause or > concurrent). > > https://cr.openjdk.java.net/~stefank/8241160/webrev.01/ > https://bugs.openjdk.java.net/browse/JDK-8241160 > > A bug was found when I was looking at JFR events from ZGC. A > GCPhasePauseLevel1 event was nested within a GCPhaseConcurrent. The > only valid parent is a GCPhasePause event. 
The reason why this > happened was that we use a GCTraceTime class inside the class > unloading code. Previously, we only used GCTraceTimes inside pauses, > but ever since class unloading was moved out to a concurrent phase, > this isn't true anymore. GCTraceTime used > GCTimer::register_gc_phase_start(name, Ticks, phase? = ), and > therefore always reported pauses and pause sub-phases. > > With this patch, I suggest that we become stricter in our usages of > the GCTimer. The effects of the patch are: > > 1) When a top-level pause (or concurrent) phase is created, the code > must be explicit about what type of phase is created. The code will > now assert if this is abused. Most places were already explicit, but I > had to change two places: > > a) Shenandoah type-erased ConcurrentGCTimer and therefore didn't have > access to register_gc_pause_start. I made that function public, > instead of protected, so that we didn't have to deal with that problem. > > b) G1 used GCTraceTime to note the Remark/Cleanup pauses (in > VM_G1Concurrent). This is the only place that uses GCTraceTime to > start a pause. All other places use GCTraceTime to create sub-phases. > I could have copy-n-pasted the entire > GCTraceTime/GCTraceTimeWrapper/GCTraceTimeWrapper implementation and > created a version that calls register_gc_pause_start instead of > register_gc_phase_start. Instead of doing that I opted for creating a > system where the code can register a set of callbacks to be called > when the start and end time is registered. This is used in the backend > of GCTraceTime, but then also used by G1 to allow us to not have to > copy-n-paste a lot of the code. > > I would have liked to make GCTraceTimeImpl/GCTraceTimeWrapper agnostic > to the default callbacks (unified logging and GCTimer) but couldn't > find a nice way to express that, because of the way we macro-expand > the UL tags. Maybe something we can consider for a future investigation.
> > 2) sub-phases now inherit the type from the parent phase, and there's > no possibility to incorrectly nest phases anymore. This also removed > the need for ConcurrentGCTimer::_is_concurrent_phase_active. > > 3) This allows (and encourages concurrent sub-phases). When the JFR > events were ported to HotSpot, only pauses got sub-phases, because > there wasn't a big need for concurrent sub-phases. In this patch I > added level of sub-phases to JFR. Maybe it would be better to add more > right away? (I'm not a fan of having the explicit sub-phase level > events, instead of a counter in *the* phase event, but the JMC team at > that time needed it to be logged as separate events. Maybe something > that could be reconsidered some time) > > 4) The different consumers of the timestamps are separated into their > own classes. > > 5) Shenandoah devs need to consider what to do about this change: > > - unloading_occurred = SystemDictionary::do_unloading(heap->gc_timer()); > + // FIXME: This turns off the previously broken JFR events. If we > want to keep reporting them, > + // but with the correct type (Concurrent) then a top-level > concurrent phase is required. > + unloading_occurred = SystemDictionary::do_unloading(NULL /* gc_timer > */); > > Where this code caused GCPhasePauseLevel1 events for ZGC, this used to > create GCPhasePause events for Shenandoah. It uses GCTraceTime to log > sub-phases, but the current Shenandoah code hasn't registered a > top-level phase at this point. Either we keep this code with the > removal of the gc_timer argument, or we add a top-level phase > somewhere. If we want the latter, then I need suggestions on where to > add them. Or maybe push the current code, and fix it as a follow-up > patch? > > What do you think? An alternative is to (continue?) completely forbid > concurrent sub-phases, and remove the gc_timers passed to GCTraceTimes > during concurrent phases. 
Even if we decide to do that, I think > there's some merit to the stricter GCTimer code, and the slight > separation of concern in GCTraceTime. > > Tested tier1-3 > > Thanks, > StefanK From shade at redhat.com Thu Mar 19 12:21:36 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 19 Mar 2020 13:21:36 +0100 Subject: RFR (S) 8241139: Shenandoah: distribute mark-compact work exactly to minimize fragmentation Message-ID: <80699511-8972-ed8e-adfe-f5a9c288c8b6@redhat.com> RFE: https://bugs.openjdk.java.net/browse/JDK-8241139 Was following up on why JLinkTest fails with Shenandoah. Figured out the dynamic work distribution in mark-compact leaves alive regions in the middle of the heap. It is a generic problem with current mark-compact implementation, as which regions get into each worker slice is time-dependent. Consider the worst case scenario: two workers would have their slices interleaved, once slice is fully alive, and other is fully dead. In the end, mark-compact would finish with the same interleaved heap. A humongous allocation then fails. We need to plan the parallel sliding more accurately. See the code comments about what new plan does. Webrev: https://cr.openjdk.java.net/~shade/8241139/webrev.01/ Testing: hotspot_gc_shenandoah; known-failing test; tier{1,2,3} (passed with previous version, running with new version now); eyeballing shenandoah-visualizer -- Thanks, -Aleksey From thomas.schatzl at oracle.com Thu Mar 19 14:17:12 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 19 Mar 2020 15:17:12 +0100 Subject: RFR (S): 8240590: Add MemRegion::destroy_array to complement introduced create_array In-Reply-To: <14723860-4954-4CF5-A370-90839F4181F7@oracle.com> References: <8fcf72be-0696-88fc-3f02-aca75b2c96e2@oracle.com> <14723860-4954-4CF5-A370-90839F4181F7@oracle.com> Message-ID: Hi, thanks Stefan and Leo for the reviews. Thomas On 18.03.20 14:59, Stefan Johansson wrote: > Hi Thomas, > >> 18 mars 2020 kl. 
12:18 skrev Thomas Schatzl : >> >> Hi all, >> >> can I have reviews for this small change that introduces a MemRegion::destroy_array() method to complement the recently introduced MemRegion::create_array(). >> >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8240590 >> Webrev: >> http://cr.openjdk.java.net/~tschatzl/8240590/webrev/ > Looks good, > Stefan > >> Testing: >> hs-tier1-5 >> >> Thanks, >> Thomas > From poonam.bajaj at oracle.com Thu Mar 19 14:36:49 2020 From: poonam.bajaj at oracle.com (Poonam Parhar) Date: Thu, 19 Mar 2020 07:36:49 -0700 Subject: RFR: 8231779: crash HeapWord*ParallelScavengeHeap::failed_mem_allocate Message-ID: <65a7e19d-f889-8ec6-e044-1ca30b871f56@oracle.com> Hello, Please review this simple change that avoids a double to float conversion that would fix an intermittent crash seen on Solaris boxes due to corrupted float values in the Floating Point registers. http://cr.openjdk.java.net/~poonam/8231779/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8231779 Thanks, Poonam From stefan.johansson at oracle.com Thu Mar 19 14:42:38 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Thu, 19 Mar 2020 15:42:38 +0100 Subject: RFR: 8139652: Mutator refinement processing should take the oldest dirty card buffer In-Reply-To: References: <5135da92-2302-a793-15d7-9db4070104c3@oracle.com> Message-ID: <298261eb-a2de-2fde-8d26-96dd255b3da4@oracle.com> Hi Kim, On 2020-03-19 03:28, Kim Barrett wrote: >> On Mar 18, 2020, at 7:05 AM, Thomas Schatzl wrote: >>>> On Mar 3, 2020, at 9:32 PM, Kim Barrett wrote: >>>> >>>> Please review this change to the handling of completed buffers by mutator >>>> threads. [...] >>>> >>>> CR: >>>> https://bugs.openjdk.java.net/browse/JDK-8139652 >>>> >>>> Webrev: >>>> https://cr.openjdk.java.net/~kbarrett/8139652/open.00/ >>>> >>>> Testing >>>> mach5 tier1-5 along with changes for JDK-8239825 and JDK-8139652. >>> The original webrev was based on JDK-8239825 and JDK-8240133.
The >>> push and backout of JDK-8240133 has made that webrev no longer apply >>> cleanly. So here's a new, up to date (as of this morning) webrev: >>> https://cr.openjdk.java.net/~kbarrett/8139652/open.01/ >>> Tested with mach5 tier1-5 along with change for JDK-8239825 (which >>> hasn't been pushed yet). >> - g1DirtyCardQueue.cpp:544: indentation of "fully_processed" parameter >> >> - I suggest to undo that line break in the assert 547 - the resulting string is like 83 chars. >> >> Looks good otherwise. I do not need a re-review for above tiny changes. > Thanks. I've made those changes. > > New webrev (which no longer needs to be applied on top of separate change for JDK-8239825, which has been pushed). > > https://cr.openjdk.java.net/~kbarrett/8139652/open.02/ Looks good, Stefan > > I didn't bother with an incremental, since the changes are so minor. > From rkennke at redhat.com Thu Mar 19 14:54:45 2020 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 19 Mar 2020 15:54:45 +0100 Subject: RFR (S) 8241139: Shenandoah: distribute mark-compact work exactly to minimize fragmentation In-Reply-To: <80699511-8972-ed8e-adfe-f5a9c288c8b6@redhat.com> References: <80699511-8972-ed8e-adfe-f5a9c288c8b6@redhat.com> Message-ID: <4d87f4ec-8872-6c27-86a6-7867485d7955@redhat.com> Very nice! The changes look good to me. Thanks, Roman > RFE: > https://bugs.openjdk.java.net/browse/JDK-8241139 > > Was following up on why JLinkTest fails with Shenandoah. Figured out the dynamic work distribution > in mark-compact leaves alive regions in the middle of the heap. It is a generic problem with current > mark-compact implementation, as which regions get into each worker slice is time-dependent. > > Consider the worst case scenario: two workers would have their slices interleaved, once slice is > fully alive, and other is fully dead. In the end, mark-compact would finish with the same > interleaved heap. A humongous allocation then fails.
We need to plan the parallel sliding more > accurately. See the code comments about what new plan does. > > Webrev: > https://cr.openjdk.java.net/~shade/8241139/webrev.01/ > > Testing: hotspot_gc_shenandoah; known-failing test; tier{1,2,3} (passed with previous version, > running with new version now); eyeballing shenandoah-visualizer > From aph at redhat.com Thu Mar 19 14:58:19 2020 From: aph at redhat.com (Andrew Haley) Date: Thu, 19 Mar 2020 14:58:19 +0000 Subject: RFR: 8241296: Segfault in JNIHandleBlock::oops_do() Message-ID: We're seeing intermittent SEGVs in JDKs with some newer GCC versions and combinations of options. It turns out that it's a pretty trivial error which has never been noticed before. Thread::oops_do() does this: void Thread::oops_do(OopClosure* f, CodeBlobClosure* cf) { active_handles()->oops_do(f); However, there is a window while a Thread is being constructed when active_handles() is NULL. GC can occur during this time period, and it's a matter of luck that we haven't seen this crash before. http://cr.openjdk.java.net/~aph/8241296/ OK to push? -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From stefan.johansson at oracle.com Thu Mar 19 15:17:09 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Thu, 19 Mar 2020 16:17:09 +0100 Subject: RFR: 8241141: Restructure humongous object allocation in G1 Message-ID: <261897b8-077c-f4b2-69be-95ce20144abc@oracle.com> Hi, Please review this refactoring of the humongous allocation code in G1. Issue: https://bugs.openjdk.java.net/browse/JDK-8241141 Webrev: http://cr.openjdk.java.net/~sjohanss/8241141/00/ Summary The allocation code for humongous objects is scattered between G1CollectedHeap and the HeapRegionManager. This change moves the allocating regions part into the manager, while heap is still responsible for initializing the object. 
The code that finds contiguous regions in the heap has also been refactored a bit to simplify the logic. Now we make sure that we only search among available regions when not wanting to expand the heap. Testing Tier 1-5 in Mach5 and local stress testing. Thanks, Stefan From stefan.karlsson at oracle.com Thu Mar 19 15:22:18 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Thu, 19 Mar 2020 16:22:18 +0100 Subject: RFR: 8241296: Segfault in JNIHandleBlock::oops_do() In-Reply-To: References: Message-ID: <9cefe7b7-1d86-b62f-7352-148204abdc0e@oracle.com> Hi Andrew, I think the fix is fine. However, it's also seems to go against some other parts of the code, that tries to setup threads and add them to the thread lists *after* the handles have been created: attach_current_thread: // This thread will not do a safepoint check, since it has // not been added to the Thread list yet. { MutexLocker ml(Threads_lock); // This must be inside this lock in order to get FullGCALot to work properly, i.e., to // avoid this thread trying to do a GC before it is added to the thread-list thread->set_active_handles(JNIHandleBlock::allocate_block()); Threads::add(thread, daemon); } Or without a safepoint between the setting of the _active_handles and the adding of the thread to the thread list when going through the normal pre_run/run setup. Or did I miss a safepoint somewhere? I do see this code in JavaThread::exit: if (active_handles() != NULL) { JNIHandleBlock* block = active_handles(); set_active_handles(NULL); JNIHandleBlock::release_block(block); } ... if (JvmtiEnv::environments_might_exist()) { JvmtiExport::cleanup_thread(this); } ... Threads::remove(this, daemon); where cleanup_threads take a lock *with* a safepoint check, allowing GCs to run and exposing a NULL _active_handle. Would you mind sharing some extra info? For example the stack trace of the scanned thread, and / or flags used to provoke this? I would like to know why we haven't seen this before. 
Thanks, StefanK On 2020-03-19 15:58, Andrew Haley wrote: > We're seeing intermittent SEGVs in JDKs with some newer GCC versions > and combinations of options. It turns out that it's a pretty trivial > error which has never been noticed before. > > Thread::oops_do() does this: > > void Thread::oops_do(OopClosure* f, CodeBlobClosure* cf) { > active_handles()->oops_do(f); > > However, there is a window while a Thread is being constructed when > active_handles() is NULL. GC can occur during this time period, and > it's a matter of luck that we haven't seen this crash before. > > http://cr.openjdk.java.net/~aph/8241296/ > > OK to push? > From tprintezis at twitter.com Thu Mar 19 16:25:29 2020 From: tprintezis at twitter.com (Tony Printezis) Date: Thu, 19 Mar 2020 09:25:29 -0700 Subject: high StringTable scanning overhead during young GCs In-Reply-To: References: Message-ID: Hi Stefan, Twitter, of course, has NO idle periods. ;-) Would there be any objection to extending the G1PeriodicGCInterval mechanism to ignore young GCs? I'll be happy to work on it. In our experience there are a couple of reasons to force cycles at a given frequency. One (less common for us) is to clear up the StringTable, as I've described already. Another (a lot more common for us) is to force finalizers / cleaners / and friends to run to clean up native resources held by dead objects in the old generation. We have had several cases where services would run out of native memory because cycles would happen very infrequently and a lot of native memory was held by dead objects in the old generation. FWIW, this was our original motivation for introducing the mechanism we have to force CMS cycles at a given frequency. Tony ----- Tony Printezis | @TonyPrintezis | tprintezis at twitter.com On March 18, 2020 at 3:55:14 AM, Stefan Johansson ( stefan.johansson at oracle.com) wrote: Hi Tony, Yes, if a young GC happens during the interval it will prevent a periodic concurrent start GC from happening.
I see how this might not play well with your use case, unless your service sometimes have an idle period. Stefan > 17 mars 2020 kl. 21:23 skrev Tony Printezis : > > Hi Stefan, > > Thanks for mentioning the flag. I'll take a look. I don't think we'll be on 12 or later any time soon, but I can always backport the change if needed. Quick question: "the periodic request will be skipped if any GC occurred in the interval": So, if a young GC happens during the interval, a periodic concurrent cycle won't? > > Tony > > > ----- > Tony Printezis | @TonyPrintezis | tprintezis at twitter.com > > > On March 17, 2020 at 5:31:24 AM, Stefan Johansson ( stefan.johansson at oracle.com) wrote: > >> Hi Tony, >> >> > 16 mars 2020 kl. 22:02 skrev Tony Printezis : >> > >> > ... >> > >> > FWIW, we use mostly CMS in 8 and we addressed this by forcing more frequent >> > CMS cycles (we have a flag that starts a CMS cycle every N secs). This >> > helps to keep young GC times mostly in check for our services that suffer >> > from this issue. >> > >> If you want to try a similar workaround for G1. In JDK 12, G1PeriodicGCInterval was added as part of JEP 346: Promptly Return Unused Committed Memory from G1. This flag allows you to trigger GCs based on that interval, and by default the GC triggered start a concurrent cycle. This feature was mostly designed for idle applications and the periodic request will be skipped if any GC occurred in the interval, so depending on your workload disabling adaptive IHOP and setting a lower fixed IHOP might be better approach. >> >> Thanks, >> Stefan >> >> > ... >> > -----
>> > Tony Printezis | @TonyPrintezis | tprintezis at twitter.com

From aph at redhat.com Thu Mar 19 16:47:07 2020
From: aph at redhat.com (Andrew Haley)
Date: Thu, 19 Mar 2020 16:47:07 +0000
Subject: RFR: 8241296: Segfault in JNIHandleBlock::oops_do()
In-Reply-To: <9cefe7b7-1d86-b62f-7352-148204abdc0e@oracle.com>
References: <9cefe7b7-1d86-b62f-7352-148204abdc0e@oracle.com>
Message-ID: <1031247c-8623-a288-05da-8950a31b970e@redhat.com>

Hi,

On 3/19/20 3:22 PM, Stefan Karlsson wrote:

> I think the fix is fine.

OK, thanks.

> Would you mind sharing some extra info? For example the stack trace
> of the scanned thread, and / or flags used to provoke this? I would
> like to know why we haven't seen this before.

Sure.

#0  0x00007ffff7dafb02 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib
#1  0x00007ffff77533fb in os::PlatformEvent::park (this=0x7ffff0ab690
#2  0x00007ffff7706805 in ParkCommon (timo=0, ev=0x7ffff0ab6900)
#3  Monitor::ILock (this=this@entry=0x7ffff0005b30, Self=Self@entry=0
#4  0x00007ffff7706ffa in Monitor::lock_without_safepoint_check (Self
#5  Monitor::lock_without_safepoint_check (this=0x7ffff0005b30)
#6  0x00007ffff77e7f71 in SafepointSynchronize::block (thread=0x7ffff
#7  0x00007ffff77e6afa in SafepointSynchronize::block (thread=thread@
#8  0x00007ffff78fd897 in ThreadStateTransition::transition_and_fence
#9  JavaThread::run (this=0x7ffff0ab5800)
#10 0x00007ffff7747d78 in java_start (thread=0x7ffff0ab5800)
#11 0x00007ffff7da9472 in start_thread () from /lib64/libpthread.so.0
#12 0x00007ffff7ee5063 in clone () from /lib64/libc.so.6

The thread blocked in transition_and_fence() here: note this is in JDK
8, but it hasn't changed AFAICS:

// The first routine called by a new Java thread
void JavaThread::run() {
  // initialize thread-local alloc buffer related fields
  this->initialize_tlab();

  // used to test validitity of stack trace backs
  this->record_base_of_stack_pointer();

  // Record real stack base and size.
  this->record_stack_base_and_size();

  // Initialize thread local storage; set before calling MutexLocker
  this->initialize_thread_local_storage();

  this->create_stack_guard_pages();

  this->cache_global_variables();

  // Thread is now sufficient initialized to be handled by the safepoint code as being
  // in the VM. Change thread state from _thread_new to _thread_in_vm
=>ThreadStateTransition::transition_and_fence(this, _thread_new, _thread_in_vm);

  assert(JavaThread::current() == this, "sanity check");
  assert(!Thread::current()->owns_locks(), "sanity check");

  DTRACE_THREAD_PROBE(start, this);

  // This operation might block. We call that after all safepoint checks for a new thread has
  // been completed.
  this->set_active_handles(JNIHandleBlock::allocate_block());

So it's pretty obvious why active_handles wasn't set yet. This code
isn't obviously different from that in jdk/jdk, but I have not been
able to reproduce the bug there. IMO, though, it's still a bug in
jdk/jdk.

The most likely reason we haven't seen this before is that
JNIHandleBlock::oops_do() looks like this:

void JNIHandleBlock::oops_do(OopClosure* f) {
  JNIHandleBlock* current_chain = this;
  while (current_chain != NULL) {
    ...
  }

A sufficiently adversarial compiler can turn this into

void JNIHandleBlock::oops_do(OopClosure* f) {
  JNIHandleBlock* current_chain = this;
  do {
    ...
  } while (current_chain != NULL)

because "this" can never be null in a member function. GCC sometimes
does this transformation.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From shade at redhat.com Thu Mar 19 16:54:23 2020
From: shade at redhat.com (Aleksey Shipilev)
Date: Thu, 19 Mar 2020 17:54:23 +0100
Subject: RFR: 8241296: Segfault in JNIHandleBlock::oops_do()
In-Reply-To:
References:
Message-ID:

On 3/19/20 3:58 PM, Andrew Haley wrote:
> http://cr.openjdk.java.net/~aph/8241296/

Looks good.
--
Thanks,
-Aleksey

From kim.barrett at oracle.com Thu Mar 19 20:28:35 2020
From: kim.barrett at oracle.com (Kim Barrett)
Date: Thu, 19 Mar 2020 16:28:35 -0400
Subject: RFR: 8241001: Improve logging in the ConcurrentGCBreakpoint mechanism
In-Reply-To: <65b834d2-b715-0639-14d3-94d15801d13a@oracle.com>
References: <379DA3E8-F79A-4332-8579-83D7AB09ACA1@oracle.com> <65b834d2-b715-0639-14d3-94d15801d13a@oracle.com>
Message-ID: <6F18DEEB-35C1-42B5-BAE2-484F63C59187@oracle.com>

> On Mar 19, 2020, at 3:53 AM, Per Liden wrote:
>
> On 3/19/20 5:41 AM, Kim Barrett wrote:
>> Please review this change to the logging output produced by the
>> ConcurrentGCBreakpoint facility.
>> [...]
>> CR:
>> https://bugs.openjdk.java.net/browse/JDK-8241001
>> Webrev:
>> https://cr.openjdk.java.net/~kbarrett/8241001/open.00/
>
> Looks good!

Thanks.

From kim.barrett at oracle.com Thu Mar 19 20:29:05 2020
From: kim.barrett at oracle.com (Kim Barrett)
Date: Thu, 19 Mar 2020 16:29:05 -0400
Subject: RFR: 8241001: Improve logging in the ConcurrentGCBreakpoint mechanism
In-Reply-To: <75C71436-9AB8-4FCA-B40C-D9638AFB0C75@oracle.com>
References: <379DA3E8-F79A-4332-8579-83D7AB09ACA1@oracle.com> <65b834d2-b715-0639-14d3-94d15801d13a@oracle.com> <75C71436-9AB8-4FCA-B40C-D9638AFB0C75@oracle.com>
Message-ID: <4658EA65-6DFA-4DC6-A43F-E5C1166E8DC0@oracle.com>

> On Mar 19, 2020, at 4:00 AM, Stefan Johansson wrote:
>> On 3/19/20 5:41 AM, Kim Barrett wrote:
>>> Please review this change to the logging output produced by the
>>> ConcurrentGCBreakpoint facility.
>>> [...]
>>> CR:
>>> https://bugs.openjdk.java.net/browse/JDK-8241001
>>> Webrev:
>>> https://cr.openjdk.java.net/~kbarrett/8241001/open.00/
>>
>> Looks good!
> Looks good to me too,
> Stefan

Thanks.
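
Note on the 8241296 thread above: the hazard Andrew Haley describes (a member function called through a NULL pointer, with the compiler free to assume "this" is non-null) can be shown in a small stand-alone sketch. This is not the change in the webrev. Block, Visitor and visit_handles are hypothetical stand-ins for JNIHandleBlock, OopClosure and the call made from Thread::oops_do(). The only point is that the null test has to sit in the caller, before the member call, because a test on "this" inside the member function may be optimized away.

```cpp
#include <cassert>

// Hypothetical stand-in for OopClosure: it only counts visited blocks.
struct Visitor {
  int visited = 0;
  void do_block() { ++visited; }
};

// Hypothetical stand-in for JNIHandleBlock: a singly linked chain.
struct Block {
  Block* next = nullptr;

  // Inside a member function the compiler may assume this != nullptr,
  // so the first loop test can legally be dropped (while -> do/while).
  void oops_do(Visitor* v) {
    Block* current = this;
    while (current != nullptr) {
      v->do_block();
      current = current->next;
    }
  }
};

// Guard in the caller, before the member call, so that a thread whose
// handle block has not been allocated yet is simply skipped.
inline void visit_handles(Block* handles, Visitor* v) {
  assert(v != nullptr);
  if (handles != nullptr) {
    handles->oops_do(v);
  }
}
```

With this shape, a thread that is still between its state transition and set_active_handles() contributes nothing to the scan instead of dereferencing a null chain.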
From kim.barrett at oracle.com Thu Mar 19 20:29:52 2020
From: kim.barrett at oracle.com (Kim Barrett)
Date: Thu, 19 Mar 2020 16:29:52 -0400
Subject: RFR: 8139652: Mutator refinement processing should take the oldest dirty card buffer
In-Reply-To: <298261eb-a2de-2fde-8d26-96dd255b3da4@oracle.com>
References: <5135da92-2302-a793-15d7-9db4070104c3@oracle.com> <298261eb-a2de-2fde-8d26-96dd255b3da4@oracle.com>
Message-ID:

> On Mar 19, 2020, at 10:42 AM, Stefan Johansson wrote:
>
> Hi Kim,
>
> On 2020-03-19 03:28, Kim Barrett wrote:
>>> On Mar 18, 2020, at 7:05 AM, Thomas Schatzl wrote:
>>>>> On Mar 3, 2020, at 9:32 PM, Kim Barrett wrote:
>>>>>
>>>>> Please review this change to the handling of completed buffers by mutator
>>>>> threads. [...]
>>>>>
>>>>> CR:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8139652
>>>>>
>>>>> Webrev:
>>>>> https://cr.openjdk.java.net/~kbarrett/8139652/open.00/
>>>>>
>>>>> Testing
>>>>> mach5 tier1-5 along with changes for JDK-8239825 and JDK-8139652.
>>>> The original webrev was based on JDK-8239825 and JDK-8240133. The
>>>> push and backout of JDK-8240133 has made that webrev no longer apply
>>>> cleanly. So here's a new, up to date (as of this morning) webrev:
>>>> https://cr.openjdk.java.net/~kbarrett/8139652/open.01/
>>>> Tested with mach5 tier1-5 along with change for JDK-8239825 (which
>>>> hasn't been pushed yet).
>>> - g1DirtyCardQueue.cpp:544: indentation of "fully_processed" parameter
>>>
>>> - I suggest to undo that line break in the assert 547 - the resulting string is like 83 chars.
>>>
>>> Looks good otherwise. I do not need a re-review for above tiny changes.
>> Thanks. I've made those changes.
>>
>> New webrev (which no longer needs to be applied on top of separate change for JDK-8239825, which has been pushed).
>>
>> https://cr.openjdk.java.net/~kbarrett/8139652/open.02/
> Looks good,
> Stefan

Thanks.
From dean.long at oracle.com Thu Mar 19 22:26:28 2020 From: dean.long at oracle.com (Dean Long) Date: Thu, 19 Mar 2020 15:26:28 -0700 Subject: RFR: 8231779: crash HeapWord*ParallelScavengeHeap::failed_mem_allocate In-Reply-To: <65a7e19d-f889-8ec6-e044-1ca30b871f56@oracle.com> References: <65a7e19d-f889-8ec6-e044-1ca30b871f56@oracle.com> Message-ID: Do you need a comment nearby so that nobody accidentally undoes this fix in the future? dl On 3/19/20 7:36 AM, Poonam Parhar wrote: > Hello, > > Please review this simple change that avoids a double to float > conversion that would fix an intermittent crash seen on Solaris boxes > due to corrupted float values in the Floating Point registers. > > http://cr.openjdk.java.net/~poonam/8231779/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8231779 > > Thanks, > Poonam From poonam.bajaj at oracle.com Fri Mar 20 00:04:53 2020 From: poonam.bajaj at oracle.com (Poonam Parhar) Date: Thu, 19 Mar 2020 17:04:53 -0700 Subject: RFR: 8231779: crash HeapWord*ParallelScavengeHeap::failed_mem_allocate In-Reply-To: References: <65a7e19d-f889-8ec6-e044-1ca30b871f56@oracle.com> Message-ID: <4e08f2f5-8f74-f3ff-b307-db6dea74cc98@oracle.com> Hello Dean, On 3/19/20 3:26 PM, Dean Long wrote: > Do you need a comment nearby so that nobody accidentally undoes this > fix in the future? Yes, of course; good idea! I will add a comment in the code. regards, Poonam > > dl > > On 3/19/20 7:36 AM, Poonam Parhar wrote: >> Hello, >> >> Please review this simple change that avoids a double to float >> conversion that would fix an intermittent crash seen on Solaris boxes >> due to corrupted float values in the Floating Point registers. 
>> >> http://cr.openjdk.java.net/~poonam/8231779/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8231779 >> >> Thanks, >> Poonam > From leonid.mesnik at oracle.com Fri Mar 20 01:10:28 2020 From: leonid.mesnik at oracle.com (Leonid Mesnik) Date: Thu, 19 Mar 2020 18:10:28 -0700 Subject: RFR: 8241123: Refactor vmTestbase stress framework to use j.u.c and make creation of threads more flexible In-Reply-To: <240d6e82-d229-8a2d-6be2-3042a1537c11@oracle.com> References: <0d73d306-2eff-375c-65e1-67142b2c6c59@oracle.com> <4E0F364A-47F3-428D-9C08-6B1ADFCB9D24@oracle.com> <504b0902-9fd1-ea8c-399a-185a4ceaa9e0@oracle.com> <240d6e82-d229-8a2d-6be2-3042a1537c11@oracle.com> Message-ID: Hi Thank you for review and feedback. See my comments inline. > On Mar 19, 2020, at 6:03 PM, serguei.spitsyn at oracle.com wrote: > > Hi Leonid, > > It looks good in general. > Just a couple of comments. > > > http://cr.openjdk.java.net/~lmesnik/8241123/webrev.01/test/hotspot/jtreg/vmTestbase/nsk/share/Wicket.java.frames.html > 168 public int waitFor(long timeout) { > 169 if (timeout < 0) > 170 throw new IllegalArgumentException( > 171 "timeout value is negative: " + timeout); > 172 > 173 long id = System.currentTimeMillis(); > 174 > 175 try { > 176 lock.lock(); > 177 --waiters; > 178 if (debugOutput != null) { > 179 debugOutput.printf("Wicket %d %s: waitFor(). There are %d waiters totally now.\n", id, name, waiters); > 180 } > 181 > 182 long waitTime = timeout; > 183 long startTime = System.currentTimeMillis(); > 184 > 185 while (count > 0 && waitTime > 0) { > 186 try { > 187 condition.await(waitTime, TimeUnit.MILLISECONDS); > 188 } catch (InterruptedException e) { > 189 } > 190 waitTime = timeout - (System.currentTimeMillis() - startTime); > 191 } > 192 --waiters; > 193 return count; > 194 } finally { > 195 lock.unlock(); > 196 } > 197 } > > The waiters probably needs to be incremented instead of decremented at line: > 177 --waiters; Thank you, fixed. 
> > http://cr.openjdk.java.net/~lmesnik/8241123/webrev.01/test/hotspot/jtreg/vmTestbase/nsk/share/runner/ThreadsRunner.java.udiff.html > private void waitForOtherThreads() { > if (shouldWait) { > shouldWait = false; > - finished.unlock(); > - finished.waitFor(); > + finished.decrementAndGet(); > + while (finished.get() != 0) { > + try { > + Thread.sleep(1000); > + } catch (InterruptedException ie) { > + } > + } > } else { > throw new TestBug("Waiting a second time is not premitted"); > } > } > > Should we use a shorter sleep, something like Thread.sleep(100)? > These tests executed 30 or 60 seconds now by default, so sleeping 1 sec doesn't increase overall time. But tI am fine to change it 100, it also should works fine. Leonid > > Thanks, > Serguei > > > On 3/18/20 15:18, Leonid Mesnik wrote: >> >> On 3/18/20 2:30 PM, Igor Ignatyev wrote: >>>> I need more time to get grasp of Wicket and your changes in it; will come back to you after I understand them. >>> ok, now when I believe that I have enough understanding of Wicket, I have a few comments: >>> 1. >>>> 68 private Lock lock = new ReentrantLock(); >>>> 69 private Condition condition = lock.newCondition(); >>> it's better to make these fields final. >>> >>> 2. as all writes and reads of Wicket::count are guarded by lock.lock, there is no need for it to be atomic. >>> 3. adding lock to getWaiters will also remove need for Wicket::waiters to be atomic. >> All 3 are fixed. Thanks for your suggestions. >> >> Updated version: >> >> http://cr.openjdk.java.net/~lmesnik/8241123/webrev.01/ >> Leonid >> >>> >>> the rest looks good to me. >>> >>> Thanks, >>> -- Igor >>> >>> >>> >>>> On Mar 18, 2020, at 12:48 PM, Igor Ignatyev > wrote: >>>> >>>> Hi Leonid, >>>> >>>> I've started looking at your webrev, and so far have a couple questions: >>>> >>>>> Test vmTestbase/nsk/jdi/ObjectReference/referringObjects/referringObjects003/referringObjects003a.java was updated to don't use Wicket. 
(The lock has a reference to thread which affects test.) >>>> can't you use just a volatile boolean field? >>>> >>>>> Wicket "finished" in class ThreadsRunner was changed to atomicInt/sleep to avoid OOME in j.u.c.l.Condition::await() which might happened in stress GC tests. >>>> won't j.u.c.CountDownLatch be more appropriate and cleaner solution here? >>>> >>>> I need more time to get grasp of Wicket and your changes in it; will come back to you after I understand them. >>>> >>>> -- Igor >>>> >>>>> On Mar 18, 2020, at 12:37 PM, Leonid Mesnik > wrote: >>>>> >>>>> Hi >>>>> >>>>> Could you please review following fix which slightly refactor vmTestbase stress test harness. This refactoring helps to add virtual threads testing support. >>>>> >>>>> The Wicket uses plain sync/wait/notify mechanism which cause carrier thread starvation and should not be used in virtual threads. The ManagedThread is a subclass of Thread so it couldn't be virtual thread. >>>>> >>>>> >>>>> Following fix changes Wicket to use locks/conditions to don't pin vthread to carrier thread while starting testing. >>>>> >>>>> ManagedThread is fixed to keep execution thread as the thread variable and isolate it's creation. >>>>> >>>>> Test vmTestbase/nsk/jdi/ObjectReference/referringObjects/referringObjects003/referringObjects003a.java was updated to don't use Wicket. (The lock has a reference to thread which affects test.) >>>>> >>>>> Wicket "finished" in class ThreadsRunner was changed to atomicInt/sleep to avoid OOME in j.u.c.l.Condition::await() which might happened in stress GC tests. 
>>>>>
>>>>> webrev: http://cr.openjdk.java.net/~lmesnik/8241123/webrev.00/
>>>>>
>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8241123
>>>>>
>>>>>
>>>>> Leonid
>>>>>
>>>>
>>>
>

From kim.barrett at oracle.com Fri Mar 20 04:04:26 2020
From: kim.barrett at oracle.com (Kim Barrett)
Date: Fri, 20 Mar 2020 00:04:26 -0400
Subject: high StringTable scanning overhead during young GCs
In-Reply-To:
References:
Message-ID: <95E126B6-6742-4F91-BC99-BF88F90B7AB4@oracle.com>

> On Mar 19, 2020, at 12:25 PM, Tony Printezis wrote:
>
> Hi Stefan,
>
> Twitter, of course, has NO idle periods. ;-)
>
> Would there be any objection to extending the G1PeriodicGCInterval
> mechanism to ignore young GCs? I'll be happy to work on it.

Wow, that's a lot of strings.

As you say, the existing idle mechanism isn't the right approach,
since you aren't actually idle. But I don't think modifying that
mechanism to ignore young collections is right either. An application
that really lives in the young-gen and isn't accumulating old objects,
but wants to take advantage of going idle, would get unnecessary
concurrent cycles inflicted on it.

> In our
> experience there are a couple of reasons to force cycles at a given
> frequency. One (less common for us) is to clear up the StringTable, as I've
> described already. Another (a lot more common for us) is to force
> finalizers / cleaners / and friends to run to clean up native resources
> held by dead objects in the old generation. We have had several cases where
> services would run out of native memory because cycles would happen very
> infrequently and a lot of native memory was held by dead objects in the old
> generation. FWIW, this was our original motivation for introducing the
> mechanism we have to force CMS cycles at a given frequency.

The application could run with -XX:+ExplicitGCInvokesConcurrent and
just call System.gc when it decides that's appropriate.
That requires the additional CLA that might have unintended effect on other uses of System.gc though. I keep circling back to wanting an alternative to System.gc that will attempt a concurrent collection if that's available. The application can call that when it thinks it's appropriate. I'm not fond of the idea of splitting the StringTable into young and old parts. That seems like a relatively big hammer for what seems like a pretty specialized problem. From thomas.schatzl at oracle.com Fri Mar 20 08:32:05 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 20 Mar 2020 09:32:05 +0100 Subject: RFR (M): 8238855: Move G1ConcurrentMark flag sanity checks to g1Arguments Message-ID: <2c21340b-cc6a-1599-0bc6-9886a486d057@oracle.com> Hi all, can I have reviews for this change that moves (and deletes duplicate) flag checking from the G1ConcurrentMark class to the other G1 arguments processing? Adds a test that checks whether the invariants before/after are still kept. CR: https://bugs.openjdk.java.net/browse/JDK-8238855 Webrev: http://cr.openjdk.java.net/~tschatzl/8238855/webrev/ Testing: hs-tier1-5 with new test Thanks, Thomas From stefan.karlsson at oracle.com Fri Mar 20 08:35:01 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Fri, 20 Mar 2020 09:35:01 +0100 Subject: RFR: 8241296: Segfault in JNIHandleBlock::oops_do() In-Reply-To: <1031247c-8623-a288-05da-8950a31b970e@redhat.com> References: <9cefe7b7-1d86-b62f-7352-148204abdc0e@oracle.com> <1031247c-8623-a288-05da-8950a31b970e@redhat.com> Message-ID: <287f47c2-70f1-0987-72bf-6efce716f9f9@oracle.com> Hi Andrew, Thanks for clarifying where and why this failed! StefanK On 2020-03-19 17:47, Andrew Haley wrote: > Hi, > > On 3/19/20 3:22 PM, Stefan Karlsson wrote: > >> I think the fix is fine. > OK, thanks. > > > Would you mind sharing some extra info? For example the stack trace >> of the scanned thread, and / or flags used to provoke this? 
I would
>> like to know why we haven't seen this before.
> Sure.
>
> #0  0x00007ffff7dafb02 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib
> #1  0x00007ffff77533fb in os::PlatformEvent::park (this=0x7ffff0ab690
> #2  0x00007ffff7706805 in ParkCommon (timo=0, ev=0x7ffff0ab6900)
> #3  Monitor::ILock (this=this@entry=0x7ffff0005b30, Self=Self@entry=0
> #4  0x00007ffff7706ffa in Monitor::lock_without_safepoint_check (Self
> #5  Monitor::lock_without_safepoint_check (this=0x7ffff0005b30)
> #6  0x00007ffff77e7f71 in SafepointSynchronize::block (thread=0x7ffff
> #7  0x00007ffff77e6afa in SafepointSynchronize::block (thread=thread@
> #8  0x00007ffff78fd897 in ThreadStateTransition::transition_and_fence
> #9  JavaThread::run (this=0x7ffff0ab5800)
> #10 0x00007ffff7747d78 in java_start (thread=0x7ffff0ab5800)
> #11 0x00007ffff7da9472 in start_thread () from /lib64/libpthread.so.0
> #12 0x00007ffff7ee5063 in clone () from /lib64/libc.so.6
>
> The thread blocked in transition_and_fence() here: note this is in JDK
> 8, but it hasn't changed AFAICS:
>
> // The first routine called by a new Java thread
> void JavaThread::run() {
>   // initialize thread-local alloc buffer related fields
>   this->initialize_tlab();
>
>   // used to test validitity of stack trace backs
>   this->record_base_of_stack_pointer();
>
>   // Record real stack base and size.
>   this->record_stack_base_and_size();
>
>   // Initialize thread local storage; set before calling MutexLocker
>   this->initialize_thread_local_storage();
>
>   this->create_stack_guard_pages();
>
>   this->cache_global_variables();
>
>   // Thread is now sufficient initialized to be handled by the safepoint code as being
>   // in the VM.
Change thread state from _thread_new to _thread_in_vm
> =>ThreadStateTransition::transition_and_fence(this, _thread_new, _thread_in_vm);
>
>   assert(JavaThread::current() == this, "sanity check");
>   assert(!Thread::current()->owns_locks(), "sanity check");
>
>   DTRACE_THREAD_PROBE(start, this);
>
>   // This operation might block. We call that after all safepoint checks for a new thread has
>   // been completed.
>   this->set_active_handles(JNIHandleBlock::allocate_block());
>
> So it's pretty obvious why active_handles wasn't set yet. This code
> isn't obviously different from that in jdk/jdk, but I have not been
> able to reproduce the bug there. IMO, though, it's still a bug in
> jdk/jdk.
>
> The most likely reason we haven't seen this before is that
> JNIHandleBlock::oops_do() looks like this:
>
> void JNIHandleBlock::oops_do(OopClosure* f) {
>   JNIHandleBlock* current_chain = this;
>   while (current_chain != NULL) {
>     ...
>   }
>
> A sufficiently adversarial compiler can turn this into
>
> void JNIHandleBlock::oops_do(OopClosure* f) {
>   JNIHandleBlock* current_chain = this;
>   do {
>     ...
>   } while (current_chain != NULL)
>
> because "this" can never be null in a member function. GCC sometimes
> does this transformation.
>

From thomas.schatzl at oracle.com Fri Mar 20 08:35:04 2020
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Fri, 20 Mar 2020 08:35:04 +0000 (UTC)
Subject: RFR: 8231779: crash HeapWord*ParallelScavengeHeap::failed_mem_allocate
In-Reply-To: <65a7e19d-f889-8ec6-e044-1ca30b871f56@oracle.com>
References: <65a7e19d-f889-8ec6-e044-1ca30b871f56@oracle.com>
Message-ID: <69850b39-3a5c-8d22-9edf-7ea07b1a6f46@oracle.com>

Hi,

On 19.03.20 15:36, Poonam Parhar wrote:
> Hello,
>
> Please review this simple change that avoids a double to float
> conversion that would fix an intermittent crash seen on Solaris boxes
> due to corrupted float values in the Floating Point registers.
> > http://cr.openjdk.java.net/~poonam/8231779/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8231779 > looks good. As Dean mentioned, adding a comment here might improve the change (the CR contains enough info). No need for a re-review for a comment. Thanks, Thomas From jiefu at tencent.com Fri Mar 20 12:15:04 2020 From: jiefu at tencent.com (=?utf-8?B?amllZnUo5YKF5p2wKQ==?=) Date: Fri, 20 Mar 2020 12:15:04 +0000 Subject: RFR: 8241354: ZGC: fatal error: Failed to get NUMA id due to get_mempolicy operation not permitted Message-ID: <72433FF1-774C-4C1D-A5F0-7967CA0D1CCD@tencent.com> Hi all, JBS: https://bugs.openjdk.java.net/browse/JDK-8241354 Webrev: http://cr.openjdk.java.net/~jiefu/8241354/webrev.00/ A VM fatal error may be observed if ZGC is used. The background is that some of our products will run in the docker. For some safety reason, SYS_get_mempolicy is not allowed in the docker. It might be not a good practice to generate a fatal error when get_mempolicy fails. What do you think? Thanks a lot. Best regards, Jie From stefan.karlsson at oracle.com Fri Mar 20 13:43:13 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Fri, 20 Mar 2020 14:43:13 +0100 Subject: RFR: 8241361: ZGC: Implement memory related JFR events Message-ID: <40e29fc8-005a-e5d1-8bf0-816d406ee7b8@oracle.com> Hi all, Please review this patch to add some memory related JFR events to ZGC. https://cr.openjdk.java.net/~stefank/8241361/webrev.01/ https://bugs.openjdk.java.net/browse/JDK-8241361 Added events: ZAllocationStall - Record when we run out of heap memory and the Java threads stall, waiting for the GC to free up memory. ZPageAllocation - Updated the existing event to also record the duration of the event. Updated the event to only be reported if the allocation takes longer than 1 ms. ZPageCacheFlush - Record when the page cache needs to be flushed. 
This usually happens when we run out of a specific page size and have to detach the physical and virtual memory to materialize a new ZPage. We also flush pages when we uncommit memory. ZRelocationSet - Record information about the selected relocation set. ZUncommit - Record when we uncommit and hand back memory to the OS. The patch also contains some small cosmetic changes to existing events, whitespace fixes. From tprintezis at twitter.com Fri Mar 20 14:54:12 2020 From: tprintezis at twitter.com (Tony Printezis) Date: Fri, 20 Mar 2020 07:54:12 -0700 Subject: high StringTable scanning overhead during young GCs In-Reply-To: <95E126B6-6742-4F91-BC99-BF88F90B7AB4@oracle.com> References: <95E126B6-6742-4F91-BC99-BF88F90B7AB4@oracle.com> Message-ID: Hi Kim, Please see inline. ????? Tony Printezis | @TonyPrintezis | tprintezis at twitter.com On March 20, 2020 at 12:06:36 AM, Kim Barrett (kim.barrett at oracle.com) wrote: > On Mar 19, 2020, at 12:25 PM, Tony Printezis wrote: > > Hi Stefan, > > Twitter, of course, has NO idle periods. ;-) > > Would there be any objection to extending the G1PeriodicGCInterval > mechanism to ignore young GCs? I?ll be happy to work on it. Wow, that's a lot of strings. Yes. As you say, the existing idle mechanism isn't the right approach, since you aren't actually idle. But I don't think modifying that mechanism to ignore young collections is right either. An application that really lives in the young-gen and isn't accumulating old objects, but wants to take advantage of going idle, would get unnecessary concurrent cycles inflicted on it. I think I was a bit unclear. I was suggesting to add a new flag to be able to do either. Either a boolean flag to ignore young GCs or have two different interval flags (G1PeriodicGCInterval and G1PeriodConcGCInterval). > In our > experience there are a couple of reasons to force cycles at a given > frequency. One (less common for us) is to clear up the StringTable, as I?ve > described already. 
Another (a lot more common for us) is to force
> finalizers / cleaners / and friends to run to clean up native resources
> held by dead objects in the old generation. We have had several cases where
> services would run out of native memory because cycles would happen very
> infrequently and a lot of native memory was held by dead objects in the old
> generation. FWIW, this was our original motivation for introducing the
> mechanism we have to force CMS cycles at a given frequency.

The application could run with -XX:+ExplicitGCInvokesConcurrent and
just call System.gc when it decides that's appropriate.

Of course. We can also provide our own API to start a concurrent cycle
without setting +ExplicitGCInvokesConcurrent. But it's always much easier
to add a flag that does what we want instead of having to make code
changes.

That requires the additional CLA that might have unintended effect on
other uses of System.gc though.

I keep circling back to wanting an alternative to System.gc that will
attempt a concurrent collection if that's available. The application
can call that when it thinks it's appropriate.

I'm not fond of the idea of splitting the StringTable into young and
old parts. That seems like a relatively big hammer for what seems
like a pretty specialized problem.

So, playing Devil's advocate, why was it done for nmethods?

Tony

From tprintezis at twitter.com Fri Mar 20 15:07:19 2020
From: tprintezis at twitter.com (Tony Printezis)
Date: Fri, 20 Mar 2020 08:07:19 -0700
Subject: high StringTable scanning overhead during young GCs
In-Reply-To:
References: <95E126B6-6742-4F91-BC99-BF88F90B7AB4@oracle.com>
Message-ID:

On March 20, 2020 at 10:54:12 AM, Tony Printezis (tprintezis at twitter.com)
wrote:

That requires the additional CLA that might have unintended effect on
other uses of System.gc though.

I keep circling back to wanting an alternative to System.gc that will
attempt a concurrent collection if that's available.
The application can call that when it thinks it's appropriate.

I'm not fond of the idea of splitting the StringTable into young and
old parts. That seems like a relatively big hammer for what seems
like a pretty specialized problem.

So, playing Devil's advocate, why was it done for nmethods?

Actually, one more thing: The fact that we are causing concurrent cycles
every 6 hours is definitely diminishing the problem. However, note that
Young GC times between cycles go up by between 3x and 6x depending on
load. They drop after each cycle but they keep going up until the next
cycle. If we could split the StringTable and avoid scanning the old part,
young GCs would now stay flat. The problem would be that, if we did
cycles every 3 to 4 days, scanning the StringTable during the remark
pause would be super slow, as it will have grown very large (could be
seconds just to scan the StringTable). So, a combination of a split
StringTable + frequent cycles will probably yield the best of both
worlds.

Anyway, I thought I'd point that out.

Tony

-----
Tony Printezis | @TonyPrintezis | tprintezis at twitter.com

From erik.osterlund at oracle.com Fri Mar 20 15:13:06 2020
From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=)
Date: Fri, 20 Mar 2020 16:13:06 +0100
Subject: RFR: 8241361: ZGC: Implement memory related JFR events
In-Reply-To: <40e29fc8-005a-e5d1-8bf0-816d406ee7b8@oracle.com>
References: <40e29fc8-005a-e5d1-8bf0-816d406ee7b8@oracle.com>
Message-ID: <691895c0-3d92-b89f-5313-a4ff15f2fc7e@oracle.com>

Hi Stefan,

Very nice. I like the new syntax for filling in event data and committing.
Looks good.

Thanks,
/Erik

On 2020-03-20 14:43, Stefan Karlsson wrote:
> Hi all,
>
> Please review this patch to add some memory related JFR events to ZGC.
> > https://cr.openjdk.java.net/~stefank/8241361/webrev.01/ > https://bugs.openjdk.java.net/browse/JDK-8241361 > > Added events: > > ZAllocationStall - Record when we run out of heap memory and the Java > threads stall, waiting for the GC to free up memory. > > ZPageAllocation - Updated the existing event to also record the > duration of the event. Updated the event to only be reported if the > allocation takes longer than 1 ms. > > ZPageCacheFlush - Record when the page cache needs to be flushed. This > usually happens when we run out of a specific page size and have to > detach the physical and virtual memory to materialize a new ZPage. We > also flush pages when we uncommit memory. > > ZRelocationSet - Record information about the selected relocation set. > > ZUncommit - Record when we uncommit and hand back memory to the OS. > > The patch also contains some small cosmetic changes to existing > events, whitespace fixes. From kim.barrett at oracle.com Fri Mar 20 21:46:45 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Fri, 20 Mar 2020 17:46:45 -0400 Subject: RFR (M): 8238855: Move G1ConcurrentMark flag sanity checks to g1Arguments In-Reply-To: <2c21340b-cc6a-1599-0bc6-9886a486d057@oracle.com> References: <2c21340b-cc6a-1599-0bc6-9886a486d057@oracle.com> Message-ID: <55CB6F16-9BC8-4AA4-B8E2-AD8A5D00F023@oracle.com> > On Mar 20, 2020, at 4:32 AM, Thomas Schatzl wrote: > > Hi all, > > can I have reviews for this change that moves (and deletes duplicate) flag checking from the G1ConcurrentMark class to the other G1 arguments processing? > > Adds a test that checks whether the invariants before/after are still kept. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8238855 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8238855/webrev/ > Testing: > hs-tier1-5 with new test > > Thanks, > Thomas Looks good. 
------------------------------------------------------------------------------
src/hotspot/share/gc/g1/g1Arguments.cpp
 107 void G1Arguments::initialize_mark_stack_size() {

Any particular reason for splitting this out into a separate function
that has one caller? All the preceeding cases in the same caller are
just directly inlined, so this looks kind of out of place. I don't
object to this, but might actually prefer there to be more like it.
That can wait for a followup.

------------------------------------------------------------------------------

From fw at deneb.enyo.de Sat Mar 21 12:50:47 2020
From: fw at deneb.enyo.de (Florian Weimer)
Date: Sat, 21 Mar 2020 13:50:47 +0100
Subject: RFR: 8241354: ZGC: fatal error: Failed to get NUMA id due to get_mempolicy operation not permitted
In-Reply-To: <72433FF1-774C-4C1D-A5F0-7967CA0D1CCD@tencent.com>
 (=?utf-8?Q?=22jiefu=28=E5=82=85=E6=9D=B0=29=22's?= message of "Fri, 20 Mar 2020 12:15:04 +0000")
References: <72433FF1-774C-4C1D-A5F0-7967CA0D1CCD@tencent.com>
Message-ID: <87pnd598ig.fsf@mid.deneb.enyo.de>

* jiefu(傅杰):

> Hi all,
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8241354
> Webrev: http://cr.openjdk.java.net/~jiefu/8241354/webrev.00/
>
> A VM fatal error may be observed if ZGC is used.

Is warning() printed to standard output or standard error?

> The background is that some of our products will run in the docker.
> For some safety reason, SYS_get_mempolicy is not allowed in the docker.

Various container runtimes randomly disable system calls they deem
unworthy. It's a common problem for the first widespread use of any
system call. Some of the runtimes also do not simply return ENOSYS
errors (which the kernel would use to indicate that the system call is
not supported).
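
Note on the 8241354 thread above: Florian's point that a container runtime may reject a system call with an error other than ENOSYS can be made concrete with a short sketch. It is not the patch under review. numa_usable_from_errno and probe_get_mempolicy are hypothetical names, and treating EPERM the same way as ENOSYS is an assumed policy rather than something the thread has settled. The sketch assumes Linux and issues get_mempolicy through syscall(2), asking only for the calling thread's default policy.

```cpp
#include <cerrno>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical policy: decide from errno whether NUMA queries are usable.
// ENOSYS: the kernel does not implement the call.
// EPERM:  the call is blocked, for example by a seccomp filter.
inline bool numa_usable_from_errno(int err) {
  return err != ENOSYS && err != EPERM;
}

// Probe once at start-up instead of failing hard on first use.
inline bool probe_get_mempolicy() {
  int mode = 0;
  // Arguments: mode, nodemask, maxnode, addr, flags.
  long rc = syscall(SYS_get_mempolicy, &mode, nullptr, 0UL, nullptr, 0UL);
  if (rc == 0) {
    return true;
  }
  return numa_usable_from_errno(errno);
}
```

A caller could use the probe result to switch NUMA support off and print a warning rather than raising a fatal error; whether a warning is appropriate when the user did not ask for NUMA explicitly is a separate policy question.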
From jiefu at tencent.com Sat Mar 21 13:35:28 2020
From: jiefu at tencent.com (=?utf-8?B?amllZnUo5YKF5p2wKQ==?=)
Date: Sat, 21 Mar 2020 13:35:28 +0000
Subject: 8241354: ZGC: fatal error: Failed to get NUMA id due to get_mempolicy operation not permitted(Internet mail)
Message-ID: <466BAACF-F9D6-4314-BDE0-6E9DA66C566B@tencent.com>

Hi Florian,

On 2020/3/21, 8:52 PM, "Florian Weimer" wrote:

    > A VM fatal error may be observed if ZGC is used.

    Is warning() printed to standard output or standard error?

It will print to the standard error.

    > The background is that some of our products will run in the docker.
    > For some safety reason, SYS_get_mempolicy is not allowed in the docker.

    Various container runtimes randomly disable system calls they deem
    unworthy. It's a common problem for the first widespread use of any
    system call. Some of the runtimes also do not simply return ENOSYS
    errors (which the kernel would use to indicate that the system call is
    not supported).

According to the docker docs[1], for safety reasons, the get_mempolicy is blocked by default, not randomly disabled.
And for that reason, our customers refused to support it in their runtime env.

Thanks.
Best regards,
Jie

[1] https://docs.docker.com/engine/security/seccomp/

From fw at deneb.enyo.de Sat Mar 21 13:37:11 2020
From: fw at deneb.enyo.de (Florian Weimer)
Date: Sat, 21 Mar 2020 14:37:11 +0100
Subject: 8241354: ZGC: fatal error: Failed to get NUMA id due to get_mempolicy operation not permitted(Internet mail)
In-Reply-To: <466BAACF-F9D6-4314-BDE0-6E9DA66C566B@tencent.com> (=?utf-8?Q?=22jiefu=28=E5=82=85=E6=9D=B0=29=22's?= message of "Sat, 21 Mar 2020 13:35:28 +0000")
References: <466BAACF-F9D6-4314-BDE0-6E9DA66C566B@tencent.com>
Message-ID: <87lfnt96d4.fsf@mid.deneb.enyo.de>

* jiefu(傅杰):

> According to the docker docs[1], for safety reasons, the
> get_mempolicy is blocked by default, not randomly disabled. And for
> that reason, our customers refused to support it in their runtime
> env.
What I meant was that the set of enabled system calls is determined
without careful analysis, not that the set of system calls changes
with each deployment/container launch.

From jiefu at tencent.com Sat Mar 21 13:48:33 2020
From: jiefu at tencent.com (=?utf-8?B?amllZnUo5YKF5p2wKQ==?=)
Date: Sat, 21 Mar 2020 13:48:33 +0000
Subject: 8241354: ZGC: fatal error: Failed to get NUMA id due to get_mempolicy operation not permitted(Internet mail)
In-Reply-To: <87lfnt96d4.fsf@mid.deneb.enyo.de>
References: <466BAACF-F9D6-4314-BDE0-6E9DA66C566B@tencent.com> <87lfnt96d4.fsf@mid.deneb.enyo.de>
Message-ID:

On 2020/3/21, 9:38 PM, "Florian Weimer" wrote:

    > According to the docker docs[1], for safety reasons, the
    > get_mempolicy is blocked by default, not randomly disabled. And for
    > that reason, our customers refused to support it in their runtime
    > env.

    What I meant was that the set of enabled system calls is determined
    without careful analysis, not that the set of system calls changes
    with each deployment/container launch.

I agree with you.
Thanks for your clarification.

Best regards,
Jie

From erik.osterlund at oracle.com Sun Mar 22 08:25:23 2020
From: erik.osterlund at oracle.com (=?utf-8?Q?Erik_=C3=96sterlund?=)
Date: Sun, 22 Mar 2020 09:25:23 +0100
Subject: RFR: 8241354: ZGC: fatal error: Failed to get NUMA id due to get_mempolicy operation not permitted
In-Reply-To: <72433FF1-774C-4C1D-A5F0-7967CA0D1CCD@tencent.com>
References: <72433FF1-774C-4C1D-A5F0-7967CA0D1CCD@tencent.com>
Message-ID:

Hi Jie,

It seems to me that if the environment doesn't supply the required NUMA APIs, then we really should disable UseNUMA instead. I propose we check the availability of the syscall during initialization instead, and switch off all NUMA functionality when appropriate. And we should only print a warning if the user explicitly supplied UseNUMA on the command line.

Thanks,
/Erik

> On 20 Mar 2020, at 13:15, jiefu(傅杰)
> wrote:
>
> Hi all,
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8241354
> Webrev: http://cr.openjdk.java.net/~jiefu/8241354/webrev.00/
>
> A VM fatal error may be observed if ZGC is used.
>
> The background is that some of our products will run in the docker.
> For some safety reason, SYS_get_mempolicy is not allowed in the docker.
>
> It might be not a good practice to generate a fatal error when get_mempolicy fails.
> What do you think?
>
> Thanks a lot.
> Best regards,
> Jie

From jiefu at tencent.com Sun Mar 22 13:35:53 2020
From: jiefu at tencent.com (=?utf-8?B?amllZnUo5YKF5p2wKQ==?=)
Date: Sun, 22 Mar 2020 13:35:53 +0000
Subject: 8241354: ZGC: fatal error: Failed to get NUMA id due to get_mempolicy operation not permitted(Internet mail)
Message-ID: <76DE1923-410F-407E-B62D-697074B3CC49@tencent.com>

Hi Erik,

Thanks for your review and valuable comments.

Updated: http://cr.openjdk.java.net/~jiefu/8241354/webrev.01/

Please review it.

Thanks a lot.
Best regards,
Jie

On 2020/3/22, 4:26 PM, "Erik Österlund" wrote:

    Hi Jie,

    It seems to me that if the environment doesn't supply the required NUMA APIs, then we really should disable UseNUMA instead. I propose we check the availability of the syscall during initialization instead, and switch off all NUMA functionality when appropriate. And we should only print a warning if the user explicitly supplied UseNUMA on the command line.

    Thanks,
    /Erik

    > On 20 Mar 2020, at 13:15, jiefu(傅杰) wrote:
    >
    > Hi all,
    >
    > JBS: https://bugs.openjdk.java.net/browse/JDK-8241354
    > Webrev: http://cr.openjdk.java.net/~jiefu/8241354/webrev.00/
    >
    > A VM fatal error may be observed if ZGC is used.
    >
    > The background is that some of our products will run in the docker.
    > For some safety reason, SYS_get_mempolicy is not allowed in the docker.
    >
    > It might be not a good practice to generate a fatal error when get_mempolicy fails.
    > What do you think?
    >
    > Thanks a lot.
    > Best regards,
    > Jie

From thomas.schatzl at oracle.com Sun Mar 22 15:38:14 2020
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Sun, 22 Mar 2020 16:38:14 +0100
Subject: 8241354: ZGC: fatal error: Failed to get NUMA id due to get_mempolicy operation not permitted(Internet mail)
In-Reply-To: <76DE1923-410F-407E-B62D-697074B3CC49@tencent.com>
References: <76DE1923-410F-407E-B62D-697074B3CC49@tencent.com>
Message-ID: <4bf86c475fcb87594340d0f0ee11a89fa4db8cf6.camel@oracle.com>

Hi,

On Sun, 2020-03-22 at 13:35 +0000, jiefu(傅杰) wrote:
> Hi Erik,
>
> Thanks for your review and valuable comments.
>
> Updated: http://cr.openjdk.java.net/~jiefu/8241354/webrev.01/
>
> Please review it.

  do you happen to know if other collectors are also somehow affected
by this restriction in a docker container?

It would be unfortunate if this were only fixed for one collector.

Thanks,
  Thomas

From stefan.karlsson at oracle.com Mon Mar 23 07:38:06 2020
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Mon, 23 Mar 2020 08:38:06 +0100
Subject: RFR: 8241361: ZGC: Implement memory related JFR events
In-Reply-To: <691895c0-3d92-b89f-5313-a4ff15f2fc7e@oracle.com>
References: <40e29fc8-005a-e5d1-8bf0-816d406ee7b8@oracle.com> <691895c0-3d92-b89f-5313-a4ff15f2fc7e@oracle.com>
Message-ID: <41085b94-3ea8-35c1-aa77-b4715e950c04@oracle.com>

Thanks for reviewing!

StefanK

On 2020-03-20 16:13, Erik Österlund wrote:
> Hi Stefan,
>
> Very nice. I like the new syntax for filling in event data and
> committing.
> Looks good.
>
> Thanks,
> /Erik
>
> On 2020-03-20 14:43, Stefan Karlsson wrote:
>> Hi all,
>>
>> Please review this patch to add some memory related JFR events to ZGC.
>>
>> https://cr.openjdk.java.net/~stefank/8241361/webrev.01/
>> https://bugs.openjdk.java.net/browse/JDK-8241361
>>
>> Added events:
>>
>> ZAllocationStall - Record when we run out of heap memory and the Java
>> threads stall, waiting for the GC to free up memory.
>>
>> ZPageAllocation - Updated the existing event to also record the
>> duration of the event. Updated the event to only be reported if the
>> allocation takes longer than 1 ms.
>>
>> ZPageCacheFlush - Record when the page cache needs to be flushed.
>> This usually happens when we run out of a specific page size and have
>> to detach the physical and virtual memory to materialize a new ZPage.
>> We also flush pages when we uncommit memory.
>>
>> ZRelocationSet - Record information about the selected relocation set.
>>
>> ZUncommit - Record when we uncommit and hand back memory to the OS.
>>
>> The patch also contains some small cosmetic changes to existing
>> events, whitespace fixes.
>

From stefan.karlsson at oracle.com Mon Mar 23 07:39:04 2020
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Mon, 23 Mar 2020 08:39:04 +0100
Subject: RFR: 8241160: Concurrent class unloading reports GCTraceTime events as JFR pause sub-phase events
In-Reply-To: <1f6f55ee-5285-3dab-2938-47ae081b0a53@oracle.com>
References: <1f6f55ee-5285-3dab-2938-47ae081b0a53@oracle.com>
Message-ID: <72584db7-ca93-c5cf-1823-a3208e79537e@oracle.com>

Thanks for reviewing!

StefanK

On 2020-03-19 12:37, Erik Österlund wrote:
> Hi Stefan,
>
> Nice! I like how you can now catch incorrect use of the API, without
> making it hard to use.
> Looks good. Thanks for sorting this out.
>
> /Erik
>
> On 2020-03-19 10:44, Stefan Karlsson wrote:
>> Hi all,
>>
>> Please review this patch to rewrite the GCTimer, and associated
>> classes, to not allow nested phases of different types (pause or
>> concurrent).
>>
>> https://cr.openjdk.java.net/~stefank/8241160/webrev.01/
>> https://bugs.openjdk.java.net/browse/JDK-8241160
>>
>> A bug was found when I was looking at JFR events from ZGC. A
>> GCPhasePauseLevel1 event was nested within a GCPhaseConcurrent. The
>> only valid parent is a GCPhasePause event. The reason why this
>> happened was that we use a GCTraceTime class inside the class
>> unloading code.
>> Previously, we only used GCTraceTimes inside pauses,
>> but ever since class unloading was moved out to a concurrent phase,
>> this isn't true anymore. GCTraceTime used
>> GCTimer::register_gc_phase_start(name, Ticks, phase? = ), and
>> therefore always reported pauses and pause sub-phases.
>>
>> With this patch, I suggest that we become stricter in our usages of
>> the GCTimer. The effects of the patch are:
>>
>> 1) When a top-level pause (or concurrent) phase is created, the code
>> must be explicit about what type of phase is created. The code will
>> now assert if this is abused. Most places were already explicit, but
>> I had to change two places:
>>
>> a) Shenandoah type-erased ConcurrentGCTimer and therefore didn't have
>> access to register_gc_pause_start. I made that function public,
>> instead of protected, so that we didn't have to deal with that problem.
>>
>> b) G1 used GCTraceTime to note the Remark/Cleanup? pauses (in
>> VM_G1Concurrent). This is the only place that uses GCTraceTime to
>> start a pause. All other places use GCTraceTime to create sub-phases.
>> I could have copy-n-pasted the entire
>> GCTraceTime/GCTraceTimeWrapper/GCTraceTimeWrapper implementation and
>> created a version that calls register_gc_pause_start instead of
>> register_gc_phase_start. Instead of doing that I opted for creating a
>> system where the code could register a set of callbacks to be called
>> when the start and end time is registered. This is used in the
>> backend of GCTraceTime, but then also used by G1 to allow us to not
>> have to copy-n-paste a lot of the code.
>>
>> I would have liked to make GCTraceTimeImpl/GCTraceTimeWrapper
>> agnostic to the default callbacks (unified logging and GCTimer) but
>> couldn't find a nice way to express that, because of the way we
>> macro-expand the UL tags. Maybe something we can consider for a
>> future investigation.
>>
>> 2) sub-phases now inherit the type from the parent phase, and there's
>> no possibility to incorrectly nest phases anymore. This also removed
>> the need for ConcurrentGCTimer::_is_concurrent_phase_active.
>>
>> 3) This allows (and encourages) concurrent sub-phases. When the JFR
>> events were ported to HotSpot, only pauses got sub-phases, because
>> there wasn't a big need for concurrent sub-phases. In this patch I
>> added level of sub-phases to JFR. Maybe it would be better to add
>> more right away? (I'm not a fan of having the explicit sub-phase
>> level events, instead of a counter in *the* phase event, but the JMC
>> team at that time needed it to be logged as separate events. Maybe
>> something that could be reconsidered some time)
>>
>> 4) The different consumers of the timestamps are separated into their
>> own classes.
>>
>> 5) Shenandoah devs need to consider what to do about this change:
>>
>> -    unloading_occurred = SystemDictionary::do_unloading(heap->gc_timer());
>> +    // FIXME: This turns off the previously broken JFR events. If we want to keep reporting them,
>> +    // but with the correct type (Concurrent) then a top-level concurrent phase is required.
>> +    unloading_occurred = SystemDictionary::do_unloading(NULL /* gc_timer */);
>>
>> Where this code caused GCPhasePauseLevel1 events for ZGC, this used
>> to create GCPhasePause events for Shenandoah. It uses GCTraceTime to
>> log sub-phases, but the current Shenandoah code hasn't registered a
>> top-level phase at this point. Either we keep this code with the
>> removal of the gc_timer argument, or we add a top-level phase
>> somewhere. If we want the latter, then I need suggestions on where to
>> add them. Or maybe push the current code, and fix it as a follow-up
>> patch?
>>
>> What do you think? An alternative is to (continue?) completely forbid
>> concurrent sub-phases, and remove the gc_timers passed to
>> GCTraceTimes during concurrent phases.
>> Even if we decide to do that,
>> I think there's some merit to the stricter GCTimer code, and the
>> slight separation of concern in GCTraceTime.
>>
>> Tested tier1-3
>>
>> Thanks,
>> StefanK
>

From jiefu at tencent.com Mon Mar 23 08:23:04 2020
From: jiefu at tencent.com (=?utf-8?B?amllZnUo5YKF5p2wKQ==?=)
Date: Mon, 23 Mar 2020 08:23:04 +0000
Subject: 8241354: ZGC: fatal error: Failed to get NUMA id due to get_mempolicy operation not permitted(Internet mail)
In-Reply-To: <4bf86c475fcb87594340d0f0ee11a89fa4db8cf6.camel@oracle.com>
References: <76DE1923-410F-407E-B62D-697074B3CC49@tencent.com> <4bf86c475fcb87594340d0f0ee11a89fa4db8cf6.camel@oracle.com>
Message-ID: <2C4E29A5-E4BF-4B92-8649-DD5C107A1BAC@tencent.com>

Hi Thomas,

Thanks for your review and valuable comments.

Yes, we should fix it for other GCs too.

Updated: http://cr.openjdk.java.net/~jiefu/8241354/webrev.02/

Please review it and give me some advice.

Thanks a lot.
Best regards,
Jie

On 2020/3/22, 11:39 PM, "Thomas Schatzl" wrote:

    Hi,

    On Sun, 2020-03-22 at 13:35 +0000, jiefu(傅杰) wrote:
    > Hi Erik,
    >
    > Thanks for your review and valuable comments.
    >
    > Updated: http://cr.openjdk.java.net/~jiefu/8241354/webrev.01/
    >
    > Please review it.

      do you happen to know if other collectors are also somehow affected
    by this restriction in a docker container?

    It would be unfortunate if this were only fixed for one collector.

    Thanks,
      Thomas

From stefan.karlsson at oracle.com Mon Mar 23 08:41:48 2020
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Mon, 23 Mar 2020 09:41:48 +0100
Subject: 8241354: ZGC: fatal error: Failed to get NUMA id due to get_mempolicy operation not permitted(Internet mail)
In-Reply-To: <76DE1923-410F-407E-B62D-697074B3CC49@tencent.com>
References: <76DE1923-410F-407E-B62D-697074B3CC49@tencent.com>
Message-ID:

Hi Jie,

On 2020-03-22 14:35, jiefu(傅杰) wrote:
> Hi Erik,
>
> Thanks for your review and valuable comments.
>
> Updated: http://cr.openjdk.java.net/~jiefu/8241354/webrev.01/
>
> Please review it.

Thanks for providing this patch.

If it is only the get_mempolicy that is problematic, then I wonder if it
would be better to leave the UseNUMA flag untouched and only turn off
the ZGC specific NUMA parts. Maybe something like this:

static bool check_get_mempolicy_support() {
  int dummy = 0;
  int mode = -1;
  // Check whether get_mempolicy is supported or not
  if (ZSyscall::get_mempolicy(&mode, NULL, 0, (void*)&dummy,
                              MPOL_F_NODE | MPOL_F_ADDR) == -1) {
    if (!FLAG_IS_DEFAULT(UseNUMA)) {
      warning("ZGC NUMA support is disabled since get_mempolicy is unsupported.");
    }
    return false;
  }

  return true;
}

void ZNUMA::initialize_platform() {
  _enabled = UseNUMA && check_get_mempolicy_support();
}

An alternative would be to take this a step further (probably as a
separate RFR) and provide a user friendly output in our -Xlog:gc+init
output:

[0.015s][info][gc,init] Initializing The Z Garbage Collector
[0.015s][info][gc,init] Version: 15-internal+0-2020-03-04-0947497.stefank... (fastdebug)
[0.015s][info][gc,init] NUMA Support: Unsupported <== HERE
[0.015s][info][gc,init] CPUs: 32 total, 32 available
[0.015s][info][gc,init] Memory: 128851M
[0.015s][info][gc,init] Large Page Support: Disabled
[0.015s][info][gc,init] Medium Page Size: 32M
[0.015s][info][gc,init] Workers: 20 parallel, 4 concurrent

Borrowing the structure from how UseLargePages are setup and printed:

void ZLargePages::initialize_platform() {
  if (UseLargePages) {
    if (UseTransparentHugePages) {
      _state = Transparent;
    } else {
      _state = Explicit;
    }
  } else {
    _state = Disabled;
  }
}

const char* ZLargePages::to_string() {
  switch (_state) {
  case Explicit:
    return "Enabled (Explicit)";

  case Transparent:
    return "Enabled (Transparent)";

  default:
    return "Disabled";
  }
}

Thanks,
StefanK

>
> Thanks a lot.
> Best regards,
> Jie
>
> On 2020/3/22, 4:26 PM, "Erik Österlund" wrote:
>
>     Hi Jie,
>
>     It seems to me that if the environment doesn't supply the required NUMA APIs, then we really should disable UseNUMA instead. I propose we check the availability of the syscall during initialization instead, and switch off all NUMA functionality when appropriate. And we should only print a warning if the user explicitly supplied UseNUMA on the command line.
>
>     Thanks,
>     /Erik
>
>     > On 20 Mar 2020, at 13:15, jiefu(傅杰) wrote:
>     >
>     > Hi all,
>     >
>     > JBS: https://bugs.openjdk.java.net/browse/JDK-8241354
>     > Webrev: http://cr.openjdk.java.net/~jiefu/8241354/webrev.00/
>     >
>     > A VM fatal error may be observed if ZGC is used.
>     >
>     > The background is that some of our products will run in the docker.
>     > For some safety reason, SYS_get_mempolicy is not allowed in the docker.
>     >
>     > It might be not a good practice to generate a fatal error when get_mempolicy fails.
>     > What do you think?
>     >
>     > Thanks a lot.
>     > Best regards,
>     > Jie
>     >
>     >

From per.liden at oracle.com Mon Mar 23 08:53:23 2020
From: per.liden at oracle.com (Per Liden)
Date: Mon, 23 Mar 2020 09:53:23 +0100
Subject: RFR: 8241361: ZGC: Implement memory related JFR events
In-Reply-To: <40e29fc8-005a-e5d1-8bf0-816d406ee7b8@oracle.com>
References: <40e29fc8-005a-e5d1-8bf0-816d406ee7b8@oracle.com>
Message-ID: <4e335f6c-102e-8853-b99d-f422b259508a@oracle.com>

Hi,

On 3/20/20 2:43 PM, Stefan Karlsson wrote:
> Hi all,
>
> Please review this patch to add some memory related JFR events to ZGC.
>
> https://cr.openjdk.java.net/~stefank/8241361/webrev.01/
> https://bugs.openjdk.java.net/browse/JDK-8241361

Nice! Looks good overall, a few minor things:


src/hotspot/share/gc/z/zPageAllocator.cpp
-----------------------------------------

* Instead of:

  #include "jfrfiles/jfrEventClasses.hpp"

I think we usually do:

  #include "jfr/jfrEvents.hpp"


* I could live without the whitespace you added on line 631.
 630   ZPageCacheFlushForAllocationClosure cl(requested);
 631
 632   const size_t flushed = flush_cache(&cl, true /* for_allocation


* I think cl->_flushed used here:

 604   event.commit(cl->_requested, cl->_flushed, for_allocation);

should instead just be:

 604   event.commit(cl->_requested, flushed, for_allocation);

Right?


src/hotspot/share/gc/z/zPageCache.hpp
-------------------------------------

Instead of:

  friend class ZPageAllocator;

add a getter for requested()?


src/hotspot/share/gc/z/zRelocationSetSelector.cpp
-------------------------------------------------

* Same here, instead of:

  #include "jfrfiles/jfrEventClasses.hpp"

I think we should do:

  #include "jfr/jfrEvents.hpp"


* You don't think we should use ZPageTypeType that you introduced, and
send three different ZRelocationSet events, one for each page type?
Shouldn't this event also be timed, and sent from within
ZRelocationSetSelectorGroup::select()?


src/hotspot/share/gc/z/zTracer.cpp
----------------------------------

  43     writer.write("small");
  44     writer.write_key(ZPageTypeMedium);
  45     writer.write("medium");
  46     writer.write_key(ZPageTypeLarge);
  47     writer.write("large");

How about "Small", "Medium" and "Large"? I could only find one other
place (in jfrStackTraceRepository.cpp) where names were given, and
those start with a capital letter.

cheers,
Per

>
> Added events:
>
> ZAllocationStall - Record when we run out of heap memory and the Java
> threads stall, waiting for the GC to free up memory.
>
> ZPageAllocation - Updated the existing event to also record the duration
> of the event. Updated the event to only be reported if the allocation
> takes longer than 1 ms.
>
> ZPageCacheFlush - Record when the page cache needs to be flushed. This
> usually happens when we run out of a specific page size and have to
> detach the physical and virtual memory to materialize a new ZPage. We
> also flush pages when we uncommit memory.
>
> ZRelocationSet - Record information about the selected relocation set.
>
> ZUncommit - Record when we uncommit and hand back memory to the OS.
>
> The patch also contains some small cosmetic changes to existing events,
> whitespace fixes.

From per.liden at oracle.com Mon Mar 23 09:06:04 2020
From: per.liden at oracle.com (Per Liden)
Date: Mon, 23 Mar 2020 10:06:04 +0100
Subject: RFR: 8231779: crash HeapWord*ParallelScavengeHeap::failed_mem_allocate
In-Reply-To: <69850b39-3a5c-8d22-9edf-7ea07b1a6f46@oracle.com>
References: <65a7e19d-f889-8ec6-e044-1ca30b871f56@oracle.com> <69850b39-3a5c-8d22-9edf-7ea07b1a6f46@oracle.com>
Message-ID: <5796853e-7c5d-5a35-5055-036ccb879dbe@oracle.com>

On 3/20/20 9:35 AM, Thomas Schatzl wrote:
> Hi,
>
> On 19.03.20 15:36, Poonam Parhar wrote:
> > Hello,
> >
> > Please review this simple change that avoids a double to float
> > conversion that would fix an intermittent crash seen on Solaris boxes
> > due to corrupted float values in the Floating Point registers.
> >
> > http://cr.openjdk.java.net/~poonam/8231779/webrev.00/
> > https://bugs.openjdk.java.net/browse/JDK-8231779
> >
>
>   looks good. As Dean mentioned, adding a comment here might improve
> the change (the CR contains enough info). No need for a re-review for a
> comment.

+1

cheers,
Per

From jiefu at tencent.com Mon Mar 23 09:06:05 2020
From: jiefu at tencent.com (=?utf-8?B?amllZnUo5YKF5p2wKQ==?=)
Date: Mon, 23 Mar 2020 09:06:05 +0000
Subject: 8241354: ZGC: fatal error: Failed to get NUMA id due to get_mempolicy operation not permitted(Internet mail)
In-Reply-To:
References: <76DE1923-410F-407E-B62D-697074B3CC49@tencent.com>
Message-ID: <4203A6EE-7589-4A40-9C85-CF9670928752@tencent.com>

Hi StefanK,

Thanks for your review and very nice suggestions.

After more investigation, I found that several NUMA apis won't work in the docker, such as get_mempolicy, numa_tonode_memory, ...
So it isn't only the get_mempolicy that is problematic.

And Thomas had reminded me that the other gcs are affected by this issue too.
So it would be better to fix them together.

What do you think of http://cr.openjdk.java.net/~jiefu/8241354/webrev.02/ ?

Thanks a lot.
Best regards,
Jie

On 2020/3/23, 4:43 PM, "Stefan Karlsson" wrote:

    Hi Jie,

    On 2020-03-22 14:35, jiefu(傅杰) wrote:
    > Hi Erik,
    >
    > Thanks for your review and valuable comments.
    >
    > Updated: http://cr.openjdk.java.net/~jiefu/8241354/webrev.01/
    >
    > Please review it.

    Thanks for providing this patch.

    If it is only the get_mempolicy that is problematic, then I wonder if it
    would be better to leave the UseNUMA flag untouched and only turn off
    the ZGC specific NUMA parts. Maybe something like this:

    static bool check_get_mempolicy_support() {
      int dummy = 0;
      int mode = -1;
      // Check whether get_mempolicy is supported or not
      if (ZSyscall::get_mempolicy(&mode, NULL, 0, (void*)&dummy,
                                  MPOL_F_NODE | MPOL_F_ADDR) == -1) {
        if (!FLAG_IS_DEFAULT(UseNUMA)) {
          warning("ZGC NUMA support is disabled since get_mempolicy is unsupported.");
        }
        return false;
      }

      return true;
    }

    void ZNUMA::initialize_platform() {
      _enabled = UseNUMA && check_get_mempolicy_support();
    }

    An alternative would be to take this a step further (probably as a
    separate RFR) and provide a user friendly output in our -Xlog:gc+init
    output:

    [0.015s][info][gc,init] Initializing The Z Garbage Collector
    [0.015s][info][gc,init] Version: 15-internal+0-2020-03-04-0947497.stefank...
    (fastdebug)
    [0.015s][info][gc,init] NUMA Support: Unsupported <== HERE
    [0.015s][info][gc,init] CPUs: 32 total, 32 available
    [0.015s][info][gc,init] Memory: 128851M
    [0.015s][info][gc,init] Large Page Support: Disabled
    [0.015s][info][gc,init] Medium Page Size: 32M
    [0.015s][info][gc,init] Workers: 20 parallel, 4 concurrent

    Borrowing the structure from how UseLargePages are setup and printed:

    void ZLargePages::initialize_platform() {
      if (UseLargePages) {
        if (UseTransparentHugePages) {
          _state = Transparent;
        } else {
          _state = Explicit;
        }
      } else {
        _state = Disabled;
      }
    }

    const char* ZLargePages::to_string() {
      switch (_state) {
      case Explicit:
        return "Enabled (Explicit)";

      case Transparent:
        return "Enabled (Transparent)";

      default:
        return "Disabled";
      }
    }

    Thanks,
    StefanK

    >
    > Thanks a lot.
    > Best regards,
    > Jie
    >
    > On 2020/3/22, 4:26 PM, "Erik Österlund" wrote:
    >
    >     Hi Jie,
    >
    >     It seems to me that if the environment doesn't supply the required NUMA APIs, then we really should disable UseNUMA instead. I propose we check the availability of the syscall during initialization instead, and switch off all NUMA functionality when appropriate. And we should only print a warning if the user explicitly supplied UseNUMA on the command line.
    >
    >     Thanks,
    >     /Erik
    >
    >     > On 20 Mar 2020, at 13:15, jiefu(傅杰) wrote:
    >     >
    >     > Hi all,
    >     >
    >     > JBS: https://bugs.openjdk.java.net/browse/JDK-8241354
    >     > Webrev: http://cr.openjdk.java.net/~jiefu/8241354/webrev.00/
    >     >
    >     > A VM fatal error may be observed if ZGC is used.
    >     >
    >     > The background is that some of our products will run in the docker.
    >     > For some safety reason, SYS_get_mempolicy is not allowed in the docker.
    >     >
    >     > It might be not a good practice to generate a fatal error when get_mempolicy fails.
    >     > What do you think?
    >     >
    >     > Thanks a lot.
    >     > Best regards,
    >     > Jie
    >     >
    >     >

From stefan.karlsson at oracle.com Mon Mar 23 09:30:06 2020
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Mon, 23 Mar 2020 10:30:06 +0100
Subject: RFR: 8241361: ZGC: Implement memory related JFR events
In-Reply-To: <4e335f6c-102e-8853-b99d-f422b259508a@oracle.com>
References: <40e29fc8-005a-e5d1-8bf0-816d406ee7b8@oracle.com> <4e335f6c-102e-8853-b99d-f422b259508a@oracle.com>
Message-ID: <2e95d089-6635-ac82-d3d9-eb809730f0fc@oracle.com>

On 2020-03-23 09:53, Per Liden wrote:
> Hi,
>
> On 3/20/20 2:43 PM, Stefan Karlsson wrote:
>> Hi all,
>>
>> Please review this patch to add some memory related JFR events to ZGC.
>>
>> https://cr.openjdk.java.net/~stefank/8241361/webrev.01/
>> https://bugs.openjdk.java.net/browse/JDK-8241361
>
> Nice! Looks good overall, a few minor things:
>
>
> src/hotspot/share/gc/z/zPageAllocator.cpp
> -----------------------------------------
>
> * Instead of:
>
>   #include "jfrfiles/jfrEventClasses.hpp"
>
> I think we usually do:
>
>   #include "jfr/jfrEvents.hpp"

OK

>
>
> * I could live without the whitespace you added on line 631.
>
>  630   ZPageCacheFlushForAllocationClosure cl(requested);
>  631
>  632   const size_t flushed = flush_cache(&cl, true /* for_allocation

Yes

>
>
> * I think cl->_flushed used here:
>
>  604   event.commit(cl->_requested, cl->_flushed, for_allocation);
>
> should instead just be:
>
>  604   event.commit(cl->_requested, flushed, for_allocation);
>
> Right?

I intentionally used cl->_flushed since that describes how much we
flushed including overflushed parts of pages. Maybe we should report
both values? Maybe also rename the local variable flushed to destroyed?

>
>
> src/hotspot/share/gc/z/zPageCache.hpp
> -------------------------------------
>
> Instead of:
>
>   friend class ZPageAllocator;
>
> add a getter for requested()?
>

I also want _flushed, depending on the resolution of the above.
I don't think it's bad to friend our closures that are pure extensions to
the "owning" class. I don't have a very strong opinion here, but
gravitated towards a friend declaration to minimize the exposure of the
implementation details. If you still want me to add getters, I'll do it.

>
> src/hotspot/share/gc/z/zRelocationSetSelector.cpp
> -------------------------------------------------
>
> * Same here, instead of:
>
>   #include "jfrfiles/jfrEventClasses.hpp"
>
> I think we should do:
>
>   #include "jfr/jfrEvents.hpp"

Yes

>
>
> * You don't think we should use ZPageTypeType that you introduced, and
> send three different ZRelocationSet events, one for each page type?
> Shouldn't this event also be timed, and sent from within
> ZRelocationSetSelectorGroup::select()?

JMC is not always great at handling normalized events. If we want events
per type I think we should add them in _addition_ to the event I added.

>
>
> src/hotspot/share/gc/z/zTracer.cpp
> ----------------------------------
>
>   43     writer.write("small");
>   44     writer.write_key(ZPageTypeMedium);
>   45     writer.write("medium");
>   46     writer.write_key(ZPageTypeLarge);
>   47     writer.write("large");
>
> How about "Small", "Medium" and "Large"? I could only find one other
> place (in jfrStackTraceRepository.cpp) where names were given, and
> those start with a capital letter.

OK

Here's the updated webrevs with the easy fixes:
 https://cr.openjdk.java.net/~stefank/8241361/webrev.02.delta/
 https://cr.openjdk.java.net/~stefank/8241361/webrev.02

Waiting for answers and comments to the rest.

Thanks,
StefanK

>
> cheers,
> Per
>
>
>>
>> Added events:
>>
>> ZAllocationStall - Record when we run out of heap memory and the Java
>> threads stall, waiting for the GC to free up memory.
>>
>> ZPageAllocation - Updated the existing event to also record the
>> duration of the event. Updated the event to only be reported if the
>> allocation takes longer than 1 ms.
>>
>> ZPageCacheFlush - Record when the page cache needs to be flushed.
>> This usually happens when we run out of a specific page size and have
>> to detach the physical and virtual memory to materialize a new ZPage.
>> We also flush pages when we uncommit memory.
>>
>> ZRelocationSet - Record information about the selected relocation set.
>>
>> ZUncommit - Record when we uncommit and hand back memory to the OS.
>>
>> The patch also contains some small cosmetic changes to existing
>> events, whitespace fixes.

From stefan.karlsson at oracle.com Mon Mar 23 09:41:17 2020
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Mon, 23 Mar 2020 10:41:17 +0100
Subject: 8241354: ZGC: fatal error: Failed to get NUMA id due to get_mempolicy operation not permitted(Internet mail)
In-Reply-To: <4203A6EE-7589-4A40-9C85-CF9670928752@tencent.com>
References: <76DE1923-410F-407E-B62D-697074B3CC49@tencent.com> <4203A6EE-7589-4A40-9C85-CF9670928752@tencent.com>
Message-ID:

On 2020-03-23 10:06, jiefu(傅杰) wrote:
> Hi StefanK,
>
> Thanks for your review and very nice suggestions.
>
> After more investigation, I found that several NUMA apis won't work in the docker, such as get_mempolicy, numa_tonode_memory, ...
> So it isn't only the get_mempolicy that is problematic.
>
> And Thomas had reminded me that the other gcs are affected by this issue too.
> So it would be better to fix them together.
>
> What do you think of http://cr.openjdk.java.net/~jiefu/8241354/webrev.02/ ?

numa_available() is a HotSpot wrapper around the numa_available
function. I don't think you should add this kind of logic inside that
function. Could you move it up to libnuma_init instead?

If you intend this to be a generic (non-ZGC) change, then I think it
would be good to create a new RFR and send it to hotspot-dev, so that
the Runtime team and others also see it.

Thanks,
StefanK

>
> Thanks a lot.
> Best regards,
> Jie
>
> On 2020/3/23, 4:43 PM, "Stefan Karlsson" wrote:
>
>     Hi Jie,
>
>     On 2020-03-22 14:35, jiefu(傅杰) wrote:
>     > Hi Erik,
>     >
>     > Thanks for your review and valuable comments.
>     >
>     > Updated: http://cr.openjdk.java.net/~jiefu/8241354/webrev.01/
>     >
>     > Please review it.
>
>     Thanks for providing this patch.
>
>     If it is only the get_mempolicy that is problematic, then I wonder if it
>     would be better to leave the UseNUMA flag untouched and only turn off
>     the ZGC specific NUMA parts. Maybe something like this:
>
>     static bool check_get_mempolicy_support() {
>       int dummy = 0;
>       int mode = -1;
>       // Check whether get_mempolicy is supported or not
>       if (ZSyscall::get_mempolicy(&mode, NULL, 0, (void*)&dummy,
>                                   MPOL_F_NODE | MPOL_F_ADDR) == -1) {
>         if (!FLAG_IS_DEFAULT(UseNUMA)) {
>           warning("ZGC NUMA support is disabled since get_mempolicy is unsupported.");
>         }
>         return false;
>       }
>
>       return true;
>     }
>
>     void ZNUMA::initialize_platform() {
>       _enabled = UseNUMA && check_get_mempolicy_support();
>     }
>
>     An alternative would be to take this a step further (probably as a
>     separate RFR) and provide a user friendly output in our -Xlog:gc+init
>     output:
>
>     [0.015s][info][gc,init] Initializing The Z Garbage Collector
>     [0.015s][info][gc,init] Version: 15-internal+0-2020-03-04-0947497.stefank...
(fastdebug)
>     [0.015s][info][gc,init] NUMA Support: Unsupported <== HERE
>     [0.015s][info][gc,init] CPUs: 32 total, 32 available
>     [0.015s][info][gc,init] Memory: 128851M
>     [0.015s][info][gc,init] Large Page Support: Disabled
>     [0.015s][info][gc,init] Medium Page Size: 32M
>     [0.015s][info][gc,init] Workers: 20 parallel, 4 concurrent
>
>     Borrowing the structure from how UseLargePages are setup and printed:
>
>     void ZLargePages::initialize_platform() {
>       if (UseLargePages) {
>         if (UseTransparentHugePages) {
>           _state = Transparent;
>         } else {
>           _state = Explicit;
>         }
>       } else {
>         _state = Disabled;
>       }
>     }
>
>     const char* ZLargePages::to_string() {
>       switch (_state) {
>       case Explicit:
>         return "Enabled (Explicit)";
>
>       case Transparent:
>         return "Enabled (Transparent)";
>
>       default:
>         return "Disabled";
>       }
>     }
>
>     Thanks,
>     StefanK
>
>     >
>     > Thanks a lot.
>     > Best regards,
>     > Jie
>     >
>     > On 2020/3/22, 4:26 PM, "Erik Österlund" wrote:
>     >
>     >     Hi Jie,
>     >
>     >     It seems to me that if the environment doesn't supply the required NUMA APIs, then we really should disable UseNUMA instead. I propose we check the availability of the syscall during initialization instead, and switch off all NUMA functionality when appropriate. And we should only print a warning if the user explicitly supplied UseNUMA on the command line.
>     >
>     >     Thanks,
>     >     /Erik
>     >
>     >     > On 20 Mar 2020, at 13:15, jiefu(傅杰) wrote:
>     >     >
>     >     > Hi all,
>     >     >
>     >     > JBS: https://bugs.openjdk.java.net/browse/JDK-8241354
>     >     > Webrev: http://cr.openjdk.java.net/~jiefu/8241354/webrev.00/
>     >     >
>     >     > A VM fatal error may be observed if ZGC is used.
>     >     >
>     >     > The background is that some of our products will run in the docker.
>     >     > For some safety reason, SYS_get_mempolicy is not allowed in the docker.
>     >     >
>     >     > It might be not a good practice to generate a fatal error when get_mempolicy fails.
>     >     > What do you think?
>     >     >
>     >     > Thanks a lot.
>     >     > Best regards,
>     >     > Jie
>     >     >
>     >     >
>     >     >
>     >     >

From per.liden at oracle.com Mon Mar 23 10:05:15 2020
From: per.liden at oracle.com (Per Liden)
Date: Mon, 23 Mar 2020 11:05:15 +0100
Subject: 8241354: ZGC: fatal error: Failed to get NUMA id due to get_mempolicy operation not permitted(Internet mail)
In-Reply-To:
References: <76DE1923-410F-407E-B62D-697074B3CC49@tencent.com> <4203A6EE-7589-4A40-9C85-CF9670928752@tencent.com>
Message-ID: <48d3f9d5-b92a-2fcf-4e7a-80da4c1f90f2@oracle.com>

Hi,

On 3/23/20 10:41 AM, Stefan Karlsson wrote:
> On 2020-03-23 10:06, jiefu(傅杰) wrote:
>> Hi StefanK,
>>
>> Thanks for your review and very nice suggestions.
>>
>> After more investigation, I found that several NUMA apis won't work in
>> the docker, such as get_mempolicy, numa_tonode_memory, ...
>> So it isn't only the get_mempolicy that is problematic.
>>
>> And Thomas had reminded me that the other gcs are affected by this
>> issue too.
>> So it would be better to fix them together.
>>
>> What do you think of
>> http://cr.openjdk.java.net/~jiefu/8241354/webrev.02/ ?
>
> numa_available() is a HotSpot wrapper around the numa_available
> function. I don't think you should add this kind of logic inside that
> function. Could you move it up to libnuma_init instead?

I agree, numa_available() doesn't look like the right place for this, libnuma_init sounds better.

Also, we should note that, in theory, some of the NUMA-related syscalls (mbind, get_mempolicy, move_pages, etc.) could be available but not others. I'm not sure such configurations ever actually appear in the wild though, or whether we should care. I suspect checking for one of them is good enough for now, and we can refine this later if it turns out to be a problem.

cheers,
Per

>
> If you intend this to be a generic (non-ZGC) change, then I think it
> would be good to create a new RFR and send it to hotspot-dev, so that
> the Runtime team and others also see it.
>
> Thanks,
> StefanK
>
>>
>> Thanks a lot.
>> Best regards,
>> Jie
>>
>> On 2020/3/23, 4:43 PM, "Stefan Karlsson" wrote:
>>
>>     Hi Jie,
>>     On 2020-03-22 14:35, jiefu(傅杰) wrote:
>>     > Hi Erik,
>>     >
>>     > Thanks for your review and valuable comments.
>>     >
>>     > Updated: http://cr.openjdk.java.net/~jiefu/8241354/webrev.01/
>>     >
>>     > Please review it.
>>     Thanks for providing this patch.
>>     If it is only the get_mempolicy that is problematic, then I wonder if it
>>     would be better to leave the UseNUMA flag untouched and only turn off
>>     the ZGC specific NUMA parts. Maybe something like this:
>>     static bool check_get_mempolicy_support() {
>>       int dummy = 0;
>>       int mode = -1;
>>       // Check whether get_mempolicy is supported or not
>>       if (ZSyscall::get_mempolicy(&mode, NULL, 0, (void*)&dummy,
>>                                   MPOL_F_NODE | MPOL_F_ADDR) == -1) {
>>         if (!FLAG_IS_DEFAULT(UseNUMA)) {
>>           warning("ZGC NUMA support is disabled since get_mempolicy is unsupported.");
>>         }
>>         return false;
>>       }
>>       return true;
>>     }
>>     void ZNUMA::initialize_platform() {
>>       _enabled = UseNUMA && check_get_mempolicy_support();
>>     }
>>     An alternative would be to take this a step further (probably as a
>>     separate RFR) and provide a user friendly output in our -Xlog:gc+init
>>     output:
>>     [0.015s][info][gc,init] Initializing The Z Garbage Collector
>>     [0.015s][info][gc,init] Version: 15-internal+0-2020-03-04-0947497.stefank... (fastdebug)
>>     [0.015s][info][gc,init] NUMA Support: Unsupported <== HERE
>>     [0.015s][info][gc,init] CPUs: 32 total, 32 available
>>     [0.015s][info][gc,init] Memory: 128851M
>>     [0.015s][info][gc,init] Large Page Support: Disabled
>>     [0.015s][info][gc,init] Medium Page Size: 32M
>>     [0.015s][info][gc,init] Workers: 20 parallel, 4 concurrent
>>
>>     Borrowing the structure from how UseLargePages are setup and printed:
>>     void ZLargePages::initialize_platform() {
>>       if (UseLargePages) {
>>         if (UseTransparentHugePages) {
>>           _state = Transparent;
>>         } else {
>>           _state = Explicit;
>>         }
>>       } else {
>>         _state = Disabled;
>>       }
>>     }
>>     const char* ZLargePages::to_string() {
>>       switch (_state) {
>>       case Explicit:
>>         return "Enabled (Explicit)";
>>       case Transparent:
>>         return "Enabled (Transparent)";
>>       default:
>>         return "Disabled";
>>       }
>>     }
>>     Thanks,
>>     StefanK
>>     >
>>     > Thanks a lot.
>>     > Best regards,
>>     > Jie
>>     >
>>     > On 2020/3/22, 4:26 PM, "Erik Österlund" wrote:
>>     >
>>     >     Hi Jie,
>>     >
>>     >     It seems to me that if the environment doesn't supply the required NUMA APIs, then we really should disable UseNUMA instead. I propose we check the availability of the syscall during initialization instead, and switch off all NUMA functionality when appropriate. And we should only print a warning if the user explicitly supplied UseNUMA on the command line.
>>     >
>>     >     Thanks,
>>     >     /Erik
>>     >
>>     >     > On 20 Mar 2020, at 13:15, jiefu(傅杰) wrote:
>>     >     >
>>     >     > Hi all,
>>     >     >
>>     >     > JBS:    https://bugs.openjdk.java.net/browse/JDK-8241354
>>     >     > Webrev: http://cr.openjdk.java.net/~jiefu/8241354/webrev.00/
>>     >     >
>>     >     > A VM fatal error may be observed if ZGC is used.
>>     >     >
>>     >     > The background is that some of our products will run in the docker.
>>     >     > For some safety reason, SYS_get_mempolicy is not allowed in the docker.
>>     >     >
>>     >     > It might be not a good practice to generate a fatal error when get_mempolicy fails.
>>     >     > What do you think?
>>     >     >
>>     >     > Thanks a lot.
>>     >     > Best regards,
>>     >     > Jie
>>     >
>>     >
>>     >
>>     >
>> >

From shade at redhat.com Mon Mar 23 11:58:41 2020
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 23 Mar 2020 12:58:41 +0100
Subject: RFR (XS) 8241435: Shenandoah: avoid disabling pacing with "aggressive"
Message-ID: <208fa112-7c24-5138-fb7f-1b9ee8b6cf78@redhat.com>

RFE:
  https://bugs.openjdk.java.net/browse/JDK-8241435

Deeper testing of JDK-8241139 shows that GCLockerWithShenandoah.java reliably fails in release, and always with aggressive heuristics. Investigation shows that the application manages to outpace the GC every time, and then fails with OOME too early. We should consider keeping pacing enabled even in aggressive mode. This is a day 1 issue with the pacing implementation.

Fix:

--- a/src/hotspot/share/gc/shenandoah/heuristics/shenandoahAggressiveHeuristics.cpp Mon Mar 23 12:25:29 2020 +0100
+++ b/src/hotspot/share/gc/shenandoah/heuristics/shenandoahAggressiveHeuristics.cpp Mon Mar 23 12:53:32 2020 +0100
@@ -33,13 +33,10 @@

 ShenandoahAggressiveHeuristics::ShenandoahAggressiveHeuristics() : ShenandoahHeuristics() {
   // Do not shortcut evacuation
   SHENANDOAH_ERGO_OVERRIDE_DEFAULT(ShenandoahImmediateThreshold, 100);

-  // Aggressive runs with max speed for allocation, to capture races against mutator
-  SHENANDOAH_ERGO_DISABLE_FLAG(ShenandoahPacing);
-
   // Aggressive evacuates everything, so it needs as much evac space as it can get
   SHENANDOAH_ERGO_ENABLE_FLAG(ShenandoahEvacReserveOverflow);

   // If class unloading is globally enabled, aggressive does unloading even with
   // concurrent cycles.
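
For readers unfamiliar with the term, pacing here means throttling mutator allocations so that the collector can keep up. The sketch below is a hypothetical toy model of that idea, not Shenandoah's pacer: the names `ToyPacer`, `try_claim`, `report_progress` and the tax factor are all invented for illustration.

```cpp
#include <atomic>

// Hypothetical toy model of allocation pacing; not Shenandoah's actual pacer.
// Mutators spend from a shared budget, and GC progress replenishes it.
class ToyPacer {
  std::atomic<long> _budget;  // words mutators may still allocate
  const long _tax;            // words of allocation granted per word of GC progress

public:
  ToyPacer(long initial_budget, long tax) : _budget(initial_budget), _tax(tax) {}

  // Mutator side: returns true if the allocation may proceed now,
  // false if the caller should wait for the GC to make progress.
  bool try_claim(long words) {
    long cur = _budget.load(std::memory_order_relaxed);
    while (cur >= words) {
      // On failure, compare_exchange_weak reloads cur and the loop re-checks it.
      if (_budget.compare_exchange_weak(cur, cur - words, std::memory_order_relaxed)) {
        return true;
      }
    }
    return false;
  }

  // GC side: each unit of completed work unlocks more allocation budget.
  void report_progress(long words) {
    _budget.fetch_add(words * _tax, std::memory_order_relaxed);
  }

  long budget() const { return _budget.load(std::memory_order_relaxed); }
};
```

With pacing disabled, the equivalent of `try_claim` would always succeed, which is the situation described above: the application can allocate faster than the collector reclaims.
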

Testing: hotspot_gc_shenandoah {fastdebug,release}

--
Thanks,
-Aleksey

From shade at redhat.com Mon Mar 23 12:02:00 2020
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 23 Mar 2020 13:02:00 +0100
Subject: RFR (S) 8241351: Shenandoah: fragmentation metrics overhaul
Message-ID: <13f02709-e971-a2fc-b6b8-feb4e3bf3583@redhat.com>

RFE:
  https://bugs.openjdk.java.net/browse/JDK-8241351

Current fragmentation computations have flaws:
 - fragmentation metrics are there to drive mutator behavior, but they capture evac-reserve;
 - external fragmentation does not count humongous objects spanning a single region
 - external fragmentation takes "total space" over all regions, while using "free space" only for completely empty ones

These flaws make gc/oom/TestThreadFailure.java fail with JDK-8241139 changes.

Fix:
  https://cr.openjdk.java.net/~shade/8241351/webrev.01/

Testing: hotspot_gc_shenandoah {fastdebug,release}

--
Thanks,
-Aleksey

From shade at redhat.com Mon Mar 23 12:04:09 2020
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 23 Mar 2020 13:04:09 +0100
Subject: RFR (S) 8241139: Shenandoah: distribute mark-compact work exactly to minimize fragmentation
In-Reply-To: <80699511-8972-ed8e-adfe-f5a9c288c8b6@redhat.com>
References: <80699511-8972-ed8e-adfe-f5a9c288c8b6@redhat.com>
Message-ID:

On 3/19/20 1:21 PM, Aleksey Shipilev wrote:
> RFE:
> https://bugs.openjdk.java.net/browse/JDK-8241139
>
> Was following up on why JLinkTest fails with Shenandoah. Figured out the dynamic work distribution
> in mark-compact leaves alive regions in the middle of the heap. It is a generic problem with current
> mark-compact implementation, as which regions get into each worker slice is time-dependent.
>
> Consider the worst case scenario: two workers would have their slices interleaved, one slice is
> fully alive, and other is fully dead. In the end, mark-compact would finish with the same
> interleaved heap. A humongous allocation then fails.
We need to plan the parallel sliding more
> accurately. See the code comments about what new plan does.
>
> Webrev:
> https://cr.openjdk.java.net/~shade/8241139/webrev.01/
>
> Testing: hotspot_gc_shenandoah; known-failing test; tier{1,2,3} (passed with previous version,
> running with new version now); eyeballing shenandoah-visualizer

Found the issue about distributing the tail: we cannot blindly do round-robin selection after every worker is full, because that unbalances the work again! So ditched that part for:

 607     if (old_wid == wid) {
 608       // Circled back to the same worker? This means liveness data was
 609       // miscalculated. Bump the live_per_worker limit so that
 610       // everyone gets the piece of the leftover work.
 611       live_per_worker += ShenandoahHeapRegion::region_size_words();
 612     }

Full webrev:
  https://cr.openjdk.java.net/~shade/8241139/webrev.02/

Testing: hotspot_gc_shenandoah {fastdebug,release}; tier{1,2,3} in progress

--
Thanks,
-Aleksey

From rkennke at redhat.com Mon Mar 23 12:25:26 2020
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 23 Mar 2020 13:25:26 +0100
Subject: RFR (XS) 8241435: Shenandoah: avoid disabling pacing with "aggressive"
In-Reply-To: <208fa112-7c24-5138-fb7f-1b9ee8b6cf78@redhat.com>
References: <208fa112-7c24-5138-fb7f-1b9ee8b6cf78@redhat.com>
Message-ID:

Hi Aleksey,

This looks ok to me! Thank you!

Roman

> RFE:
> https://bugs.openjdk.java.net/browse/JDK-8241435
>
> Deeper testing of JDK-8241139 shows that GCLockerWithShenandoah.java reliably fails in release, and
> always with aggressive heuristics. Investigation shows that the application manages to outpace the GC
> every time, and then fails with OOME too early. We should consider keeping pacing enabled even in
> aggressive mode. This is a day 1 issue with the pacing implementation.
>
> Fix:
>
> --- a/src/hotspot/share/gc/shenandoah/heuristics/shenandoahAggressiveHeuristics.cpp Mon Mar 23 12:25:29 2020 +0100
> +++ b/src/hotspot/share/gc/shenandoah/heuristics/shenandoahAggressiveHeuristics.cpp Mon Mar 23 12:53:32 2020 +0100
> @@ -33,13 +33,10 @@
>
>  ShenandoahAggressiveHeuristics::ShenandoahAggressiveHeuristics() : ShenandoahHeuristics() {
>    // Do not shortcut evacuation
>    SHENANDOAH_ERGO_OVERRIDE_DEFAULT(ShenandoahImmediateThreshold, 100);
>
> -  // Aggressive runs with max speed for allocation, to capture races against mutator
> -  SHENANDOAH_ERGO_DISABLE_FLAG(ShenandoahPacing);
> -
>    // Aggressive evacuates everything, so it needs as much evac space as it can get
>    SHENANDOAH_ERGO_ENABLE_FLAG(ShenandoahEvacReserveOverflow);
>
>    // If class unloading is globally enabled, aggressive does unloading even with
>    // concurrent cycles.
>
>
> Testing: hotspot_gc_shenandoah {fastdebug,release}
>

From rkennke at redhat.com Mon Mar 23 12:35:25 2020
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 23 Mar 2020 13:35:25 +0100
Subject: RFR (S) 8241351: Shenandoah: fragmentation metrics overhaul
In-Reply-To: <13f02709-e971-a2fc-b6b8-feb4e3bf3583@redhat.com>
References: <13f02709-e971-a2fc-b6b8-feb4e3bf3583@redhat.com>
Message-ID:

Hi Aleksey,

> RFE:
> https://bugs.openjdk.java.net/browse/JDK-8241351
>
> Current fragmentation computations have flaws:
> - fragmentation metrics are there to drive mutator behavior, but they capture evac-reserve;
> - external fragmentation does not count humongous objects spanning a single region
> - external fragmentation takes "total space" over all regions, while using "free space" only for
> completely empty ones
>
> These flaws make gc/oom/TestThreadFailure.java fail with JDK-8241139 changes.
>
> Fix:
> https://cr.openjdk.java.net/~shade/8241351/webrev.01/
>
> Testing: hotspot_gc_shenandoah {fastdebug,release}

Good.
Just one small nit:

+ * f) Heap has the small object per each region => IF =~ 1

I'd write 'has *one* small object per region'

Don't need another webrev for this.

Thank you!
Roman

From rkennke at redhat.com Mon Mar 23 12:36:57 2020
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 23 Mar 2020 13:36:57 +0100
Subject: RFR (S) 8241139: Shenandoah: distribute mark-compact work exactly to minimize fragmentation
In-Reply-To:
References: <80699511-8972-ed8e-adfe-f5a9c288c8b6@redhat.com>
Message-ID: <1768e6b3-7840-99b6-8a70-40c9e784cd07@redhat.com>

>> RFE:
>> https://bugs.openjdk.java.net/browse/JDK-8241139
>>
>> Was following up on why JLinkTest fails with Shenandoah. Figured out the dynamic work distribution
>> in mark-compact leaves alive regions in the middle of the heap. It is a generic problem with current
>> mark-compact implementation, as which regions get into each worker slice is time-dependent.
>>
>> Consider the worst case scenario: two workers would have their slices interleaved, one slice is
>> fully alive, and other is fully dead. In the end, mark-compact would finish with the same
>> interleaved heap. A humongous allocation then fails. We need to plan the parallel sliding more
>> accurately. See the code comments about what new plan does.
>>
>> Webrev:
>> https://cr.openjdk.java.net/~shade/8241139/webrev.01/
>>
>> Testing: hotspot_gc_shenandoah; known-failing test; tier{1,2,3} (passed with previous version,
>> running with new version now); eyeballing shenandoah-visualizer
>
> Found the issue about distributing the tail: we cannot blindly do round-robin selection after every
> worker is full, because that unbalances the work again! So ditched that part for:
>
>  607     if (old_wid == wid) {
>  608       // Circled back to the same worker? This means liveness data was
>  609       // miscalculated. Bump the live_per_worker limit so that
>  610       // everyone gets the piece of the leftover work.
>  611       live_per_worker += ShenandoahHeapRegion::region_size_words();
>  612     }
>
> Full webrev:
> https://cr.openjdk.java.net/~shade/8241139/webrev.02/
>
> Testing: hotspot_gc_shenandoah {fastdebug,release}; tier{1,2,3} in progress

Yep!

Probably better to say 'everyone gets *a* piece of the leftover work'?

No new webrev needed if you change this.

Roman

From jiefu at tencent.com Mon Mar 23 13:16:55 2020
From: jiefu at tencent.com (jiefu(傅杰))
Date: Mon, 23 Mar 2020 13:16:55 +0000
Subject: 8241354: ZGC: fatal error: Failed to get NUMA id due to get_mempolicy operation not permitted(Internet mail)
In-Reply-To: <48d3f9d5-b92a-2fcf-4e7a-80da4c1f90f2@oracle.com>
References: <76DE1923-410F-407E-B62D-697074B3CC49@tencent.com> <4203A6EE-7589-4A40-9C85-CF9670928752@tencent.com> <48d3f9d5-b92a-2fcf-4e7a-80da4c1f90f2@oracle.com>
Message-ID:

Thanks StefanK and Per for your review and nice suggestions.

I have filed a new JBS issue: https://bugs.openjdk.java.net/browse/JDK-8241423
and will send a new RFR to the hotspot-dev list later.

Thanks a lot.
Best regards,
Jie

On 2020/3/23, 6:06 PM, "Per Liden" wrote:

    Hi,

    On 3/23/20 10:41 AM, Stefan Karlsson wrote:
    > On 2020-03-23 10:06, jiefu(傅杰) wrote:
    >> Hi StefanK,
    >>
    >> Thanks for your review and very nice suggestions.
    >>
    >> After more investigation, I found that several NUMA apis won't work in
    >> the docker, such as get_mempolicy, numa_tonode_memory, ...
    >> So it isn't only the get_mempolicy that is problematic.
    >>
    >> And Thomas had reminded me that the other gcs are affected by this
    >> issue too.
    >> So it would be better to fix them together.
    >>
    >> What do you think of
    >> http://cr.openjdk.java.net/~jiefu/8241354/webrev.02/ ?
    >
    > numa_available() is a HotSpot wrapper around the numa_available
    > function. I don't think you should add this kind of logic inside that
    > function. Could you move it up to libnuma_init instead?

    I agree, numa_available() doesn't look like the right place for this, libnuma_init sounds better.

    Also, we should note that, in theory, some of the NUMA-related syscalls (mbind, get_mempolicy, move_pages, etc.) could be available but not others. I'm not sure such configurations ever actually appear in the wild though, or whether we should care. I suspect checking for one of them is good enough for now, and we can refine this later if it turns out to be a problem.

    cheers,
    Per

    >
    > If you intend this to be a generic (non-ZGC) change, then I think it
    > would be good to create a new RFR and send it to hotspot-dev, so that
    > the Runtime team and others also see it.
    >
    > Thanks,
    > StefanK
    >
    >>
    >> Thanks a lot.
    >> Best regards,
    >> Jie
    >>
    >> On 2020/3/23, 4:43 PM, "Stefan Karlsson" wrote:
    >>
    >>     Hi Jie,
    >>     On 2020-03-22 14:35, jiefu(傅杰) wrote:
    >>     > Hi Erik,
    >>     >
    >>     > Thanks for your review and valuable comments.
    >>     >
    >>     > Updated: http://cr.openjdk.java.net/~jiefu/8241354/webrev.01/
    >>     >
    >>     > Please review it.
    >>     Thanks for providing this patch.
    >>     If it is only the get_mempolicy that is problematic, then I wonder if it
    >>     would be better to leave the UseNUMA flag untouched and only turn off
    >>     the ZGC specific NUMA parts. Maybe something like this:
    >>     static bool check_get_mempolicy_support() {
    >>       int dummy = 0;
    >>       int mode = -1;
    >>       // Check whether get_mempolicy is supported or not
    >>       if (ZSyscall::get_mempolicy(&mode, NULL, 0, (void*)&dummy,
    >>                                   MPOL_F_NODE | MPOL_F_ADDR) == -1) {
    >>         if (!FLAG_IS_DEFAULT(UseNUMA)) {
    >>           warning("ZGC NUMA support is disabled since get_mempolicy is unsupported.");
    >>         }
    >>         return false;
    >>       }
    >>       return true;
    >>     }
    >>     void ZNUMA::initialize_platform() {
    >>       _enabled = UseNUMA && check_get_mempolicy_support();
    >>     }
    >>     An alternative would be to take this a step further (probably as a
    >>     separate RFR) and provide a user friendly output in our -Xlog:gc+init
    >>     output:
    >>     [0.015s][info][gc,init] Initializing The Z Garbage Collector
    >>     [0.015s][info][gc,init] Version: 15-internal+0-2020-03-04-0947497.stefank...
(fastdebug)
    >>     [0.015s][info][gc,init] NUMA Support: Unsupported <== HERE
    >>     [0.015s][info][gc,init] CPUs: 32 total, 32 available
    >>     [0.015s][info][gc,init] Memory: 128851M
    >>     [0.015s][info][gc,init] Large Page Support: Disabled
    >>     [0.015s][info][gc,init] Medium Page Size: 32M
    >>     [0.015s][info][gc,init] Workers: 20 parallel, 4 concurrent
    >>     Borrowing the structure from how UseLargePages are setup and printed:
    >>     void ZLargePages::initialize_platform() {
    >>       if (UseLargePages) {
    >>         if (UseTransparentHugePages) {
    >>           _state = Transparent;
    >>         } else {
    >>           _state = Explicit;
    >>         }
    >>       } else {
    >>         _state = Disabled;
    >>       }
    >>     }
    >>     const char* ZLargePages::to_string() {
    >>       switch (_state) {
    >>       case Explicit:
    >>         return "Enabled (Explicit)";
    >>       case Transparent:
    >>         return "Enabled (Transparent)";
    >>       default:
    >>         return "Disabled";
    >>       }
    >>     }
    >>     Thanks,
    >>     StefanK
    >>     >
    >>     > Thanks a lot.
    >>     > Best regards,
    >>     > Jie
    >>     >
    >>     > On 2020/3/22, 4:26 PM, "Erik Österlund" wrote:
    >>     >
    >>     >     Hi Jie,
    >>     >
    >>     >     It seems to me that if the environment doesn't supply the required NUMA APIs, then we really should disable UseNUMA instead. I propose we check the availability of the syscall during initialization instead, and switch off all NUMA functionality when appropriate. And we should only print a warning if the user explicitly supplied UseNUMA on the command line.
    >>     >
    >>     >     Thanks,
    >>     >     /Erik
    >>     >
    >>     >     > On 20 Mar 2020, at 13:15, jiefu(傅杰) wrote:
    >>     >     >
    >>     >     > Hi all,
    >>     >     >
    >>     >     > JBS: https://bugs.openjdk.java.net/browse/JDK-8241354
    >>     >     > Webrev: http://cr.openjdk.java.net/~jiefu/8241354/webrev.00/
    >>     >     >
    >>     >     > A VM fatal error may be observed if ZGC is used.
    >>     >     >
    >>     >     > The background is that some of our products will run in the docker.
    >>     >     > For some safety reason, SYS_get_mempolicy is not allowed in the docker.
    >>     >     >
    >>     >     > It might be not a good practice to generate a fatal error when get_mempolicy fails.
    >>     >     > What do you think?
    >>     >     >
    >>     >     > Thanks a lot.
    >>     >     > Best regards,
    >>     >     > Jie
    >>     >
    >>     >
    >>     >
    >>     >
    >> >

From shade at redhat.com Mon Mar 23 18:14:25 2020
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 23 Mar 2020 19:14:25 +0100
Subject: RFR (S) 8241139: Shenandoah: distribute mark-compact work exactly to minimize fragmentation
In-Reply-To: <1768e6b3-7840-99b6-8a70-40c9e784cd07@redhat.com>
References: <80699511-8972-ed8e-adfe-f5a9c288c8b6@redhat.com> <1768e6b3-7840-99b6-8a70-40c9e784cd07@redhat.com>
Message-ID:

On 3/23/20 1:36 PM, Roman Kennke wrote:
>> Full webrev:
>> https://cr.openjdk.java.net/~shade/8241139/webrev.02/
>>
> Probably better to say 'everyone gets *a* piece of the leftover work' ?
>
> No new webrev needed if you change this.

Right. Fixed and pushed.

--
Thanks,
-Aleksey

From shade at redhat.com Mon Mar 23 18:14:47 2020
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 23 Mar 2020 19:14:47 +0100
Subject: RFR (S) 8241351: Shenandoah: fragmentation metrics overhaul
In-Reply-To:
References: <13f02709-e971-a2fc-b6b8-feb4e3bf3583@redhat.com>
Message-ID: <3d1fb01d-97c3-5a26-d2ce-f48b749d42bd@redhat.com>

On 3/23/20 1:35 PM, Roman Kennke wrote:
>> Fix:
>> https://cr.openjdk.java.net/~shade/8241351/webrev.01/
>>
>> Testing: hotspot_gc_shenandoah {fastdebug,release}
>
> Good. Just one small nit:
>
> + * f) Heap has the small object per each region => IF =~ 1
>
> I'd write 'has *one* small object per region'

Yup, fixed and pushed.

--
Thanks,
-Aleksey

From ioi.lam at oracle.com Tue Mar 24 01:26:10 2020
From: ioi.lam at oracle.com (Ioi Lam)
Date: Mon, 23 Mar 2020 18:26:10 -0700
Subject: high StringTable scanning overhead during young GCs
In-Reply-To:
References: <95E126B6-6742-4F91-BC99-BF88F90B7AB4@oracle.com>
Message-ID: <58f19f27-6a20-8d62-6b64-ea9df95c9f8e@oracle.com>

Where do these interned strings come from? Are they from class files, or are they created dynamically during program execution?

If the set of strings is mostly static, they can be stored in the CDS archive. Strings stored there are not scanned during GC.

Thanks
- Ioi

On 3/20/20 8:07 AM, Tony Printezis wrote:
> On March 20, 2020 at 10:54:12 AM, Tony Printezis (tprintezis at twitter.com)
> wrote:
>
>
> That requires
> the additional CLA that might have unintended effect on other uses of
> System.gc though. I keep circling back to wanting an alternative to
> System.gc that will attempt a concurrent collection if that's available.
> The application can call that when it thinks it's appropriate.
>
> I'm not fond of the idea of splitting the StringTable into young and
> old parts. That seems like a relatively big hammer for what seems
> like a pretty specialized problem.
>
>
> So, playing Devil's advocate, why was it done for nmethods?
>
>
> Actually, one more thing: The fact that we are causing concurrent cycles
> every 6 hours is definitely diminishing the problem. However, note that
> Young GC times between cycles go up by between 3x and 6x depending on load.
> They drop after each cycle but they keep going up until the next cycle. If
> we could split the StringTable and avoid scanning the old part, young GCs
> would now stay flat. The problem would be that, if we did cycles every 3 to
> 4 days, scanning the StringTable during the remark pause would be super
> slow, as it will have grown very large (could be seconds just to scan the
> StringTable). So, a combination of a split StringTable + frequent cycles
> will probably yield the best of both worlds. Anyway, I thought I'd point
> that out.
>
>
> Tony
>
>
>
>
> Tony Printezis | @TonyPrintezis | tprintezis at twitter.com

From leonid.mesnik at oracle.com Tue Mar 24 03:55:02 2020
From: leonid.mesnik at oracle.com (Leonid Mesnik)
Date: Mon, 23 Mar 2020 20:55:02 -0700
Subject: RFR: 8241123: Refactor vmTestbase stress framework to use j.u.c and make creation of threads more flexible
Message-ID:

Hi

Could you please review the following fix, which updates ThreadsRunner to use AtomicInteger/spinOnWait instead of Wicket to synchronize the start of stress test threads.

In the failing tests, the threads started earlier allocated all memory before Lock.unlock was called in the threads started last. So a thread might get an OOME while trying to release the lock and/or end up in an inconsistent state. The bug was introduced by https://bugs.openjdk.java.net/browse/JDK-8241123

The Atomic works fine for synchronizing the end of the stress test. I just didn't expect that tests might OOME while releasing the start lock.

Verified that tests now don't fail with -Xcomp -server -XX:-TieredCompilation -XX:-UseCompressedOops.

webrev: http://cr.openjdk.java.net/~lmesnik/8241456/webrev.00/
bug: https://bugs.openjdk.java.net/browse/JDK-8241456

Leonid

From thomas.schatzl at oracle.com Tue Mar 24 12:56:21 2020
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Tue, 24 Mar 2020 13:56:21 +0100
Subject: RFR (M): 8238855: Move G1ConcurrentMark flag sanity checks to g1Arguments
In-Reply-To: <55CB6F16-9BC8-4AA4-B8E2-AD8A5D00F023@oracle.com>
References: <2c21340b-cc6a-1599-0bc6-9886a486d057@oracle.com> <55CB6F16-9BC8-4AA4-B8E2-AD8A5D00F023@oracle.com>
Message-ID: <29df3941-8027-472c-689c-96614f9eeb44@oracle.com>

Hi,

On 20.03.20 22:46, Kim Barrett wrote:
>> On Mar 20, 2020, at 4:32 AM, Thomas Schatzl wrote:
>>
>> Hi all,
>>
>> can I have reviews for this change that moves (and deletes duplicate) flag checking from the G1ConcurrentMark class to the other G1 arguments processing?
>>
>> Adds a test that checks whether the invariants before/after are still kept.
>>
>> CR:
>> https://bugs.openjdk.java.net/browse/JDK-8238855
>> Webrev:
>> http://cr.openjdk.java.net/~tschatzl/8238855/webrev/
>> Testing:
>> hs-tier1-5 with new test
>>
>> Thanks,
>> Thomas
>
> Looks good.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1Arguments.cpp
> 107 void G1Arguments::initialize_mark_stack_size() {
>
> Any particular reason for splitting this out into a separate function
> that has one caller? All the preceding cases in the same caller are
> just directly inlined, so this looks kind of out of place.
>
> I don't object to this, but might actually prefer there to be more
> like it. That can wait for a followup.
>
> ------------------------------------------------------------------------------

Thanks for your review. The split makes the code of G1Arguments::initialize() much easier to follow. I will file a follow-up for further cleanup.

Thanks,
Thomas

From stefan.johansson at oracle.com Tue Mar 24 14:55:52 2020
From: stefan.johansson at oracle.com (Stefan Johansson)
Date: Tue, 24 Mar 2020 15:55:52 +0100
Subject: RFR (M): 8238855: Move G1ConcurrentMark flag sanity checks to g1Arguments
In-Reply-To: <2c21340b-cc6a-1599-0bc6-9886a486d057@oracle.com>
References: <2c21340b-cc6a-1599-0bc6-9886a486d057@oracle.com>
Message-ID: <8c3224e8-4673-44bb-fbf3-81ce4f586177@oracle.com>

Hi Thomas,

On 2020-03-20 09:32, Thomas Schatzl wrote:
> Hi all,
>
>   can I have reviews for this change that moves (and deletes duplicate)
> flag checking from the G1ConcurrentMark class to the other G1 arguments
> processing?
>
> Adds a test that checks whether the invariants before/after are still kept.
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8238855
> Webrev:
> http://cr.openjdk.java.net/~tschatzl/8238855/webrev/

Looks good,
Stefan

> Testing:
> hs-tier1-5 with new test
>
> Thanks,
>
Thomas

From thomas.schatzl at oracle.com Tue Mar 24 16:44:34 2020
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Tue, 24 Mar 2020 17:44:34 +0100
Subject: RFR (M): 8238855: Move G1ConcurrentMark flag sanity checks to g1Arguments
In-Reply-To: <8c3224e8-4673-44bb-fbf3-81ce4f586177@oracle.com>
References: <2c21340b-cc6a-1599-0bc6-9886a486d057@oracle.com> <8c3224e8-4673-44bb-fbf3-81ce4f586177@oracle.com>
Message-ID: <0cd6ac9f-0739-f303-4d50-58cca1b1daf7@oracle.com>

Hi Stefan,

On 24.03.20 15:55, Stefan Johansson wrote:
> Hi Thomas,
>
> On 2020-03-20 09:32, Thomas Schatzl wrote:
>> Hi all,
>>
>>   can I have reviews for this change that moves (and deletes
>> duplicate) flag checking from the G1ConcurrentMark class to the other
>> G1 arguments processing?
>>
>> Adds a test that checks whether the invariants before/after are still
>> kept.
>>
>> CR:
>> https://bugs.openjdk.java.net/browse/JDK-8238855
>> Webrev:
>> http://cr.openjdk.java.net/~tschatzl/8238855/webrev/
> Looks good,
> Stefan
>

thanks for your review.

Thomas

From shade at redhat.com Tue Mar 24 16:44:39 2020
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 24 Mar 2020 17:44:39 +0100
Subject: RFR (XS) 8241520: Shenandoah: simplify region sequence numbers handling
Message-ID: <5f8d0bb4-7e42-e768-548f-524e51f42cb9@redhat.com>

RFE:
  https://bugs.openjdk.java.net/browse/JDK-8241520

Webrev:
  https://cr.openjdk.java.net/~shade/8241520/webrev.01/

It ditches the seqnums we don't really need, and makes the remaining one Traversal-specific. (We can remove it when/if Traversal goes away).
Testing: hotspot_gc_shenandoah {fastdebug,release}; torture tests for allocation path improve a bit -- Thanks, -Aleksey From shade at redhat.com Tue Mar 24 17:08:44 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 24 Mar 2020 18:08:44 +0100 Subject: RFR (XS) 8241534: Shenandoah: region status should include update watermark Message-ID: <0b200e10-72e3-6411-59ec-8cb1475ba145@redhat.com> RFE: https://bugs.openjdk.java.net/browse/JDK-8241534 Fix: diff -r b58660116a42 src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp --- a/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp Tue Mar 24 17:49:58 2020 +0100 +++ b/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp Tue Mar 24 18:04:13 2020 +0100 @@ -992,5 +992,5 @@ st->print_cr("EU=empty-uncommitted, EC=empty-committed, R=regular, H=humongous start, HC=humongous continuation, CS=collection set, T=trash, P=pinned"); st->print_cr("BTE=bottom/top/end, U=used, T=TLAB allocs, G=GCLAB allocs, S=shared allocs, L=live data"); - st->print_cr("R=root, CP=critical pins, TAMS=top-at-mark-start (previous, next)"); + st->print_cr("R=root, CP=critical pins, TAMS=top-at-mark-start, UWM=update watermark"); st->print_cr("SN=alloc sequence number"); diff -r b58660116a42 src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp --- a/src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp Tue Mar 24 17:49:58 2020 +0100 +++ b/src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp Tue Mar 24 18:04:13 2020 +0100 @@ -414,4 +414,6 @@ st->print("|TAMS " INTPTR_FORMAT_W(12), p2i(_heap->marking_context()->top_at_mark_start(const_cast<ShenandoahHeapRegion*>(this)))); + st->print("|UWM " INTPTR_FORMAT_W(12), + p2i(_update_watermark)); st->print("|U " SIZE_FORMAT_W(5) "%1s", byte_size_in_proper_unit(used()), proper_unit_for_byte_size(used())); st->print("|T " SIZE_FORMAT_W(5) "%1s", byte_size_in_proper_unit(get_tlab_allocs()), proper_unit_for_byte_size(get_tlab_allocs())); Testing: hotspot_gc_shenandoah, eyeballing artificially triggered hs_errs -- Thanks, 
-Aleksey From rkennke at redhat.com Tue Mar 24 17:16:55 2020 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 24 Mar 2020 18:16:55 +0100 Subject: RFR (XS) 8241534: Shenandoah: region status should include update watermark In-Reply-To: <0b200e10-72e3-6411-59ec-8cb1475ba145@redhat.com> References: <0b200e10-72e3-6411-59ec-8cb1475ba145@redhat.com> Message-ID: Indeed. Looks good! Thank you! Roman Am 24.03.20 um 18:08 schrieb Aleksey Shipilev: > RFE: > https://bugs.openjdk.java.net/browse/JDK-8241534 > > Fix: > > diff -r b58660116a42 src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp > --- a/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp Tue Mar 24 17:49:58 2020 +0100 > +++ b/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp Tue Mar 24 18:04:13 2020 +0100 > @@ -992,5 +992,5 @@ > st->print_cr("EU=empty-uncommitted, EC=empty-committed, R=regular, H=humongous start, > HC=humongous continuation, CS=collection set, T=trash, P=pinned"); > st->print_cr("BTE=bottom/top/end, U=used, T=TLAB allocs, G=GCLAB allocs, S=shared allocs, L=live > data"); > - st->print_cr("R=root, CP=critical pins, TAMS=top-at-mark-start (previous, next)"); > + st->print_cr("R=root, CP=critical pins, TAMS=top-at-mark-start, UWM=update watermark"); > st->print_cr("SN=alloc sequence number"); > > diff -r b58660116a42 src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp > --- a/src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp Tue Mar 24 17:49:58 2020 +0100 > +++ b/src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp Tue Mar 24 18:04:13 2020 +0100 > @@ -414,4 +414,6 @@ > st->print("|TAMS " INTPTR_FORMAT_W(12), > p2i(_heap->marking_context()->top_at_mark_start(const_cast<ShenandoahHeapRegion*>(this)))); > + st->print("|UWM " INTPTR_FORMAT_W(12), > + p2i(_update_watermark)); > st->print("|U " SIZE_FORMAT_W(5) "%1s", byte_size_in_proper_unit(used()), > proper_unit_for_byte_size(used())); > st->print("|T " SIZE_FORMAT_W(5) "%1s", byte_size_in_proper_unit(get_tlab_allocs()), > 
proper_unit_for_byte_size(get_tlab_allocs())); > > Testing: hotspot_gc_shenandoah, eyeballing artificially triggered hs_errs > From shade at redhat.com Tue Mar 24 19:03:16 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 24 Mar 2020 20:03:16 +0100 Subject: RFR (XS) 8241545: Shenandoah: purge root work overwrites counters after JDK-8228818 Message-ID: <84d43c7b-3cc8-d440-7770-90803586c5d3@redhat.com> Bug: https://bugs.openjdk.java.net/browse/JDK-8241545 Fix: diff -r 97a3e6ce2652 src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.hpp --- a/src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.hpp Tue Mar 24 18:46:48 2020 +0100 +++ b/src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.hpp Tue Mar 24 20:02:16 2020 +0100 @@ -78,4 +78,5 @@ f(purge_class_unload, " Unload Classes") \ f(purge_par, " Parallel Cleanup") \ + SHENANDOAH_GC_PAR_PHASE_DO(purge_par_roots, " PC: ", f) \ f(purge_cldg, " CLDG") \ f(complete_liveness, " Complete Liveness") \ @@ -136,4 +137,5 @@ f(full_gc_purge_class_unload, " Unload Classes") \ f(full_gc_purge_par, " Parallel Cleanup") \ + SHENANDOAH_GC_PAR_PHASE_DO(full_gc_purge_roots, " PC: ", f) \ f(full_gc_purge_cldg, " CLDG") \ f(full_gc_calculate_addresses, " Calculate Addresses") \ Testing: eyeballing gc+stats logs -- Thanks, -Aleksey From zgu at redhat.com Tue Mar 24 19:07:27 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 24 Mar 2020 15:07:27 -0400 Subject: RFR (XS) 8241545: Shenandoah: purge root work overwrites counters after JDK-8228818 In-Reply-To: <84d43c7b-3cc8-d440-7770-90803586c5d3@redhat.com> References: <84d43c7b-3cc8-d440-7770-90803586c5d3@redhat.com> Message-ID: Looks good to me. Thanks for fixing it. 
-Zhengyu On 3/24/20 3:03 PM, Aleksey Shipilev wrote: > Bug: > https://bugs.openjdk.java.net/browse/JDK-8241545 > > Fix: > > diff -r 97a3e6ce2652 src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.hpp > --- a/src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.hpp Tue Mar 24 18:46:48 2020 +0100 > +++ b/src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.hpp Tue Mar 24 20:02:16 2020 +0100 > @@ -78,4 +78,5 @@ > f(purge_class_unload, " Unload Classes") \ > f(purge_par, " Parallel Cleanup") \ > + SHENANDOAH_GC_PAR_PHASE_DO(purge_par_roots, " PC: ", f) \ > f(purge_cldg, " CLDG") \ > f(complete_liveness, " Complete Liveness") \ > @@ -136,4 +137,5 @@ > f(full_gc_purge_class_unload, " Unload Classes") \ > f(full_gc_purge_par, " Parallel Cleanup") \ > + SHENANDOAH_GC_PAR_PHASE_DO(full_gc_purge_roots, " PC: ", f) \ > f(full_gc_purge_cldg, " CLDG") \ > f(full_gc_calculate_addresses, " Calculate Addresses") \ > > Testing: eyeballing gc+stats logs > From shade at redhat.com Tue Mar 24 19:28:26 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 24 Mar 2020 20:28:26 +0100 Subject: RFR (XS) 8241545: Shenandoah: purge root work overwrites counters after JDK-8228818 In-Reply-To: References: <84d43c7b-3cc8-d440-7770-90803586c5d3@redhat.com> Message-ID: <3827dcaa-8e4e-a20f-1434-d59ed1c0a143@redhat.com> On 3/24/20 8:07 PM, Zhengyu Gu wrote: > Looks good to me. > > Thanks for fixing it. No problem, pushed. -- Thanks, -Aleksey From shade at redhat.com Wed Mar 25 11:44:12 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 25 Mar 2020 12:44:12 +0100 Subject: RFR (S) 8241583: Shenandoah: turn heap lock asserts into macros Message-ID: RFE: https://bugs.openjdk.java.net/browse/JDK-8241583 See rationale in the bug. 
Webrev: https://cr.openjdk.java.net/~shade/8241583/webrev.01/ Testing: hotspot_gc_shenandoah {fastdebug, release} -- Thanks, -Aleksey From rkennke at redhat.com Wed Mar 25 11:54:24 2020 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 25 Mar 2020 12:54:24 +0100 Subject: RFR (S) 8241583: Shenandoah: turn heap lock asserts into macros In-Reply-To: References: Message-ID: <6ea42db1-6b0d-a19f-de94-31a07be9e131@redhat.com> Ok. Thank you, Roman > RFE: > https://bugs.openjdk.java.net/browse/JDK-8241583 > > See rationale in the bug. > > Webrev: > https://cr.openjdk.java.net/~shade/8241583/webrev.01/ > > Testing: hotspot_gc_shenandoah {fastdebug, release} > From per.liden at oracle.com Wed Mar 25 13:46:43 2020 From: per.liden at oracle.com (Per Liden) Date: Wed, 25 Mar 2020 14:46:43 +0100 Subject: RFR: 8241596: ZGC: Shorten runtime of gc/z/TestUncommit.java Message-ID: <09514986-4aa5-fdbd-6b19-a8beb3efb382@oracle.com> The test gc/z/TestUncommit.java is dangerously close to the 120 second timeout, and spurious timeouts have been reported [1]. We should shorten the runtime of this test to avoid that. [1] http://mail.openjdk.java.net/pipermail/zgc-dev/2020-March/000892.html Bug: https://bugs.openjdk.java.net/browse/JDK-8241596 Webrev: http://cr.openjdk.java.net/~pliden/8241596/webrev.0 /Per From thomas.schatzl at oracle.com Wed Mar 25 13:47:54 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 25 Mar 2020 14:47:54 +0100 Subject: RFR: 8241596: ZGC: Shorten runtime of gc/z/TestUncommit.java In-Reply-To: <09514986-4aa5-fdbd-6b19-a8beb3efb382@oracle.com> References: <09514986-4aa5-fdbd-6b19-a8beb3efb382@oracle.com> Message-ID: <7f1e2be1-68ac-b08e-6446-38f96421df7c@oracle.com> Hi, On 25.03.20 14:46, Per Liden wrote: > The test gc/z/TestUncommit.java is dangerously close to the 120 second > timeout, and spurious timeouts have been reported [1]. We should shorten > the runtime of this test to avoid that. 
> > [1] http://mail.openjdk.java.net/pipermail/zgc-dev/2020-March/000892.html > > Bug: https://bugs.openjdk.java.net/browse/JDK-8241596 > Webrev: http://cr.openjdk.java.net/~pliden/8241596/webrev.0 > > /Per lgtm Thomas From stefan.johansson at oracle.com Wed Mar 25 14:02:04 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Wed, 25 Mar 2020 15:02:04 +0100 Subject: RFR: 8241596: ZGC: Shorten runtime of gc/z/TestUncommit.java In-Reply-To: <7f1e2be1-68ac-b08e-6446-38f96421df7c@oracle.com> References: <09514986-4aa5-fdbd-6b19-a8beb3efb382@oracle.com> <7f1e2be1-68ac-b08e-6446-38f96421df7c@oracle.com> Message-ID: <0a7f3c62-b6c6-00e0-9301-f48154958111@oracle.com> On 2020-03-25 14:47, Thomas Schatzl wrote: > Hi, > > On 25.03.20 14:46, Per Liden wrote: >> The test gc/z/TestUncommit.java is dangerously close to the 120 second >> timeout, and spurious timeouts have been reported [1]. We should >> shorten the runtime of this test to avoid that. >> >> [1] http://mail.openjdk.java.net/pipermail/zgc-dev/2020-March/000892.html >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8241596 >> Webrev: http://cr.openjdk.java.net/~pliden/8241596/webrev.0 >> >> /Per > > lgtm +1 > > Thomas From rkennke at redhat.com Wed Mar 25 17:29:21 2020 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 25 Mar 2020 18:29:21 +0100 Subject: RFR: 8241605: Shenandoah: More aggressive reference discovery Message-ID: <35ce7d76-02c7-ebc6-a98d-57a14ba35b7a@redhat.com> Shenandoah uses SATB for concurrent marking, and thus needs to track stores and explicitly make previous values grey. An exception to this are (soft/weak/final) references: the reachability of referents can be changed (from weak to strong) when a referent is accessed. For this reason we require a keep-alive barrier which also makes referents grey whenever they are accessed via Reference.get(). The downside of this is if a workload churns weak-references (i.e. 
accesses them often) they might never get a chance to be reclaimed. The key insight for improving the situation is that the change of reachability of the referent is only relevant when that referent is stored somewhere else. We can elide the keep-alive barrier, if the compiler can prove that this doesn't happen. (We also need to check if the referent leaves the scope of the method via method-call or return, because it may be stored outside of the method.) The caveat here is that we must do another scan of the threads' stacks at final-mark to catch referents that are in a local variable at final-mark - we must not lose those. However, according to my measurements, this is only a minor (if any) regression in final-mark-latency. Issue: https://bugs.openjdk.java.net/browse/JDK-8241605 Webrev: http://cr.openjdk.java.net/~rkennke/JDK-8241605/webrev.00/ Testing: hotspot_gc_shenandoah (including new testcase that verifies the improved behaviour), manual tests, specjbb and specjvm runs I'd like to push this to shenandoah/jdk first, to give it a few more rounds of testing before hitting mainline JDK. Ok? Roman From igor.ignatyev at oracle.com Wed Mar 25 17:42:45 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 25 Mar 2020 10:42:45 -0700 Subject: RFR(S) : 8203238: [TESTBUG] rewrite MemOptions shell test in Java Message-ID: <6B89C20B-36D8-4743-979B-56DDF8ADCE64@oracle.com> http://cr.openjdk.java.net/~iignatyev//8203238/webrev.00 > 330 lines changed: 91 ins; 236 del; 3 mod; Hi all, could you please review this small patch which rewrites MemOptions shell test? while porting the test, I noticed that available memory checks aren't required, and the test successfully passes even w/o them, so the java version of the test doesn't check available memory and only @requires 64 bits vm. given the test doesn't require lots of time/resources to execute, I've also removed it from exclusiveAccess. 
MemStat class was made static inner class of MemOptionsTest for the sake of readability and brevity. webrev: http://cr.openjdk.java.net/~iignatyev//8203238/webrev.00 testing: the changed test multiple times on {linux, windows, mac} w/ {SerialGC,ZGC,G1GC,ParallelGC} JBS: https://bugs.openjdk.java.net/browse/JDK-8203238 NB the shell version of the test had a bug which prevented its execution. an incorrect operator (:=) was used at L#23,23, which led to bogus 'java' variable at L#44 and non-zero exit code at L#48, so the test passes w/ 'Skipping the test; a 64-bit VM is required.' message on all platforms. so this patch effectively resurrects the test. Thanks, -- Igor From shade at redhat.com Wed Mar 25 18:13:47 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 25 Mar 2020 19:13:47 +0100 Subject: RFR: 8241605: Shenandoah: More aggressive reference discovery In-Reply-To: <35ce7d76-02c7-ebc6-a98d-57a14ba35b7a@redhat.com> References: <35ce7d76-02c7-ebc6-a98d-57a14ba35b7a@redhat.com> Message-ID: On 3/25/20 6:29 PM, Roman Kennke wrote: > Webrev: > http://cr.openjdk.java.net/~rkennke/JDK-8241605/webrev.00/ *) Feels like this cast: 1578 if (((ShenandoahEnqueueBarrierNode*)barrier)->can_eliminate(phase)) { ...should be done when we poll the node a few lines above? 1573 Node* barrier = state->enqueue_barrier(i); *) "test_heap_stable" is misleading name now. "test_heap_state"? You already "assert is_heap_state_test". *) I actually wonder if checking for TRAVERSAL|MARKING should be done as another generic optimization? It seems good by itself to test for this instead of falling through for evac/update-refs? (Actually, what checks for *that* in the old code?) It is good for sh/jdk sandbox. Please make sure you don't use bug ID when pushing there, so that history would separate this prototype and the actual upstream change? 
-- Thanks, -Aleksey From rkennke at redhat.com Wed Mar 25 19:34:10 2020 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 25 Mar 2020 20:34:10 +0100 Subject: RFR: 8241605: Shenandoah: More aggressive reference discovery In-Reply-To: References: <35ce7d76-02c7-ebc6-a98d-57a14ba35b7a@redhat.com> Message-ID: Hi Aleksey, >> Webrev: >> http://cr.openjdk.java.net/~rkennke/JDK-8241605/webrev.00/ > > *) Feels like this cast: > > 1578 if (((ShenandoahEnqueueBarrierNode*)barrier)->can_eliminate(phase)) { > > ...should be done when we poll the node a few lines above? > 1573 Node* barrier = state->enqueue_barrier(i); Indeed. And it doesn't require a cast there! :-) > *) "test_heap_stable" is misleading name now. "test_heap_state"? You already "assert > is_heap_state_test". Right. Changed that. > *) I actually wonder if checking for TRAVERSAL|MARKING should be done as another generic > optimization? It seems good by itself to test for this instead of falling through for > evac/update-refs? (Actually, what checks for *that* in the old code?) I believe you're confusing the enqueue-barrier with the LRB here. EQ-barriers are only relevant during marking (and traversal). > It is good for sh/jdk sandbox. Please make sure you don't use bug ID when pushing there, so that > history would separate this prototype and the actual upstream change? Yes, will do. Thanks, Roman From kim.barrett at oracle.com Wed Mar 25 22:37:29 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 25 Mar 2020 18:37:29 -0400 Subject: RFR: 8241160: Concurrent class unloading reports GCTraceTime events as JFR pause sub-phase events In-Reply-To: References: Message-ID: <0B438446-208F-4CE3-B3D7-A8E0596E0133@oracle.com> > On Mar 19, 2020, at 5:44 AM, Stefan Karlsson wrote: > > Hi all, > > Please review this patch to rewrite the GCTimer, and associated classes, to not allow nested phases of different types (pause or concurrent). 
> > https://cr.openjdk.java.net/~stefank/8241160/webrev.01/ > https://bugs.openjdk.java.net/browse/JDK-8241160 Looks good. ------------------------------------------------------------------------------ src/hotspot/share/gc/shared/gcTimer.cpp 105 GCPhase::PhaseType TimePartitions::current_phase_type() const { 106 int level = _active_phases.count(); 107 int index = _active_phases.phase_index(level - 1); Maybe assert level > 0. No need for a new webrev if you want to make this change. ------------------------------------------------------------------------------ src/hotspot/share/gc/shared/gcTraceSend.cpp 306 assert(phase->level() < 2, "There is only two levels for ConcurrentPhase"); Seems like there ought to be a named constant for that "2", similar to PhasesStack::PHASE_LEVELS used in a similar place in visit_pause. But then, there's a mismatch between PHASE_LEVELS (6) and the number of cases in visit_pause (4). That's "interesting"... ------------------------------------------------------------------------------ From shade at redhat.com Thu Mar 26 05:35:12 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 26 Mar 2020 06:35:12 +0100 Subject: RFR: 8241605: Shenandoah: More aggressive reference discovery In-Reply-To: References: <35ce7d76-02c7-ebc6-a98d-57a14ba35b7a@redhat.com> Message-ID: <1204a65d-9874-b1c2-2990-570c660e52ad@redhat.com> On 3/25/20 8:34 PM, Roman Kennke wrote: >>> Webrev: >>> http://cr.openjdk.java.net/~rkennke/JDK-8241605/webrev.00/ >> >> *) I actually wonder if checking for TRAVERSAL|MARKING should be done as another generic >> optimization? It seems good by itself to test for this instead of falling through for >> evac/update-refs? (Actually, what checks for *that* in the old code?) > > I believe you're confusing the enqueue-barrier with the LRB here. > EQ-barriers are only relevant during marking (and traversal). No, I am not. Look here in EQ barrier expansion: // Stable path. 
- test_heap_stable(ctrl, raw_mem, heap_stable_ctrl, phase); + test_heap_stable(ctrl, raw_mem, heap_stable_ctrl, phase, ShenandoahHeap::TRAVERSAL | ShenandoahHeap::MARKING); region->init_req(_heap_stable, heap_stable_ctrl); phi->init_req(_heap_stable, raw_mem); In current code, EQ fastpath tests test_heap_stable (HAS_FORWARDED). In your patch, it now tests for test_heap_state(TRAVERSAL|MARKING). Yes, EQ barriers are only relevant during marking/traversal, so that must mean that _current code_ does test_heap_stable(HAS_FORWARDED) either incorrectly, or inefficiently. This looks irrelevant to what aggressive reference discovery does? -- Thanks, -Aleksey From shade at redhat.com Thu Mar 26 06:19:43 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 26 Mar 2020 07:19:43 +0100 Subject: RFR (S) 8232100: GC timings should use proper units for heap sizes In-Reply-To: <4ad7db72-1a52-037f-37b1-558ec176a172@redhat.com> References: <4ad7db72-1a52-037f-37b1-558ec176a172@redhat.com> Message-ID: <89bf17db-fbf3-e771-e940-f9ffb0489275@redhat.com> On 2/20/20 1:24 PM, Aleksey Shipilev wrote: > On 10/10/19 2:03 PM, Aleksey Shipilev wrote: >> RFE: >> https://bugs.openjdk.java.net/browse/JDK-8232100 >> >> Webrev: >> https://cr.openjdk.java.net/~shade/8232100/webrev.01/ >> >> GC log prints heap sizes in selected GC events. Currently, it unconditionally uses "M" as the suffix >> for heap sizes, which makes GC logs too coarse on smaller heaps. This loses performance data >> accuracy, which is sometimes a dealbreaker in logs analysis. Let's make it into proper units. >> >> I ran many tests of my own, but would appreciate if somebody runs it through more comprehensive >> suite of tests, looking for tests that parse the GC logs for whatever reason. >> >> Testing: eyeballing GC logs, jdk-submit, hotspot_gc {g1, shenandoah, parallel} > > No takers? :) Still no takers? 
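The idea in the quoted 8232100 RFR (pick the unit from the size instead of always printing "M") can be sketched roughly as follows. This is only an illustration under stated assumptions: pick_unit, scale_to_unit, format_heap_size and the 10x thresholds are hypothetical names and values chosen for the example; they are not taken from the webrev or from HotSpot's own helpers.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helpers for illustration only; the names and the 10x
// thresholds are assumptions, not the HotSpot implementation.
static const unsigned long long K = 1024ULL;
static const unsigned long long M = K * 1024ULL;
static const unsigned long long G = M * 1024ULL;

// Pick the largest unit that still leaves at least two significant digits.
static const char* pick_unit(unsigned long long bytes) {
  if (bytes >= 10 * G) return "G";
  if (bytes >= 10 * M) return "M";
  if (bytes >= 10 * K) return "K";
  return "B";
}

// Scale the byte count to the unit chosen by pick_unit (integer division).
static unsigned long long scale_to_unit(unsigned long long bytes) {
  if (bytes >= 10 * G) return bytes / G;
  if (bytes >= 10 * M) return bytes / M;
  if (bytes >= 10 * K) return bytes / K;
  return bytes;
}

// Format a heap size the way a log line might print it, e.g. "5120K".
static std::string format_heap_size(unsigned long long bytes) {
  char buf[32];
  int written = std::snprintf(buf, sizeof(buf), "%llu%s",
                              scale_to_unit(bytes), pick_unit(bytes));
  assert(written > 0 && written < (int)sizeof(buf));
  return std::string(buf);
}
```

With a fixed "M" suffix, a 5 MB heap would print as "5M" and anything under 1 MB as "0M"; with the sketch above the same sizes keep more digits because a smaller unit is chosen.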
-- Thanks, -Aleksey From rkennke at redhat.com Thu Mar 26 08:49:35 2020 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 26 Mar 2020 09:49:35 +0100 Subject: RFR: 8241605: Shenandoah: More aggressive reference discovery In-Reply-To: <1204a65d-9874-b1c2-2990-570c660e52ad@redhat.com> References: <35ce7d76-02c7-ebc6-a98d-57a14ba35b7a@redhat.com> <1204a65d-9874-b1c2-2990-570c660e52ad@redhat.com> Message-ID: <8860408d-b5f1-b8c9-d769-22715de41f0b@redhat.com> Am 26.03.20 um 06:35 schrieb Aleksey Shipilev: > On 3/25/20 8:34 PM, Roman Kennke wrote: >>>> Webrev: >>>> http://cr.openjdk.java.net/~rkennke/JDK-8241605/webrev.00/ >>> >>> *) I actually wonder if checking for TRAVERSAL|MARKING should be done as another generic >>> optimization? It seems good by itself to test for this instead of falling through for >>> evac/update-refs? (Actually, what checks for *that* in the old code?) >> >> I believe you're confusing the enqueue-barrier with the LRB here. >> EQ-barriers are only relevant during marking (and traversal). > > No, I am not. > > Look here in EQ barrier expansion: > > // Stable path. > - test_heap_stable(ctrl, raw_mem, heap_stable_ctrl, phase); > + test_heap_stable(ctrl, raw_mem, heap_stable_ctrl, phase, ShenandoahHeap::TRAVERSAL | > ShenandoahHeap::MARKING); > region->init_req(_heap_stable, heap_stable_ctrl); > phi->init_req(_heap_stable, raw_mem); > > > In current code, EQ fastpath tests test_heap_stable (HAS_FORWARDED). In your patch, it now tests for > test_heap_state(TRAVERSAL|MARKING). Yes, EQ barriers are only relevant during marking/traversal, so > that must mean that _current code_ does test_heap_stable(HAS_FORWARDED) either incorrectly, or > inefficiently. > > This looks irrelevant to what aggressive reference discovery does? Ah. Previously, we only used ShenandoahEnqueueBarrierNode with Traversal GC. There is only one phase, so the test for HAS_FORWARDED was actually testing for TRAVERSAL, but we took a short-cut there. 
Now we also use it for concurrent marking, so I couldn't take this short-cut anymore. The new code is correct. Yes? Thanks, Roman From per.liden at oracle.com Thu Mar 26 09:09:17 2020 From: per.liden at oracle.com (Per Liden) Date: Thu, 26 Mar 2020 10:09:17 +0100 Subject: RFR: 8241596: ZGC: Shorten runtime of gc/z/TestUncommit.java In-Reply-To: <0a7f3c62-b6c6-00e0-9301-f48154958111@oracle.com> References: <09514986-4aa5-fdbd-6b19-a8beb3efb382@oracle.com> <7f1e2be1-68ac-b08e-6446-38f96421df7c@oracle.com> <0a7f3c62-b6c6-00e0-9301-f48154958111@oracle.com> Message-ID: <9cb053bf-c493-6c9c-86e0-11a2484acde0@oracle.com> Thanks Thomas and Stefan! /Per On 3/25/20 3:02 PM, Stefan Johansson wrote: > > > On 2020-03-25 14:47, Thomas Schatzl wrote: >> Hi, >> >> On 25.03.20 14:46, Per Liden wrote: >>> The test gc/z/TestUncommit.java is dangerously close to the 120 >>> second timeout, and spurious timeouts have been reported [1]. We >>> should shorten the runtime of this test to avoid that. >>> >>> [1] >>> http://mail.openjdk.java.net/pipermail/zgc-dev/2020-March/000892.html >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8241596 >>> Webrev: http://cr.openjdk.java.net/~pliden/8241596/webrev.0 >>> >>> /Per >> >> 
lgtm > +1 >> >> Thomas From shade at redhat.com Thu Mar 26 09:14:08 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 26 Mar 2020 10:14:08 +0100 Subject: RFR: 8241605: Shenandoah: More aggressive reference discovery In-Reply-To: <8860408d-b5f1-b8c9-d769-22715de41f0b@redhat.com> References: <35ce7d76-02c7-ebc6-a98d-57a14ba35b7a@redhat.com> <1204a65d-9874-b1c2-2990-570c660e52ad@redhat.com> <8860408d-b5f1-b8c9-d769-22715de41f0b@redhat.com> Message-ID: On 3/26/20 9:49 AM, Roman Kennke wrote: > Am 26.03.20 um 06:35 schrieb Aleksey Shipilev: >> On 3/25/20 8:34 PM, Roman Kennke wrote: >>>>> Webrev: >>>>> http://cr.openjdk.java.net/~rkennke/JDK-8241605/webrev.00/ >>>> >>>> *) I actually wonder if checking for TRAVERSAL|MARKING should be done as another generic >>>> optimization? It seems good by itself to test for this instead of falling through for >>>> evac/update-refs? (Actually, what checks for *that* in the old code?) >>> >>> I believe you're confusing the enqueue-barrier with the LRB here. >>> EQ-barriers are only relevant during marking (and traversal). >> >> No, I am not. >> >> Look here in EQ barrier expansion: >> >> // Stable path. >> - test_heap_stable(ctrl, raw_mem, heap_stable_ctrl, phase); >> + test_heap_stable(ctrl, raw_mem, heap_stable_ctrl, phase, ShenandoahHeap::TRAVERSAL | >> ShenandoahHeap::MARKING); >> region->init_req(_heap_stable, heap_stable_ctrl); >> phi->init_req(_heap_stable, raw_mem); >> >> >> In current code, EQ fastpath tests test_heap_stable (HAS_FORWARDED). In your patch, it now tests for >> test_heap_state(TRAVERSAL|MARKING). Yes, EQ barriers are only relevant during marking/traversal, so >> that must mean that _current code_ does test_heap_stable(HAS_FORWARDED) either incorrectly, or >> inefficiently. >> >> This looks irrelevant to what aggressive reference discovery does? > > Ah. Previously, we only used ShenandoahEnqueueBarrierNode with Traversal > GC. 
There is only one phase, so the test for HAS_FORWARDED was actually > testing for TRAVERSAL, but we took a short-cut there. Now we also use it > for concurrent marking, so I couldn't take this short-cut anymore. The > new code is correct. > > Yes? Ah, okay, that makes much more sense. -- Thanks, -Aleksey From stefan.karlsson at oracle.com Thu Mar 26 09:34:05 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Thu, 26 Mar 2020 10:34:05 +0100 Subject: RFR: 8241160: Concurrent class unloading reports GCTraceTime events as JFR pause sub-phase events In-Reply-To: <0B438446-208F-4CE3-B3D7-A8E0596E0133@oracle.com> References: <0B438446-208F-4CE3-B3D7-A8E0596E0133@oracle.com> Message-ID: <83dbc1c5-cabc-c414-1ad4-4e6a1b8b5c05@oracle.com> On 2020-03-25 23:37, Kim Barrett wrote: >> On Mar 19, 2020, at 5:44 AM, Stefan Karlsson wrote: >> >> Hi all, >> >> Please review this patch to rewrite the GCTimer, and associated classes, to not allow nested phases of different types (pause or concurrent). >> >> https://cr.openjdk.java.net/~stefank/8241160/webrev.01/ >> https://bugs.openjdk.java.net/browse/JDK-8241160 > Looks good. > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/shared/gcTimer.cpp > 105 GCPhase::PhaseType TimePartitions::current_phase_type() const { > 106 int level = _active_phases.count(); > 107 int index = _active_phases.phase_index(level - 1); > > Maybe assert level > 0. No need for a new webrev if you want to make > this change. I'll add the assert. > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/shared/gcTraceSend.cpp > 306 assert(phase->level() < 2, "There is only two levels for ConcurrentPhase"); > > Seems like there ought to be a named constant for that "2", similar to > PhasesStack::PHASE_LEVELS used in a similar place in visit_pause. But > then, there's a mismatch between PHASE_LEVELS (6) and the number of > cases in visit_pause (4). 
That's "interesting"... Yeah. visit_pause checks that we don't push too many levels of phases, but we already check that when we push phases: void PhasesStack::push(int phase_index) { assert(_next_phase_level < PHASE_LEVELS, "Overflow"); ... int level = _active_phases.count(); GCPhase phase; phase.set_type(type); phase.set_level(level); phase.set_name(name); phase.set_start(time); int index = _phases->append(phase); _active_phases.push(index); I think it would make sense to remove that assert, to get rid of that confusion. I'm not so sure about removing 'assert(phase->level() < 2' since it's there to catch when we start to use deeper nesting of the concurrent phases. Maybe if we also add EventGCPhaseConcurrentLevel2-4 (analogous to visit_pause) and then get rid of this assert? If we want to go ahead and do any of these changes (find a name for the 2 or adding events) I'll create a separate RFR. Thanks, StefanK > ------------------------------------------------------------------------------ > From rkennke at redhat.com Thu Mar 26 09:35:50 2020 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 26 Mar 2020 10:35:50 +0100 Subject: RFR: 8241605: Shenandoah: More aggressive reference discovery In-Reply-To: References: <35ce7d76-02c7-ebc6-a98d-57a14ba35b7a@redhat.com> <1204a65d-9874-b1c2-2990-570c660e52ad@redhat.com> <8860408d-b5f1-b8c9-d769-22715de41f0b@redhat.com> Message-ID: Am 26.03.20 um 10:14 schrieb Aleksey Shipilev: > On 3/26/20 9:49 AM, Roman Kennke wrote: >> Am 26.03.20 um 06:35 schrieb Aleksey Shipilev: >>> On 3/25/20 8:34 PM, Roman Kennke wrote: >>>>>> Webrev: >>>>>> http://cr.openjdk.java.net/~rkennke/JDK-8241605/webrev.00/ >>>>> >>>>> *) I actually wonder if checking for TRAVERSAL|MARKING should be done as another generic >>>>> optimization? It seems good by itself to test for this instead of falling through for >>>>> evac/update-refs? (Actually, what checks for *that* in the old code?) 
>>>> >>>> I believe you're confusing the enqueue-barrier with the LRB here. >>>> EQ-barriers are only relevant during marking (and traversal). >>> >>> No, I am not. >>> >>> Look here in EQ barrier expansion: >>> >>> // Stable path. >>> - test_heap_stable(ctrl, raw_mem, heap_stable_ctrl, phase); >>> + test_heap_stable(ctrl, raw_mem, heap_stable_ctrl, phase, ShenandoahHeap::TRAVERSAL | >>> ShenandoahHeap::MARKING); >>> region->init_req(_heap_stable, heap_stable_ctrl); >>> phi->init_req(_heap_stable, raw_mem); >>> >>> >>> In current code, EQ fastpath tests test_heap_stable (HAS_FORWARDED). In your patch, it now tests for >>> test_heap_state(TRAVERSAL|MARKING). Yes, EQ barriers are only relevant during marking/traversal, so >>> that must mean that _current code_ does test_heap_stable(HAS_FORWARDED) either incorrectly, or >>> inefficiently. >>> >>> This looks irrelevant to what aggressive reference discovery does? >> >> Ah. Previously, we only used ShenandoahEnqueueBarrierNode with Traversal >> GC. There is only one phase, so the test for HAS_FORWARDED was actually >> testing for TRAVERSAL, but we took a short-cut there. Now we also use it >> for concurrent marking, so I couldn't take this short-cut anymore. The >> new code is correct. >> >> Yes? > > Ah, okay, that makes much more sense. Thanks! I pushed the change to shenandoah/jdk. Roman From stefan.karlsson at oracle.com Thu Mar 26 09:49:01 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Thu, 26 Mar 2020 10:49:01 +0100 Subject: RFR: 8241160: Concurrent class unloading reports GCTraceTime events as JFR pause sub-phase events In-Reply-To: References: Message-ID: <8dd49f70-19f3-cc6f-58fc-99af87cc31f5@oracle.com> Shenandoah devs, any comments w.r.t. to the Shenandoah section below? Thanks, StefanK On 2020-03-19 10:44, Stefan Karlsson wrote: > Hi all, > > Please review this patch to rewrite the GCTimer, and associated > classes, to not allow nested phases of different types (pause or > concurrent). 
> > https://cr.openjdk.java.net/~stefank/8241160/webrev.01/ > https://bugs.openjdk.java.net/browse/JDK-8241160 > > A bug was found when I was looking at JFR events from ZGC. A > GCPhasePauseLevel1 event was nested within a GCPhaseConcurrent. The > only valid parent is a GCPhasePause event. The reason why this > happened was that we use a GCTraceTime class inside the class > unloading code. Previously, we only used GCTraceTimes inside pauses, > but ever since class unloading was moved out to a concurrent phase, > this isn't true anymore. GCTraceTime used > GCTimer::register_gc_phase_start(name, Ticks, phase = ), and > therefore always reported pauses and pause sub-phases. > > With this patch, I suggest that we become stricter in our usages of > the GCTimer. The effects of the patch are: > > 1) When a top-level pause (or concurrent) phase is created, the code > must be explicit about what type of phase is created. The code will > now assert if this is abused. Most places were already explicit, but I > had to change two places: > > a) Shenandoah type-erased ConcurrentGCTimer and therefore didn't have > access to register_gc_pause_start. I made that function public, > instead of protected, so that we didn't have to deal with that problem. > > b) G1 used GCTraceTime to note the Remark/Cleanup pauses (in > VM_G1Concurrent). This is the only place that uses GCTraceTime to > start a pause. All other places use GCTraceTime to create sub-phases. > I could have copy-n-pasted the entire > GCTraceTime/GCTraceTimeWrapper/GCTraceTimeWrapper implementation and > created a version that calls register_gc_pause_start instead of > register_gc_phase_start. Instead of doing that I opted for creating a > system where the code could register a set of callbacks to be called > when the start and end time is registered. This is used in the backend > of GCTraceTime, but then also used by G1 to allow us to not have to > copy-n-paste a lot of the code. 
> > I would have liked to make GCTraceTimeImpl/GCTraceTimeWrapper agnostic > to the default callbacks (unfied logging and GCTimer) but couldn't > find a nice way to express that, because of the way we macro-expand > the UL tags. Maybe something we can consider for a future investigation. > > 2) sub-phases now inherit the type from the parent phase, and there's > no possibility to incorrectly nest phases anymore. This also removed > the need for ConcurrentGCTimer::_is_concurrent_phase_active. > > 3) This allows (and encourages concurrent sub-phases). When the JFR > events were ported to HotSpot, only pauses got sub-phases, because > there wasn't a big need for concurrent sub-phases. In this patch I > added level of sub-phases to JFR. Maybe it would be better to add more > right away? (I'm not a fan of having the explicit sub-phase level > events, instead of a counter in *the* phase event, but the JMC team at > that time needed it to be logged as separate events. Maybe something > that could be reconsidered some time) > > 4) The different consumers of the timestamps are separated into their > own classes. > > 5) Shenandoah devs need to consider what to do about this change: > > - unloading_occurred = SystemDictionary::do_unloading(heap->gc_timer()); > + // FIXME: This turns off the previously broken JFR events. If we > want to keep reporting them, > + // but with the correct type (Concurrent) then a top-level > concurrent phase is required. > + unloading_occurred = SystemDictionary::do_unloading(NULL /* gc_timer > */); > > Where this code caused GCPhasePauseLevel1 events for ZGC, this used to > create GCPhasePause events for Shenandoah. It uses GCTraceTime to log > sub-phases, but the current Shenandoah code hasn't registered a > top-level phase at this point. Either we keep this code with the > removal of the gc_timer argument, or we add a top-level phase > somewhere. If we want the latter, then I need suggestions on where to > add them. 
Or maybe push the current code, and fix it as a follow-up > patch? > > What do you think? An alternative is to (continue?) completely forbid > concurrent sub-phases, and remove the gc_timers passed to GCTraceTimes > during concurrent phases. Even if we decide to do that, I think > there's some merit to the stricter GCTimer code, and the slight > separation of concern in GCTraceTime. > > Tested tier1-3 > > Thanks, > StefanK From rkennke at redhat.com Thu Mar 26 09:54:34 2020 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 26 Mar 2020 10:54:34 +0100 Subject: RFR: 8241160: Concurrent class unloading reports GCTraceTime events as JFR pause sub-phase events In-Reply-To: <8dd49f70-19f3-cc6f-58fc-99af87cc31f5@oracle.com> References: <8dd49f70-19f3-cc6f-58fc-99af87cc31f5@oracle.com> Message-ID: <31bcbdf5-6b20-502d-5f91-8bd18962985d@redhat.com> Hey Stefan, Sorry, this went under my radar. Give us half a day or so, yes? Thanks, Roman > Shenandoah devs, any comments w.r.t. to the Shenandoah section below? > > Thanks, > StefanK > > On 2020-03-19 10:44, Stefan Karlsson wrote: >> Hi all, >> >> Please review this patch to rewrite the GCTimer, and associated >> classes, to not allow nested phases of different types (pause or >> concurrent). >> >> https://cr.openjdk.java.net/~stefank/8241160/webrev.01/ >> https://bugs.openjdk.java.net/browse/JDK-8241160 >> >> A bug was found when I was looking at JFR events from ZGC. A >> GCPhasePauseLevel1 event was nested within a GCPhaseConcurrent. The >> only valid parent is a GCPhasePause event. The reason why this >> happened was that the we use a GCTraceTime class inside the class >> unloading code. Previously, we only used GCTraceTimes inside pauses, >> but ever since class unloading was moved out to a concurrent phase, >> this isn't true anymore. GCTraceTime used >> GCTimer::register_gc_phase_start(name, Ticks, phase? = ), and >> therefore always reported pauses and pause sub-phases. 
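The nesting rule under discussion in this thread (a top-level phase must state its type explicitly, and sub-phases take their type from the enclosing phase) can be illustrated with a small C++ sketch. This is only a model of the idea, not HotSpot's GCTimer: SketchTimer, Phase, PhaseType and every method name below are hypothetical and chosen for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the rule; none of these names exist in HotSpot.
enum class PhaseType { Pause, Concurrent };

struct Phase {
  std::string name;
  PhaseType   type;
  int         level;  // 0 = top-level phase, 1 = first sub-phase level, ...
};

class SketchTimer {
  std::vector<Phase> _active;    // stack of currently open phases
  std::vector<Phase> _reported;  // phases in the order they were closed

  void push(const std::string& name, PhaseType type) {
    // The level is the nesting depth at the time the phase is opened.
    _active.push_back(Phase{name, type, static_cast<int>(_active.size())});
  }

public:
  // Top-level phases must be explicit about their type.
  void start_pause(const std::string& name) {
    assert(_active.empty() && "a pause must be a top-level phase");
    push(name, PhaseType::Pause);
  }

  void start_concurrent(const std::string& name) {
    assert(_active.empty() && "a concurrent phase must be a top-level phase");
    push(name, PhaseType::Concurrent);
  }

  // Sub-phases never choose a type; they inherit it from their parent,
  // so a pause sub-phase cannot end up inside a concurrent phase.
  void start_sub_phase(const std::string& name) {
    assert(!_active.empty() && "a sub-phase needs an enclosing phase");
    push(name, _active.back().type);
  }

  void end_phase() {
    assert(!_active.empty() && "no phase is open");
    _reported.push_back(_active.back());
    _active.pop_back();
  }

  const std::vector<Phase>& reported() const { return _reported; }
};
```

In this model, a helper that only ever calls start_sub_phase() reports a concurrent sub-phase when it runs under start_concurrent() and a pause sub-phase when it runs under start_pause(), which is the behaviour the stricter timer is meant to guarantee.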
>> >> Tested tier1-3 >> >> Thanks, >> StefanK > From stefan.karlsson at oracle.com Thu Mar 26 09:55:37 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Thu, 26 Mar 2020 10:55:37 +0100 Subject: RFR: 8241160: Concurrent class unloading reports GCTraceTime events as JFR pause sub-phase events In-Reply-To: <31bcbdf5-6b20-502d-5f91-8bd18962985d@redhat.com> References: <8dd49f70-19f3-cc6f-58fc-99af87cc31f5@oracle.com> <31bcbdf5-6b20-502d-5f91-8bd18962985d@redhat.com> Message-ID: On 2020-03-26 10:54, Roman Kennke wrote: > Hey Stefan, > > Sorry, this went under my radar. Give us half a day or so, yes? Sure. StefanK > > Thanks, > Roman > >> Shenandoah devs, any comments w.r.t. to the Shenandoah section below? >> >> Thanks, >> StefanK >> >> On 2020-03-19 10:44, Stefan Karlsson wrote: >>> Hi all, >>> >>> Please review this patch to rewrite the GCTimer, and associated >>> classes, to not allow nested phases of different types (pause or >>> concurrent). >>> >>> https://cr.openjdk.java.net/~stefank/8241160/webrev.01/ >>> https://bugs.openjdk.java.net/browse/JDK-8241160 >>> >>> A bug was found when I was looking at JFR events from ZGC. A >>> GCPhasePauseLevel1 event was nested within a GCPhaseConcurrent. The >>> only valid parent is a GCPhasePause event. The reason why this >>> happened was that the we use a GCTraceTime class inside the class >>> unloading code. Previously, we only used GCTraceTimes inside pauses, >>> but ever since class unloading was moved out to a concurrent phase, >>> this isn't true anymore. GCTraceTime used >>> GCTimer::register_gc_phase_start(name, Ticks, phase? = ), and >>> therefore always reported pauses and pause sub-phases. >>> >>> With this patch, I suggest that we become stricter in our usages of >>> the GCTimer. The effects of the patch are: >>> >>> 1) When a top-level pause (or concurrent) phase is created, the code >>> must be explicit about what type of phase is created. The code will >>> now assert if this is abused. 
>>> >>> Tested tier1-3 >>> >>> Thanks, >>> StefanK From per.liden at oracle.com Thu Mar 26 11:01:24 2020 From: per.liden at oracle.com (Per Liden) Date: Thu, 26 Mar 2020 12:01:24 +0100 Subject: RFR: 8241361: ZGC: Implement memory related JFR events In-Reply-To: <2e95d089-6635-ac82-d3d9-eb809730f0fc@oracle.com> References: <40e29fc8-005a-e5d1-8bf0-816d406ee7b8@oracle.com> <4e335f6c-102e-8853-b99d-f422b259508a@oracle.com> <2e95d089-6635-ac82-d3d9-eb809730f0fc@oracle.com> Message-ID: <2fe4bdef-96d1-d4df-3e6a-f58381f862bb@oracle.com> Hi, On 3/23/20 10:30 AM, Stefan Karlsson wrote: [...] >> >> * I think cl->_flushed used here: >> >> 604 event.commit(cl->_requested, cl->_flushed, for_allocation); >> >> should instead just be: >> >> 604 event.commit(cl->_requested, flushed, for_allocation); >> >> Right? > > I intentionally used cl->_flushed since that describes how much we > flushed including overflushed parts of pages. Maybe we should report > both values? Maybe also rename the local variable flushed to destroyed? Hmm, not sure I see the point of reporting anything except what was actually flushed. When would the other numbers be of interest? Keep in mind that the overflushed part of this is immediately put back into the cache, and is never unmapped/destroyed or anything like that. Outside of flush_cache() no one will know (or care) if we overflushed or not, right? > >> >> >> src/hotspot/share/gc/z/zPageCache.hpp >> ------------------------------------- >> >> Instead of: >> >> friend class ZPageAllocator; >> >> add a getter for requested()? >> > > I also want _flushed, depending on the resolution of the above. I don't > think it's bad to friend our closures that are pure extensions to the > "owning" class. I don't have a very strong opinion here, but gravitated > towards a friend declaration to minimize the exposure of the > implementation details. If you still want me to add getters, I'll do it. In this case, I'd prefer getters.
Assuming my comment above is accepted, we only need one new getter. > >> >> src/hotspot/share/gc/z/zRelocationSetSelector.cpp >> ------------------------------------------------- >> >> * Same here, instead of: >> >> #include "jfrfiles/jfrEventClasses.hpp" >> >> I think we should do: >> >> #include "jfr/jfrEvents.hpp" > > Yes > >> >> >> * You don't think we should use ZPageTypeType that you introduced, and >> send three different ZRelocationSet events, one for each page type? >> Shouldn't this event also be timed, and sent from within >> ZRelocationSetSelectorGroup::select()? > > JMC is not always great at handling normalized events. If we want events > per type I think we should add them in _addition_ to the event I added. Ok, I'm sure you're right but still want to understand. When you say "normalized events", what are you thinking of in this context? cheers, Per > >> >> >> src/hotspot/share/gc/z/zTracer.cpp >> ---------------------------------- >> >> 43 writer.write("small"); >> 44 writer.write_key(ZPageTypeMedium); >> 45 writer.write("medium"); >> 46 writer.write_key(ZPageTypeLarge); >> 47 writer.write("large"); >> >> How about "Small", "Medium" and "Large"? I could only find one other >> place (in jfrStackTraceRepository.cpp) where names were given, and >> those start with a capital letter. > > OK > > Here's the updated webrevs with the easy fixes: > https://cr.openjdk.java.net/~stefank/8241361/webrev.02.delta/ > https://cr.openjdk.java.net/~stefank/8241361/webrev.02 > > Waiting for answers and comments to the rest. > > Thanks, > StefanK > >> >> cheers, >> Per >> >> >>> >>> Added events: >>> >>> ZAllocationStall - Record when we run out of heap memory and the Java >>> threads stall, waiting for the GC to free up memory. >>> >>> ZPageAllocation - Updated the existing event to also record the >>> duration of the event. Updated the event to only be reported if the >>> allocation takes longer than 1 ms.
>>> >>> ZPageCacheFlush - Record when the page cache needs to be flushed. >>> This usually happens when we run out of a specific page size and have >>> to detach the physical and virtual memory to materialize a new ZPage. >>> We also flush pages when we uncommit memory. >>> >>> ZRelocationSet - Record information about the selected relocation set. >>> >>> ZUncommit - Record when we uncommit and hand back memory to the OS. >>> >>> The patch also contains some small cosmetic changes to existing >>> events, whitespace fixes. > From shade at redhat.com Thu Mar 26 12:03:42 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 26 Mar 2020 13:03:42 +0100 Subject: RFR (S) 8241668: Shenandoah: make ShenandoahHeapRegion not derive from ContiguousSpace Message-ID: <526e97cc-db3d-206a-573a-fad85b592703@redhat.com> RFE: https://bugs.openjdk.java.net/browse/JDK-8241668 This is similar to what G1 did in the relevant cleanup (JDK-8189737). There is no reason to carry the cruft from ContiguousSpace superclass, when we just need to pull down a few fields and auxiliary methods from there. sizeof(ShenandoahHeapRegion) went down from 328 to 264 bytes. Webrev: https://cr.openjdk.java.net/~shade/8241668/webrev.01/ Testing: hotspot_gc_shenandoah, serviceability/sa with Shenandoah -- Thanks, -Aleksey From erik.osterlund at oracle.com Thu Mar 26 12:20:21 2020 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Thu, 26 Mar 2020 13:20:21 +0100 Subject: RFR: 8241605: Shenandoah: More aggressive reference discovery In-Reply-To: <35ce7d76-02c7-ebc6-a98d-57a14ba35b7a@redhat.com> References: <35ce7d76-02c7-ebc6-a98d-57a14ba35b7a@redhat.com> Message-ID: <01180752-4ec9-5777-d35a-5829472cb890@oracle.com> Hi Roman, First of all, interesting idea. Thanks for sharing your thoughts. I would however like to share some conceptual concerns. The patch has some flaws that needs further consideration. 
The core assumption that is wrong is that if there are no stores/returns exposing the referent, then it is safe to elide the keep-alive barrier. This is not true. Listing a selection of concerns/bugs here: 1) What the compiler proves can't happen, can always suddenly happen, due to class redefinition. So any idea utilizing compiler proof of what paths can be taken, always needs to be backed up by fall-back paths when the optimization eventually gets proven wrong. Let's have a look at this trivial example:

void A(List references) {
  for (var ref: references) {
    Object local = ref.get();
    for (int i = 0; i < 10000; ++i) {
      B(local);
      // safepoint poll with local being live
    }
  }
}

void B(Object referent) {
  if (referent == System.out) {
    System.out.println("I like fish. They swim. Blub blub.");
  }
}

Assume that we compile A, inlining B. The compiler can now prove that the referent doesn't escape. However, we then hit a safepoint in A, and in that safepoint, B is redefined to do this:

void B(Object referent) {
  globalList.add(referent);
}

What will happen now is that the nmethod of A (and B inlined) gets deoptimized. Once execution starts, the deopt handler will run, transferring the state (including the local) over to the interpreter, and in the interpreted version of B, the referent illegally escapes, eventually causing heap corruption, as subsequent remarks will not see the escaped local. 2) Even without class redefinition changing the logic that the compiler proved had no escaping referents through stores, stack walkers can expose locals at any safepoint poll. So in the above example, a stack walker can expose the local to other Java code, which attaches it to the object graph, without ever marking it, again causing heap corruption. Because of this, stack walkers and deoptimization logic need to keep everything they see in the stack alive, or things will break in subtle ways that are not fun to debug.
3) The analysis needs to also consider not just exposing of the referent, but exposure of anything transitively reachable from the object graph under the referent. If you load the referent, which is a wrapper object, then load the contents of the wrapper, and expose that to the object graph, then that will similarly break, even though the referent itself was never exposed. I can't see that your analysis catches that (LoadP with an AddP with the referent as base, and transitively performing the analysis on that). 4) Even then, there are more problems, involving class unloading. It is not just the object (and its transitive closure) you need to consider w.r.t. whether to keep it alive or not, but implicitly transitively also its class, and the classes of everything transitively reachable from the referent, which is now ignored. For example, if someone calls referent.(chain_of_objects).getClass(), then you can expose classes that have not been kept alive, as your strong native barriers do not keep things alive, AFAICT. 5) Classes are implicitly used when code is run from its class holder, and there are a few ways in which this could go wrong. The nmethod entry barriers shield you against many of them (including speculative inlining of bytecodes from otherwise not kept alive Klass), but not all. For example, if such a transitive class under the referent is acquired, and passed as an argument to the interpreter, it can reflectively call methods without holding its class alive, causing various forms of failure modes crashing the JVM. 6) Sometimes the use of classes is even more subtle and implicit. Here is one example I can think of: Imagine that your referent performs a virtual call. It is found through CHA that there exists a single unique selected method (callee). Then it is inlined straight into the code, without any conditional checking that the klass pointer was right. An entry is created in the dependency context.
But the class tracked in the dependency context is not the selected class (i.e. the one you inlined), but the base class, because that is the class that will get poked during class loading, to check when the CHA assumption fails in the future. So the dependency context does not track the callee method(). The oop finalization will not find any explicit mentions of this klass. It is just implicitly used. The only scenario in which we explicitly track inlined callees' classes is with the following code for class redefinition:

  // Always register dependence if JVMTI is enabled, because
  // either breakpoint setting or hotswapping of methods may
  // cause deoptimization.
  if (C->env()->jvmti_can_hotswap_or_post_breakpoint()) {
    C->dependencies()->assert_evol_method(method());
  }

This dependency will create a metadata entry in the nmethod which is caught by the oop finalizer, and hence by your nmethod entry barriers. When jvmti_can_hotswap_or_post_breakpoint() is disabled, the callee holder is not tracked anywhere, because it is assumed that the receiver of such an inlined method call is live, which seems like a reasonable assumption. Now that this elision of keep-alive barriers is used, we enter a gateway to hell. In bytecodes of this not kept alive class loader, you can do lots of fun things, like allocating an object of a class in that class loader, or materialize constants from its static finals, that will not be caught by the GC. This results in objects with broken class pointers being allocated, and dead objects being attached to the object graph. From there on, it is game over. 7) I can imagine code involving method handles going wrong too. You can perform method handle calls through a referent, and the whole method handle call may get inlined. It tracks the holder through a MemberName. It is assumed that if you have a reference to the MemberName, it has a reference to the class holder, and hence it should be safe to execute its bytecodes.
But when we start relaxing things such that this has not been kept alive, I can imagine such code blowing up in unexpected and obscure ways. It looks like nothing transitively reachable from the referent is being exposed in any way, and it looks like there are no calls, but inlined use of bytecodes of reachable metadata is an implicit side effect that exposes not the referent but just references to its transitively reachable class loaders, without there ever being an obvious store or side effect involving the referent or its transitive closure of objects or uses. This aside, I think there are many other fun assumptions made over the years about things being live, not listed above, implying various subtle assumptions in our compilers. I would be quite wary of messing with that without a very careful look into all the things that can go wrong. Catching such bugs will be a nightmare. So given these various issues, are you still sure you want to continue down this path? If I were you, I think I would probably not perform this change. It gives me a migraine to think about the things that can go wrong, and I feel like I have just scratched the surface of it. I would be quite afraid of destabilizing. /Erik On 2020-03-25 18:29, Roman Kennke wrote: > Shenandoah uses SATB for concurrent marking, and thus needs to track > stores and explicitly make previous values grey. An exception to this > are (soft/weak/final) references: the reachability of referents can be > changed (from weak to strong) when a referent is accessed. For this > reason we require a keep-alive barrier which also makes referents grey > whenever they are accessed via Reference.get(). The downside of this is > if a workload churns weak-references (i.e. accesses them often) they > might never get a chance to be reclaimed. > > The key insight for improving the situation is that the change of > reachability of the referent is only relevant when that referent is > stored somewhere else.
We can elide the keep-alive barrier, if the > compiler can prove that this doesn't happen. (We also need to check if > the referent leaves the scope of the method via method-call or return, > because it may be stored outside of the method.) The caveat here is that > we must do another scan of the threads' stacks at final-mark to catch > referents that are in a local variable at final-mark - we must not lose > those. However, according to my measurements, this is only a minor (if > any) regression in final-mark-latency. > > Issue: > https://bugs.openjdk.java.net/browse/JDK-8241605 > Webrev: > http://cr.openjdk.java.net/~rkennke/JDK-8241605/webrev.00/ > > Testing: hotspot_gc_shenandoah (including new testcase that verifies the > improved behaviour), manual tests, specjbb and specjvm runs > > I'd like to push this to shenandoah/jdk first, to give it a few more > rounds of testing before hitting mainline JDK. > > Ok? > > Roman > From stefan.karlsson at oracle.com Thu Mar 26 12:30:48 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Thu, 26 Mar 2020 13:30:48 +0100 Subject: RFR: 8241361: ZGC: Implement memory related JFR events In-Reply-To: <2fe4bdef-96d1-d4df-3e6a-f58381f862bb@oracle.com> References: <40e29fc8-005a-e5d1-8bf0-816d406ee7b8@oracle.com> <4e335f6c-102e-8853-b99d-f422b259508a@oracle.com> <2e95d089-6635-ac82-d3d9-eb809730f0fc@oracle.com> <2fe4bdef-96d1-d4df-3e6a-f58381f862bb@oracle.com> Message-ID: <0fe119b6-8479-2ba3-a653-a67b62f2d0a6@oracle.com> On 2020-03-26 12:01, Per Liden wrote: > Hi, > > On 3/23/20 10:30 AM, Stefan Karlsson wrote: > [...] >>> >>> * I think cl->_flushed used here: >>> >>> 604 event.commit(cl->_requested, cl->_flushed, for_allocation); >>> >>> should instead just be: >>> >>> 604 event.commit(cl->_requested, flushed, for_allocation); >>> >>> Right? >> >> I intentionally used cl->_flushed since that describes how much we >> flushed including overflushed parts of pages. Maybe we should report >> both values?
Maybe also rename the local variable flushed to destroyed? > > Hmm, not sure I see the point of reporting anything except what was > actually flushed. When would the other numbers be of interest? Keep in > mind that the overflushed part of this is immediately put back into > the cache, and is never unmapped/destroyed or anything like that. > Outside of flush_cache() no one will know (or care) if we overflushed > or not, right? Overflushing causes malloc calls for pages and bitmaps. I thought that could be of interest when looking at latencies. If you don't want it, I'll remove it. > >> >>> >>> >>> src/hotspot/share/gc/z/zPageCache.hpp >>> ------------------------------------- >>> >>> Instead of: >>> >>> friend class ZPageAllocator; >>> >>> add a getter for requested()? >>> >> >> I also want _flushed, depending on the resolution of the above. I >> don't think it's bad to friend our closures that are pure extensions >> to the "owning" class. I don't have a very strong opinion here, but >> gravitated towards a friend declaration to minimize the exposure of >> the implementation details. If you still want me to add getters, I'll >> do it. > > In this case, I'd prefer getters. Assuming my comment above is > accepted, we only need one new getter. OK. > >> >>> >>> src/hotspot/share/gc/z/zRelocationSetSelector.cpp >>> ------------------------------------------------- >>> >>> * Same here, instead of: >>> >>> #include "jfrfiles/jfrEventClasses.hpp" >>> >>> I think we should do: >>> >>> #include "jfr/jfrEvents.hpp" >> >> Yes >> >>> >>> >>> * You don't think we should use ZPageTypeType that you introduced, >>> and send three different ZRelocationSet events, one for each page >>> type? Shouldn't this event also be timed, and sent from within >>> ZRelocationSetSelectorGroup::select()? >> >> JMC is not always great at handling normalized events. If we want >> events per type I think we should add them in _addition_ to the event >> I added.
> > Ok, I'm sure you're right but still want to understand. When you say > "normalized events", what are you thinking of in this context? What I mean is that if you have only the minimally sufficient information, like: - Relocated bytes in small pages: x - Relocated bytes in medium pages: y - Relocated bytes in large pages: z you need to go to extra lengths to figure out: - Relocated total bytes: x + y + z and unless you want to fiddle around with JMC too much, it's better to also (or instead) report the sum of the values. StefanK > > cheers, > Per > >> >>> >>> >>> src/hotspot/share/gc/z/zTracer.cpp >>> ---------------------------------- >>> >>> 43 writer.write("small"); >>> 44 writer.write_key(ZPageTypeMedium); >>> 45 writer.write("medium"); >>> 46 writer.write_key(ZPageTypeLarge); >>> 47 writer.write("large"); >>> >>> How about "Small", "Medium" and "Large"? I could only find one other >>> place (in jfrStackTraceRepository.cpp) where names were given, and >>> those start with a capital letter. >> >> OK >> >> Here's the updated webrevs with the easy fixes: >> https://cr.openjdk.java.net/~stefank/8241361/webrev.02.delta/ >> https://cr.openjdk.java.net/~stefank/8241361/webrev.02 >> >> Waiting for answers and comments to the rest. >> >> Thanks, >> StefanK >> >>> >>> cheers, >>> Per >>> >>> >>>> >>>> Added events: >>>> >>>> ZAllocationStall - Record when we run out of heap memory and the >>>> Java threads stall, waiting for the GC to free up memory. >>>> >>>> ZPageAllocation - Updated the existing event to also record the >>>> duration of the event. Updated the event to only be reported if the >>>> allocation takes longer than 1 ms. >>>> >>>> ZPageCacheFlush - Record when the page cache needs to be flushed. >>>> This usually happens when we run out of a specific page size and >>>> have to detach the physical and virtual memory to materialize a new >>>> ZPage. We also flush pages when we uncommit memory.
>>>> >>>> ZRelocationSet - Record information about the selected relocation set. >>>> >>>> ZUncommit - Record when we uncommit and hand back memory to the OS. >>>> >>>> The patch also contains some small cosmetic changes to existing >>>> events, whitespace fixes. >> From stefan.johansson at oracle.com Thu Mar 26 13:07:10 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Thu, 26 Mar 2020 14:07:10 +0100 Subject: RFR: 8241666: Enhance log messages in ReferenceProcessor Message-ID: <88d789a4-d62b-629c-0ed8-f1ecda814554@oracle.com> Hi, Please review this change to improve some logging messages. Webrev: http://cr.openjdk.java.net/~sjohanss/8241666/00/ Issue: https://bugs.openjdk.java.net/browse/JDK-8241666 Summary The old logging messages were a bit unclear and when checking if the tests needed to be updated I found some unused static members that I removed. Testing Locally run GC tests and mach5 tier1 and tier2. Thanks, Stefan From rkennke at redhat.com Thu Mar 26 13:49:57 2020 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 26 Mar 2020 14:49:57 +0100 Subject: RFR: 8241605: Shenandoah: More aggressive reference discovery In-Reply-To: <01180752-4ec9-5777-d35a-5829472cb890@oracle.com> References: <35ce7d76-02c7-ebc6-a98d-57a14ba35b7a@redhat.com> <01180752-4ec9-5777-d35a-5829472cb890@oracle.com> Message-ID: Hi Erik, Thank you for sharing your thoughts. You are right. Some issues can be (and already are) addressed, for example the barriers in debugInfo.cpp take care of exposure via deoptimization. I haven't checked the stackwalker, but I think it can be taken care of in a similar fashion. The main problem is indeed the exposure of objects that are transitively reachable from the referent (incl. classes). This cannot be easily fixed. I'll withdraw the RFR. Thank you! Roman > Hi Roman, > > First of all, interesting idea. Thanks for sharing your thoughts. I > would however like to share > some conceptual concerns.
The patch has some flaws that need further > consideration. > The core assumption that is wrong is that if there are no stores/returns > exposing the referent, > then it is safe to elide the keep-alive barrier. This is not true. > > Listing a selection of concerns/bugs here: > > 1) What the compiler proves can't happen, can always suddenly happen, > due to class redefinition. > So any idea utilizing compiler proof of what paths can be taken, always > needs to be backed up > by fall-back paths when the optimization eventually gets proven wrong. > Let's have a look at > this trivial example: > > void A(List references) { > for (var ref: references) { > Object local = ref.get(); > for (int i = 0; i < 10000; ++i) { > B(local); > // safepoint poll with local being live > } > } > } > > void B(Object referent) { > if (referent == System.out) { > System.out.println("I like fish. They swim. Blub blub."); > } > } > > Assume that we compile A, inlining B. The compiler can now prove that > the referent doesn't escape. > However, we then hit a safepoint in A, and in that safepoint, B is > redefined to do this: > > void B(Object referent) { > globalList.add(referent); > } > > What will happen now is that the nmethod of A (and B inlined) gets > deoptimized. Once execution > starts, the deopt handler will run, transferring the state (including > the local) over to the > interpreter, and in the interpreted version of B, the referent illegally > escapes, eventually > causing heap corruption, as subsequent remarks will not see the escaped > local. > > 2) Even without class redefinition changing the logic that the compiler > proved had no escaping > referents through stores, stack walkers can expose locals at any > safepoint poll. So in the above > example, a stack walker can expose the local to other Java code, which > attaches it to the object > graph, without ever marking it, again causing heap corruption.
> > Because of this, stack walkers and deoptimization logic need to keep > everything they see in the > stack alive, or things will break in subtle ways that are not fun to debug. > > 3) The analysis needs to also consider not just exposure of the > referent, but exposure of anything > transitively reachable from the object graph under the referent. If you > load the referent, which is a > wrapper object, then load the contents of the wrapper, and expose that > to the object graph, then that > will similarly break, even though the referent itself was never exposed. > I can't see that your analysis > catches that (LoadP with an AddP with the referent as base, and > transitively performing the analysis on that). > > 4) Even then, there are more problems, involving class unloading. It is > not just the object (and its > transitive closure) you need to consider w.r.t. whether to keep it alive > or not, but implicitly > transitively also its class, and the classes of everything transitively > reachable from the referent, > which are now ignored. For example, if someone calls > referent.(chain_of_objects).getClass(), then you can > expose classes that have not been kept alive, as your strong native > barriers do not keep things alive, AFAICT. > > 5) Classes are implicitly used when code is run from its class holder, > and there are a few ways in > which this could go wrong. The nmethod entry barriers shield you > against many of them (including > speculative inlining of bytecodes from otherwise not kept alive Klass), > but not all. > For example, if such a transitive class under the referent is acquired, > and passed as an argument > to the interpreter, it can reflectively call methods without holding its > class alive, causing various > forms of failure modes crashing the JVM. > > 6) Sometimes the use of classes is even more subtle and implicit. Here > is one example I can think of: > Imagine that your referent performs a virtual call.
It is found through > CHA that there exists a single > unique selected method (callee). Then it is inlined straight into the > code, without any conditional checking > that the klass pointer was right. An entry is created in the > dependency context. But the class tracked > in the dependency context is not the selected class (i.e. the one you > inlined), but the base class, > because that is the class that will get poked during class loading, to > check when the CHA assumption > fails in the future. So the dependency context does not track the callee > method(). The oop finalization > will not find any explicit mentions of this klass. It is just implicitly > used. The only scenario in which > we explicitly track inlined callees' classes is with the following code > for class redefinition: > > // Always register dependence if JVMTI is enabled, because > // either breakpoint setting or hotswapping of methods may > // cause deoptimization. > if (C->env()->jvmti_can_hotswap_or_post_breakpoint()) { > C->dependencies()->assert_evol_method(method()); > } > > This dependency will create a metadata entry in the nmethod which is > caught by the oop finalizer, and > hence by your nmethod entry barriers. When > jvmti_can_hotswap_or_post_breakpoint() is disabled, the callee holder > is not tracked anywhere, because it is assumed that the receiver of such > an inlined method call is live, > which seems like a reasonable assumption. Now that this elision of > keep-alive barriers is used, we enter a > gateway to hell. In bytecodes of this not kept alive class loader, you > can do lots of fun things, like > allocating an object of a class in that class loader, or materializing > constants from its static finals, > that will not be caught by the GC. This results in objects with broken > class pointers being allocated, > and dead objects being attached to the object graph. From there on, it > is game over.
> > 7) I can imagine code involving method handles going wrong too. You can > perform method handle calls through > a referent, and the whole method handle call may get inlined. It tracks > the holder through a MemberName. > It is assumed that if you have a reference to the MemberName, it has a > reference to the class holder, and > hence it should be safe to execute its bytecodes. But when we start > relaxing things such that this has not > been kept alive, I can imagine such code blowing up in unexpected and > obscure ways. It looks like nothing > transitively reachable from the referent is being exposed in any way, > and it looks like there are no calls, > but inlined use of bytecodes of reachable metadata is an implicit side > effect that exposes not the referent > but just references to its transitively reachable class loaders, without > there ever being an obvious store > or side effect involving the referent or its transitive closure of > objects or uses. > > This aside, I think there are many other fun assumptions made over the > years about things being live, not > listed above, implying various subtle assumptions in our compilers. I > would be quite wary of messing with > that without a very careful look into all the things that can go > wrong. Catching such bugs will be a nightmare. > > So given these various issues, are you still sure you want to continue > down this path? If I were you, I > think I would probably not perform this change. It gives me a migraine to > think about the things that can > go wrong, and I feel like I have just scratched the surface of it. I > would be quite afraid of destabilizing. > > /Erik > > On 2020-03-25 18:29, Roman Kennke wrote: >> Shenandoah uses SATB for concurrent marking, and thus needs to track >> stores and explicitly make previous values grey. An exception to this >> are (soft/weak/final) references: the reachability of referents can be >> changed (from weak to strong) when a referent is accessed.
For this >> reason we require a keep-alive barrier which also makes referents grey >> whenever they are accessed via Reference.get(). The downside of this is >> that if a workload churns weak-references (i.e. accesses them often) they >> might never get a chance to be reclaimed. >> >> The key insight for improving the situation is that the change of >> reachability of the referent is only relevant when that referent is >> stored somewhere else. We can elide the keep-alive barrier, if the >> compiler can prove that this doesn't happen. (We also need to check if >> the referent leaves the scope of the method via method-call or return, >> because it may be stored outside of the method.) The caveat here is that >> we must do another scan of the threads' stacks at final-mark to catch >> referents that are in a local variable at final-mark - we must not lose >> those. However, according to my measurements, this is only a minor (if >> any) regression in final-mark-latency. >> >> Issue: >> https://bugs.openjdk.java.net/browse/JDK-8241605 >> Webrev: >> http://cr.openjdk.java.net/~rkennke/JDK-8241605/webrev.00/ >> >> Testing: hotspot_gc_shenandoah (including new testcase that verifies the >> improved behaviour), manual tests, specjbb and specjvm runs >> >> I'd like to push this to shenandoah/jdk first, to give it a few more >> rounds of testing before hitting mainline JDK. >> >> Ok? >> >> Roman >> > From shade at redhat.com Thu Mar 26 14:25:01 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 26 Mar 2020 15:25:01 +0100 Subject: RFR (S) 8241673: Shenandoah: refactor anti-false-sharing padding Message-ID: <4e8270b6-c068-366d-ccc3-2113535d120f@redhat.com> RFE: https://bugs.openjdk.java.net/browse/JDK-8241673 Webrev: https://cr.openjdk.java.net/~shade/8241673/webrev.01/ Together with the cleanup, it drops the padding size to 64 bytes, thus optimizing footprint. sizeof(ShenandoahHeapRegion) drops from 264 to 200 bytes.
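
As a rough illustration of the footprint effect described above, the following C++ sketch pads one field out to a configurable size and compares the resulting struct sizes. The `PaddedRegion` type, its two fields, and the 128-byte "before" value are assumptions made for this example; the message itself only states the new 64-byte padding and the drop from 264 to 200 bytes, and this is not the real ShenandoahHeapRegion layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a heap region; NOT the real ShenandoahHeapRegion.
template <std::size_t PadBytes>
struct PaddedRegion {
  // Field assumed (for illustration) to be updated by many GC threads.
  std::uint64_t live_data;
  // Filler so that the next field starts PadBytes after live_data begins.
  char pad[PadBytes - sizeof(std::uint64_t)];
  // Field assumed (for illustration) to be mostly read.
  std::uint64_t index;
};

// Assumed "before" and "after" padding sizes; only the 64 comes from the text.
using WideRegion   = PaddedRegion<128>;
using NarrowRegion = PaddedRegion<64>;

// The padded field pushes its neighbour out by exactly the padding size.
static_assert(offsetof(WideRegion, index) == 128, "index starts at offset 128");
static_assert(offsetof(NarrowRegion, index) == 64, "index starts at offset 64");

// Bytes saved per region object by shrinking the padding.
constexpr std::size_t bytes_saved() {
  return sizeof(WideRegion) - sizeof(NarrowRegion);
}
```

Under these assumptions a single padded field accounts for a 64-byte saving per object, the same size as the difference quoted above, although the real class may arrive at that figure differently. Whether the two fields actually end up on separate cache lines also depends on how the object itself is aligned, which the sketch does not control.
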
Init/final mark pauses that deal with lots of regions are improving for about 50us each. Testing: hotspot_gc_shenandoah; eyeballing gc-stats from stress runs; benchmarks (running) -- Thanks, -Aleksey From thomas.schatzl at oracle.com Thu Mar 26 14:55:47 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 26 Mar 2020 15:55:47 +0100 Subject: RFR: 8241666: Enhance log messages in ReferenceProcessor In-Reply-To: <88d789a4-d62b-629c-0ed8-f1ecda814554@oracle.com> References: <88d789a4-d62b-629c-0ed8-f1ecda814554@oracle.com> Message-ID: Hi, On 26.03.20 14:07, Stefan Johansson wrote: > Hi, > > Please review this change to improve some logging messages. > > Webrev: http://cr.openjdk.java.net/~sjohanss/8241666/00/ > Issue: https://bugs.openjdk.java.net/browse/JDK-8241666 > > Summary > The old logging messages were a bit unclear and when checking if the > tests needed to be updated I found some unused static members that I > removed. > > Testing > Locally run GC tests and mach5 tier1 and tier2. the only nit I can see is in the new log message in referenceProcessor.cpp:791: "Skipped phase 1 of Reference Processing: no policy to reconsider" I would remove the "to reconsider" part as it seems unnecessary. Feel free to ignore. No need to re-review. Thomas From erik.osterlund at oracle.com Thu Mar 26 15:21:29 2020 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Thu, 26 Mar 2020 16:21:29 +0100 Subject: RFR: 8241605: Shenandoah: More aggressive reference discovery In-Reply-To: References: <35ce7d76-02c7-ebc6-a98d-57a14ba35b7a@redhat.com> <01180752-4ec9-5777-d35a-5829472cb890@oracle.com> Message-ID: <3fa7b4d3-5e36-320f-c51d-5cfd98a5354c@oracle.com> Hi Roman, For the record, the barriers in debugInfo.cpp are for locals materialized through nmethod oops. That is not the problem I talked about. It is locals on-stack. You do indeed apply barriers when deoptimizing on said locals. 
It eventually goes through a shared stack walking API in stackValue.cpp, where we see the following code protecting oop loads from the stack: #if INCLUDE_SHENANDOAHGC if (UseShenandoahGC) { val = ShenandoahBarrierSet::barrier_set()->load_reference_barrier(val); } #endif This function seems to take care of relocation, but not keeping things alive. I'm guessing it is presumed that things on-stack should already be live. ;) On a related note... note that this also implies that if you deoptimize between a referent load and its keep-alive barrier, you will expose broken objects to the object graph, as nothing keeps that alive. This comprises part of the reason we don't want to allow the compiler to put safepoints between loads and their load barriers. This problem becomes a million times worse when you have concurrent reference processing, as you can't know the strength used to load the referent, and depending on the strength, the stack fixup code needs to change the value to different things. Before we switched over to our super late barrier expansion, there were still a number of scenarios causing safepoints to float in between loads and their barriers. I would be concerned about that, because I don't believe anything has changed. It's just very hard to find the bugs. Anyway, thank you for withdrawing the RFR. Thanks, /Erik On 2020-03-26 14:49, Roman Kennke wrote: > Hi Erik, > > Thank you for sharing your thoughts. > > You are right. Some issues can be (and already are) addressed, for example > the barriers in debugInfo.cpp take care of exposure via deoptimization. > I haven't checked the stackwalker, but I think it can be taken care of > in a similar fashion. > > The main problem is indeed the exposure of objects that are transitively > reachable from the referent (incl. classes). This cannot be easily fixed. > > I'll withdraw the RFR. > > Thank you! > Roman > > >> Hi Roman, >> >> First of all, interesting idea. Thanks for sharing your thoughts.
>>> >>> Ok? >>> >>> Roman >>> From sangheon.kim at oracle.com Thu Mar 26 15:53:01 2020 From: sangheon.kim at oracle.com (sangheon.kim at oracle.com) Date: Thu, 26 Mar 2020 08:53:01 -0700 Subject: RFR: 8241666: Enhance log messages in ReferenceProcessor In-Reply-To: References: <88d789a4-d62b-629c-0ed8-f1ecda814554@oracle.com> Message-ID: <8613ac74-e14c-dd97-9280-f412a6c0fda4@oracle.com> Hi Stefan, On 3/26/20 7:55 AM, Thomas Schatzl wrote: > Hi, > > On 26.03.20 14:07, Stefan Johansson wrote: >> Hi, >> >> Please review this change to improve some logging messages. >> >> Webrev: http://cr.openjdk.java.net/~sjohanss/8241666/00/ >> Issue: https://bugs.openjdk.java.net/browse/JDK-8241666 >> >> Summary >> The old logging messages were a bit unclear and when checking if the >> tests needed to be updated I found some unused static members that I >> removed. >> >> Testing >> Locally run GC tests and mach5 tier1 and tier2. > > ? the only nit I can see is in the new log message in > referenceProcessor.cpp:791: > > "Skipped phase 1 of Reference Processing: no policy to reconsider" > > I would remove the "to reconsider" part as it seems unnecessary. Feel > free to ignore. No need to re-review. +1 Thanks, Sangheon > > Thomas > From rkennke at redhat.com Thu Mar 26 17:04:59 2020 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 26 Mar 2020 18:04:59 +0100 Subject: RFR (S) 8241668: Shenandoah: make ShenandoahHeapRegion not derive from ContiguousSpace In-Reply-To: <526e97cc-db3d-206a-573a-fad85b592703@redhat.com> References: <526e97cc-db3d-206a-573a-fad85b592703@redhat.com> Message-ID: <044fc10e-e79e-884d-1fc2-fe550d33ee87@redhat.com> Hi Aleksey, Nice cleanup. I don't think we need block_start() block_size() and block_is_obj() (ShHeap and ShHeapRegion) for anything. Correct me if I'm wrong. Other than that, it looks good. Thanks, Roman > RFE: > https://bugs.openjdk.java.net/browse/JDK-8241668 > > This is similar to what G1 did in the relevant cleanup (JDK-8189737). 
There is no reason to carry > the cruft from ContiguousSpace superclass, when we just need to pull down a few fields and auxiliary > methods from there. > > sizeof(ShenandoahHeapRegion) went down from 328 to 264 bytes. > > Webrev: > https://cr.openjdk.java.net/~shade/8241668/webrev.01/ > > Testing: hotspot_gc_shenandoah, serviceability/sa with Shenandoah > From rkennke at redhat.com Thu Mar 26 17:05:40 2020 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 26 Mar 2020 18:05:40 +0100 Subject: RFR (S) 8241673: Shenandoah: refactor anti-false-sharing padding In-Reply-To: <4e8270b6-c068-366d-ccc3-2113535d120f@redhat.com> References: <4e8270b6-c068-366d-ccc3-2113535d120f@redhat.com> Message-ID: Looks good. Thanks, Roman > RFE: > https://bugs.openjdk.java.net/browse/JDK-8241673 > > Webrev: > https://cr.openjdk.java.net/~shade/8241673/webrev.01/ > > Together with the cleanup, it drops the padding size to 64 bytes, thus optimizing footprint. > sizeof(ShenandoahHeapRegion) drops from 264 to 200 bytes. Init/final mark pauses that deal with lots > of regions are improving for about 50us each. > > Testing: hotspot_gc_shenandoah; eyeballing gc-stats from stress runs; benchmarks (running) > From shade at redhat.com Thu Mar 26 17:12:25 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 26 Mar 2020 18:12:25 +0100 Subject: RFR (S) 8241668: Shenandoah: make ShenandoahHeapRegion not derive from ContiguousSpace In-Reply-To: <044fc10e-e79e-884d-1fc2-fe550d33ee87@redhat.com> References: <526e97cc-db3d-206a-573a-fad85b592703@redhat.com> <044fc10e-e79e-884d-1fc2-fe550d33ee87@redhat.com> Message-ID: <7f8f8279-6797-a423-91f3-084129fee32d@redhat.com> On 3/26/20 6:04 PM, Roman Kennke wrote: > I don't think we need block_start() block_size() and block_is_obj() > (ShHeap and ShHeapRegion) for anything. Correct me if I'm wrong. We do need it: hs_err and verifier printing enters here. It is not obvious with templates, but you can follow it from ShenandoahHeap::print_location. 
-- Thanks, -Aleksey From rkennke at redhat.com Thu Mar 26 17:31:05 2020 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 26 Mar 2020 18:31:05 +0100 Subject: RFR (S) 8241668: Shenandoah: make ShenandoahHeapRegion not derive from ContiguousSpace In-Reply-To: <7f8f8279-6797-a423-91f3-084129fee32d@redhat.com> References: <526e97cc-db3d-206a-573a-fad85b592703@redhat.com> <044fc10e-e79e-884d-1fc2-fe550d33ee87@redhat.com> <7f8f8279-6797-a423-91f3-084129fee32d@redhat.com> Message-ID: <15ec2c48-4495-7cf1-b11f-fb8f0579ae69@redhat.com> >> I don't think we need block_start() block_size() and block_is_obj() >> (ShHeap and ShHeapRegion) for anything. Correct me if I'm wrong. > > We do need it: hs_err and verifier printing enters here. It is not obvious with templates, but you > can follow it from ShenandoahHeap::print_location. Eh. Ok then. I thought those were relics of when we tried to do heap iteration like serial GC (in ancient history). Good! Thanks, Roman From leonid.mesnik at oracle.com Thu Mar 26 17:46:02 2020 From: leonid.mesnik at oracle.com (Leonid Mesnik) Date: Thu, 26 Mar 2020 10:46:02 -0700 Subject: RFR: 8241478: vmTestbase/gc/gctests/Steal/steal001/steal001.java fails with OOME Message-ID: <17fb3dcf-dd96-9d9f-6b26-844162a52e66@oracle.com> Hi, Could you please review the following fix, which removes the vmTestbase/gc/gctests/Steal tests from the repo. These tests might fail by provoking an OutOfMemoryError that is thrown in an unexpected place. The logic and intention of these tests are very unclear from their description and code. The test allocates objects until OOME. Then the test removes some references and starts allocating objects again, provoking OOME and the corresponding GC. It is not clear how this should stress taskqueue work-stealing in GC better than any other test causing a lot of GC. This test is, however, mentioned in a couple of bugs where crashes were caused by OOME in the heap.
https://bugs.openjdk.java.net/browse/JDK-8180627 https://bugs.openjdk.java.net/browse/JDK-8130344 Assuming that we already have enough tests stressing OOME and tests stressing GC. I think it makes sense just to remove these two tests. I verified that they are not in any jtreg group. webrev: http://cr.openjdk.java.net/~lmesnik/8241478/webrev.00/ bug: https://bugs.openjdk.java.net/browse/JDK-8241478 From shade at redhat.com Thu Mar 26 18:20:52 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 26 Mar 2020 19:20:52 +0100 Subject: RFR (S) 8241692: Shenandoah: remove ShenandoahHeapRegion::_reserved Message-ID: <4fefe5f0-473b-8198-5192-e1de4e597479@redhat.com> RFE: https://bugs.openjdk.java.net/browse/JDK-8241692 Follow up from JDK-8241668: _reserved field is not actually needed, because we can just use bottom() and end() available. Saves 16 bytes per region. Webrev: https://cr.openjdk.java.net/~shade/8241692/webrev.01/ Testing: hotspot_gc_shenandoah -- Thanks, -Aleksey From zgu at redhat.com Thu Mar 26 18:34:16 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 26 Mar 2020 14:34:16 -0400 Subject: RFR (S) 8241692: Shenandoah: remove ShenandoahHeapRegion::_reserved In-Reply-To: <4fefe5f0-473b-8198-5192-e1de4e597479@redhat.com> References: <4fefe5f0-473b-8198-5192-e1de4e597479@redhat.com> Message-ID: <6c485823-6a43-269f-b9d7-1a4fc13f6be6@redhat.com> Good to me. -Zhengyu On 3/26/20 2:20 PM, Aleksey Shipilev wrote: > RFE: > https://bugs.openjdk.java.net/browse/JDK-8241692 > > Follow up from JDK-8241668: _reserved field is not actually needed, because we can just use bottom() > and end() available. Saves 16 bytes per region. 
> > Webrev: > https://cr.openjdk.java.net/~shade/8241692/webrev.01/ > > Testing: hotspot_gc_shenandoah > From kim.barrett at oracle.com Thu Mar 26 19:01:08 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 26 Mar 2020 15:01:08 -0400 Subject: RFR: 8241160: Concurrent class unloading reports GCTraceTime events as JFR pause sub-phase events In-Reply-To: <83dbc1c5-cabc-c414-1ad4-4e6a1b8b5c05@oracle.com> References: <0B438446-208F-4CE3-B3D7-A8E0596E0133@oracle.com> <83dbc1c5-cabc-c414-1ad4-4e6a1b8b5c05@oracle.com> Message-ID: <5C5F81B6-00F2-452C-85BF-4994A0C27084@oracle.com> > On Mar 26, 2020, at 5:34 AM, Stefan Karlsson wrote: > > On 2020-03-25 23:37, Kim Barrett wrote: >> src/hotspot/share/gc/shared/gcTraceSend.cpp >> 306 assert(phase->level() < 2, "There is only two levels for ConcurrentPhase"); >> >> Seems like there ought to be a named constant for that "2", similar to >> PhasesStack::PHASE_LEVELS used in a similar place in visit_pause. But >> then, there's a mismatch between PHASE_LEVELS (6) and the number of >> cases in visit_pause (4). That's "interesting"... > > Yeah. visit_pause checks that we don't push too many levels of phases, but we already check that when we push phases: > > [?] > I think it would make sense to remove that assert, to get rid of that confusion. > > I'm not so sure about removing 'assert(phase->level() < 2' since it's there to to catch when we start to use deeper nesting of the concurrent phases. Maybe if we also add EventGCPhaseConcurrentLevel2-4 (analogous to visit_pause) and then get rid of this assert? > > If we want to go ahead and do any of these changes (find a name for the 2 or adding events) I'll create a separate RFR. Thanks for the walk-through. Your suggestion seems reasonable. 
From thomas.schatzl at oracle.com Thu Mar 26 19:25:11 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 26 Mar 2020 20:25:11 +0100 Subject: RFR: 8241478: vmTestbase/gc/gctests/Steal/steal001/steal001.java fails with OOME In-Reply-To: <17fb3dcf-dd96-9d9f-6b26-844162a52e66@oracle.com> References: <17fb3dcf-dd96-9d9f-6b26-844162a52e66@oracle.com> Message-ID: <5288c05c-a86c-9adb-3439-88b640603505@oracle.com> Hi, On 26.03.20 18:46, Leonid Mesnik wrote: > Hi > > Could you please review following fix which removes > vmTestbase/gc/gctests/Steal tests from the repo. > > These tests might fail provoking OutOfMemoryError throwing it in > unexpected place. The logic and intention of these tests are very > unclear from their description and code. > > Test allocate objects till OOME. Than test removes some references ans > start allocate objects provoking OOME and corresponding GC. It doesn't > clear how it should stress taskqueue work-stealing in GC better than any > other test causing a lot of GC. > > This test however mentioned in a couple of bugs where crashes were > caused by OOME in Heap. > > https://bugs.openjdk.java.net/browse/JDK-8180627 > > https://bugs.openjdk.java.net/browse/JDK-8130344 Please close these out as duplicates. > > Assuming that we already have enough tests stressing OOME and tests > stressing GC. I think it makes sense just to remove these two tests. Some analysis of work stealing statistics shows that other existing tests stress work stealing more than these two. > > I verified that they are not in any jtreg group. > > webrev: http://cr.openjdk.java.net/~lmesnik/8241478/webrev.00/ > > bug: https://bugs.openjdk.java.net/browse/JDK-8241478 > looks good. Thanks. 
Thomas From stefan.johansson at oracle.com Thu Mar 26 19:25:37 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Thu, 26 Mar 2020 20:25:37 +0100 Subject: RFR: 8241666: Enhance log messages in ReferenceProcessor In-Reply-To: <8613ac74-e14c-dd97-9280-f412a6c0fda4@oracle.com> References: <88d789a4-d62b-629c-0ed8-f1ecda814554@oracle.com> <8613ac74-e14c-dd97-9280-f412a6c0fda4@oracle.com> Message-ID: <2db85f1c-da20-1640-83fe-108463f76337@oracle.com> Thanks for the review guys, On 2020-03-26 16:53, sangheon.kim at oracle.com wrote: > Hi Stefan, > > On 3/26/20 7:55 AM, Thomas Schatzl wrote: >> Hi, >> >> On 26.03.20 14:07, Stefan Johansson wrote: >>> Hi, >>> >>> Please review this change to improve some logging messages. >>> >>> Webrev: http://cr.openjdk.java.net/~sjohanss/8241666/00/ >>> Issue: https://bugs.openjdk.java.net/browse/JDK-8241666 >>> >>> Summary >>> The old logging messages were a bit unclear and when checking if the >>> tests needed to be updated I found some unused static members that I >>> removed. >>> >>> Testing >>> Locally run GC tests and mach5 tier1 and tier2. >> >> ? the only nit I can see is in the new log message in >> referenceProcessor.cpp:791: >> >> "Skipped phase 1 of Reference Processing: no policy to reconsider" >> >> I would remove the "to reconsider" part as it seems unnecessary. Feel >> free to ignore. No need to re-review. > +1 I will remove the "to reconsider" part and push tomorrow unless anyone else comes in and object. 
Cheers, Stefan > > Thanks, > Sangheon > > >> >> Thomas >> > From kim.barrett at oracle.com Thu Mar 26 19:32:33 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 26 Mar 2020 15:32:33 -0400 Subject: RFR: 8241478: vmTestbase/gc/gctests/Steal/steal001/steal001.java fails with OOME In-Reply-To: <5288c05c-a86c-9adb-3439-88b640603505@oracle.com> References: <17fb3dcf-dd96-9d9f-6b26-844162a52e66@oracle.com> <5288c05c-a86c-9adb-3439-88b640603505@oracle.com> Message-ID: <8AEC87AB-150A-47DA-B3EC-7C99194A6B53@oracle.com> > On Mar 26, 2020, at 3:25 PM, Thomas Schatzl wrote: > > Hi, > > On 26.03.20 18:46, Leonid Mesnik wrote: >> Hi >> Could you please review following fix which removes vmTestbase/gc/gctests/Steal tests from the repo. >> These tests might fail provoking OutOfMemoryError throwing it in unexpected place. The logic and intention of these tests are very unclear from their description and code. >> Test allocate objects till OOME. Than test removes some references ans start allocate objects provoking OOME and corresponding GC. It doesn't clear how it should stress taskqueue work-stealing in GC better than any other test causing a lot of GC. >> This test however mentioned in a couple of bugs where crashes were caused by OOME in Heap. >> https://bugs.openjdk.java.net/browse/JDK-8180627 >> https://bugs.openjdk.java.net/browse/JDK-8130344 > > Please close these out as duplicates. > >> Assuming that we already have enough tests stressing OOME and tests stressing GC. I think it makes sense just to remove these two tests. > > Some analysis of work stealing statistics shows that other existing tests stress work stealing more than these two. > >> I verified that they are not in any jtreg group. >> webrev: http://cr.openjdk.java.net/~lmesnik/8241478/webrev.00/ >> bug: https://bugs.openjdk.java.net/browse/JDK-8241478 > > looks good. Thanks. > > Thomas Looks good to me too. 
From mark.reinhold at oracle.com Thu Mar 26 19:32:46 2020 From: mark.reinhold at oracle.com (mark.reinhold at oracle.com) Date: Thu, 26 Mar 2020 12:32:46 -0700 (PDT) Subject: New candidate JEP: 379: Shenandoah: A Low-Pause-Time Garbage Collector (Production) Message-ID: <20200326193246.A6DC131B567@eggemoggin.niobe.net> https://openjdk.java.net/jeps/379 - Mark From rkennke at redhat.com Thu Mar 26 21:01:13 2020 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 26 Mar 2020 22:01:13 +0100 Subject: RFR: 8241700: Shenandoah: Fold ShenandoahKeepAliveBarrier flag into ShenandoahSATBBarrier Message-ID: <65f50d36-95ff-e0ed-88f4-9b657f9015e5@redhat.com> Keep-alive of weak references and similar is strongly bound with SATB. There seems no reason to have two flags. Let's fold them and eliminate one flag. Bug: https://bugs.openjdk.java.net/browse/JDK-8241700 Webrev: http://cr.openjdk.java.net/~rkennke/JDK-8241700/webrev.00/ Testing: hotspot_gc_shenandoah Good? Roman From leonid.mesnik at oracle.com Thu Mar 26 21:39:15 2020 From: leonid.mesnik at oracle.com (Leonid Mesnik) Date: Thu, 26 Mar 2020 14:39:15 -0700 Subject: RFR: 8241456: ThreadRunner shouldn't use Wicket for threads starting synchronization In-Reply-To: References: Message-ID: <412d8c29-7742-a138-dc74-8f07def5eeae@oracle.com> Replying with correct summary. Leonid On 3/23/20 8:55 PM, Leonid Mesnik wrote: > Hi > > Could you please review following fix which update ThreadsRunner to use AtomicInteger/spinOnWait instead of Wicket to synchronize starting of stress test threads. > > Failing tests allocated all memory by earlier started threads before Lock.unlock is called in the latest threads. So thread might get an OOME exception while trying to release lock and/or get into inconsistent state. > > The bug was introduced by https://bugs.openjdk.java.net/browse/JDK-8241123 > The Atomic works fine for stress test finishing sync. I just didn't expect that tests might OOME while releasing start lock. 
> Verified that tests now don't fail with -Xcomp -server -XX:-TieredCompilation -XX:-UseCompressedOops. > > webrev: http://cr.openjdk.java.net/~lmesnik/8241456/webrev.00/ > bug: https://bugs.openjdk.java.net/browse/JDK-8241456 > > Leonid From stumon01 at arm.com Thu Mar 26 22:42:08 2020 From: stumon01 at arm.com (Stuart Monteith) Date: Thu, 26 Mar 2020 22:42:08 +0000 Subject: RFR: 8216557 Aarch64: Add support for Concurrent Class Unloading Message-ID: <520f8085-eaa0-46bc-9eb9-c1244fca2531@arm.com> Hello, Please review this change to implement nmethod entry barriers on aarch64, and hence concurrent class unloading with ZGC. Shenandoah will need to be separately tested and enabled - there are problems with this on Shenandoah. It has been tested with JTreg, runs with SPECjbb, gcbench, and Lucene as well as Netbeans. In terms of interesting features: With nmethod entry barriers, immediate oops are removed by: LIR_Assembler::jobject2reg and MacroAssembler::movoop This is to ensure consistency with the entry barrier, as otherwise with an immediate we'd otherwise need an ISB. I've added "-XX:DeoptNMethodBarrierALot". I found this functionality useful in testing as deoptimisation is very infrequent. I've written it as an atomic to avoid it happening too frequently. As it is a new option, I'm not sure whether any more is needed than this review. A new test has been added "test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherDeoptWithZ.java" to test GC with that option enabled. BarrierSetAssembler::nmethod_entry_barrier This method emits the barrier code. In internal review it was suggested the "dmb( ISHLD )" should be replaced by "membar(LoadLoad)". I've not done this as the BarrierSetNMethod code checks the exact instruction sequence, and I prefer to be explicit. Benchmarking method entry shows an increase of around 6ns with the nmethod entry barrier. The deoptimisation code was contributed by Andrew Haley. 
The bug: https://bugs.openjdk.java.net/browse/JDK-8216557 The webrev: http://cr.openjdk.java.net/~smonteith/8216557/webrev.0/ BR, Stuart From david.holmes at oracle.com Thu Mar 26 23:06:53 2020 From: david.holmes at oracle.com (David Holmes) Date: Fri, 27 Mar 2020 09:06:53 +1000 Subject: RFR: 8241456: ThreadRunner shouldn't use Wicket for threads starting synchronization In-Reply-To: <412d8c29-7742-a138-dc74-8f07def5eeae@oracle.com> References: <412d8c29-7742-a138-dc74-8f07def5eeae@oracle.com> Message-ID: <9502df2b-07d1-b1d2-5e66-fce0eb4ac9d7@oracle.com> Hi Leonid, On 27/03/2020 7:39 am, Leonid Mesnik wrote: > Replying with correct summary. > > Leonid > > On 3/23/20 8:55 PM, Leonid Mesnik wrote: >> Hi >> >> Could you please review following fix which update ThreadsRunner to >> use AtomicInteger/spinOnWait instead of Wicket to synchronize starting >> of stress test threads. >> >> Failing tests allocated all memory by earlier started threads before >> Lock.unlock is called in the latest threads. So thread might get an >> OOME exception while trying to release lock and/or get into >> inconsistent state. You have a bug in Wicket: + try { + lock.lock(); ... + } finally { + lock.unlock(); The lock() has to go outside the try block. That is why you were getting IllegalMonitorStateExceptions when the lock() threw OOME. But the OOME itself is still a problem as it means you can't use any proper synchronizer. I don't like seeing the spin-loops but in this code you may have no choice if memory may already be exhausted.
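To make the lock()/try ordering point concrete, here is a minimal sketch, assuming a java.util.concurrent.locks.Lock as in the quoted Wicket snippet. The names LockIdiomSketch, FailingLock, lockInsideTry and lockOutsideTry are hypothetical and exist only for this illustration; they are not taken from the webrev. FailingLock simulates an OutOfMemoryError thrown from lock() so the effect can be seen without exhausting the heap.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

class LockIdiomSketch {
    /** Hypothetical lock whose lock() always fails; stands in for an OOME thrown while acquiring. */
    static class FailingLock extends ReentrantLock {
        @Override
        public void lock() {
            throw new OutOfMemoryError("simulated failure in lock()");
        }
    }

    /** Problematic shape: lock() is inside try, so finally calls unlock() on a lock that was never acquired. */
    static String lockInsideTry(Lock lock) {
        try {
            try {
                lock.lock();
                return "acquired";
            } finally {
                lock.unlock(); // runs even when lock() threw; the lock is not held here
            }
        } catch (Throwable t) {
            // Report which throwable finally reaches the caller.
            return t.getClass().getSimpleName();
        }
    }

    /** Usual idiom: lock() before try, so unlock() only runs once the lock is actually held. */
    static String lockOutsideTry(Lock lock) {
        try {
            lock.lock();
            try {
                return "acquired";
            } finally {
                lock.unlock();
            }
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(lockInsideTry(new FailingLock()));
        System.out.println(lockOutsideTry(new FailingLock()));
        System.out.println(lockOutsideTry(new ReentrantLock()));
    }
}
```

With the lock() call inside the try block, the exception from unlock() replaces the original error, which matches the IllegalMonitorStateException symptom described above. With lock() outside, the original OutOfMemoryError propagates unchanged. Neither shape helps if the heap is already exhausted, which is the separate concern raised about using any allocating synchronizer here.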
David ----- >> >> The bug was introduced by >> https://bugs.openjdk.java.net/browse/JDK-8241123 >> >> The Atomic works fine for stress test finishing sync. I just didn't >> expect that tests might OOME while releasing start lock. >> Verified that tests now don't fail with -Xcomp -server >> -XX:-TieredCompilation -XX:-UseCompressedOops. >> >> webrev: http://cr.openjdk.java.net/~lmesnik/8241456/webrev.00/ >> >> bug: https://bugs.openjdk.java.net/browse/JDK-8241456 >> >> >> Leonid From leonid.mesnik at oracle.com Thu Mar 26 23:16:39 2020 From: leonid.mesnik at oracle.com (Leonid Mesnik) Date: Thu, 26 Mar 2020 16:16:39 -0700 Subject: RFR: 8241456: ThreadRunner shouldn't use Wicket for threads starting synchronization In-Reply-To: <9502df2b-07d1-b1d2-5e66-fce0eb4ac9d7@oracle.com> References: <412d8c29-7742-a138-dc74-8f07def5eeae@oracle.com> <9502df2b-07d1-b1d2-5e66-fce0eb4ac9d7@oracle.com> Message-ID: On 3/26/20 4:06 PM, David Holmes wrote: > Hi Leonid, > > On 27/03/2020 7:39 am, Leonid Mesnik wrote: >> Replying with correct summary. >> >> Leonid >> >> On 3/23/20 8:55 PM, Leonid Mesnik wrote: >>> Hi >>> >>> Could you please review following fix which update ThreadsRunner to >>> use AtomicInteger/spinOnWait instead of Wicket to synchronize >>> starting of stress test threads. >>> >>> Failing tests allocated all memory by earlier started threads before >>> Lock.unlock is called in the latest threads. So thread might get an >>> OOME exception while trying to release lock and/or get into >>> inconsistent state. > > You have a bug in Wicket: > > +        try { > +            lock.lock(); > ... > +        } finally { > +            lock.unlock(); > > The lock() has to go outside the try block. That is why you were > getting IllegalMonitorStateExceptions when the lock() threw OOME. Thanks for explanation. But anyway, as I understand locks use memory and might be inconsistent if OOME happened.
> > But the OOME itself is still a problem as it means you can't use any > proper synchronizer. I don't like seeing the spin-loops but in this > code you may have no choice if memory may already be exhausted. It should be really short spin-loop, test only start thread during this loop and don't do anything more. Also, it is done only once for all stress test. The goal is to start thread completely before heap is exhausted. Leonid > > David > ----- > > >>> >>> The bug was introduced by >>> https://bugs.openjdk.java.net/browse/JDK-8241123 >>> >>> The Atomic works fine for stress test finishing sync. I just didn't >>> expect that tests might OOME while releasing start lock. >>> Verified that tests now don't fail with -Xcomp -server >>> -XX:-TieredCompilation -XX:-UseCompressedOops. >>> >>> webrev: http://cr.openjdk.java.net/~lmesnik/8241456/webrev.00/ >>> >>> bug: https://bugs.openjdk.java.net/browse/JDK-8241456 >>> >>> >>> Leonid From david.holmes at oracle.com Thu Mar 26 23:29:18 2020 From: david.holmes at oracle.com (David Holmes) Date: Fri, 27 Mar 2020 09:29:18 +1000 Subject: RFR: 8241456: ThreadRunner shouldn't use Wicket for threads starting synchronization In-Reply-To: References: <412d8c29-7742-a138-dc74-8f07def5eeae@oracle.com> <9502df2b-07d1-b1d2-5e66-fce0eb4ac9d7@oracle.com> Message-ID: <12701240-fd7d-560d-8974-ff0be9cafa7e@oracle.com> On 27/03/2020 9:16 am, Leonid Mesnik wrote: > > On 3/26/20 4:06 PM, David Holmes wrote: >> Hi Leonid, >> >> On 27/03/2020 7:39 am, Leonid Mesnik wrote: >>> Replying with correct summary. >>> >>> Leonid >>> >>> On 3/23/20 8:55 PM, Leonid Mesnik wrote: >>>> Hi >>>> >>>> Could you please review following fix which update ThreadsRunner to >>>> use AtomicInteger/spinOnWait instead of Wicket to synchronize >>>> starting of stress test threads. >>>> >>>> Failing tests allocated all memory by earlier started threads before >>>> Lock.unlock is called in the latest threads. 
So thread might get an >>>> OOME exception while trying to release lock and/or get into >>>> inconsistent state. >> >> You have a bug in Wicket: >> >> +        try { >> +            lock.lock(); >> ... >> +        } finally { >> +            lock.unlock(); >> >> The lock() has to go outside the try block. That is why you were >> getting IllegalMonitorStateExceptions when the lock() threw OOME. > Thanks for explanation. But anyway, as I understand locks use memory and > might be inconsistent if OOME happened. They use memory and so lock() can throw OOME, but they are never inconsistent. >> >> But the OOME itself is still a problem as it means you can't use any >> proper synchronizer. I don't like seeing the spin-loops but in this >> code you may have no choice if memory may already be exhausted. > > It should be really short spin-loop, test only start thread during this > loop and don't do anything more. Also, it is done only once for all > stress test. The goal is to start thread completely before heap is > exhausted. Okay. I'm somewhat dubious about making these changes in mainline now just to support loom. I don't see why we need to care about pinning threads in this kind of situation. David > Leonid > >> >> David >> ----- >> >> >>>> >>>> The bug was introduced by >>>> https://bugs.openjdk.java.net/browse/JDK-8241123 >>>> >>>> The Atomic works fine for stress test finishing sync. I just didn't >>>> expect that tests might OOME while releasing start lock. >>>> Verified that tests now don't fail with -Xcomp -server >>>> -XX:-TieredCompilation -XX:-UseCompressedOops.
>>>> >>>> webrev: http://cr.openjdk.java.net/~lmesnik/8241456/webrev.00/ >>>> >>>> bug: https://bugs.openjdk.java.net/browse/JDK-8241456 >>>> >>>> >>>> Leonid From leonid.mesnik at oracle.com Thu Mar 26 23:41:36 2020 From: leonid.mesnik at oracle.com (Leonid Mesnik) Date: Thu, 26 Mar 2020 16:41:36 -0700 Subject: RFR: 8241456: ThreadRunner shouldn't use Wicket for threads starting synchronization In-Reply-To: <12701240-fd7d-560d-8974-ff0be9cafa7e@oracle.com> References: <412d8c29-7742-a138-dc74-8f07def5eeae@oracle.com> <9502df2b-07d1-b1d2-5e66-fce0eb4ac9d7@oracle.com> <12701240-fd7d-560d-8974-ff0be9cafa7e@oracle.com> Message-ID: <70175e02-2c50-50e7-0646-4fb82be6c768@oracle.com> On 3/26/20 4:29 PM, David Holmes wrote: > On 27/03/2020 9:16 am, Leonid Mesnik wrote: >> >> On 3/26/20 4:06 PM, David Holmes wrote: >>> Hi Leonid, >>> >>> On 27/03/2020 7:39 am, Leonid Mesnik wrote: >>>> Replying with correct summary. >>>> >>>> Leonid >>>> >>>> On 3/23/20 8:55 PM, Leonid Mesnik wrote: >>>>> Hi >>>>> >>>>> Could you please review following fix which update ThreadsRunner >>>>> to use AtomicInteger/spinOnWait instead of Wicket to synchronize >>>>> starting of stress test threads. >>>>> >>>>> Failing tests allocated all memory by earlier started threads >>>>> before Lock.unlock is called in the latest threads. So thread >>>>> might get an OOME exception while trying to release lock and/or >>>>> get into inconsistent state. >>> >>> You have a bug in Wicket: >>> >>> +        try { >>> +            lock.lock(); >>> ... >>> +        } finally { >>> +            lock.unlock(); >>> >>> The lock() has to go outside the try block. That is why you were >>> getting IllegalMonitorStateExceptions when the lock() threw OOME. >> Thanks for explanation. But anyway, as I understand locks use memory >> and might be inconsistent if OOME happened. > > They use memory and so lock() can throw OOME, but they are never > inconsistent. Ok, I will move lock.lock() outside of try {}.
Thanks for explanation. > >>> >>> But the OOME itself is still a problem as it means you can't use any >>> proper synchronizer. I don't like seeing the spin-loops but in this >>> code you may have no choice if memory may already be exhausted. >> >> It should be really short spin-loop, test only start thread during >> this loop and don't do anything more. Also, it is done only once for >> all stress test. The goal is to start thread completely before heap >> is exhausted. > > Okay. I'm somewhat dubious about making these changes in mainline now > just to support loom. I don't see why we need to care about pinning > threads in this kind of situation. The idea is to add some nsk/share stress tests for virtual threads. Basically, there are the same tests as existing (gc, sysdict) but running in virtual threads. And these tests are going to be executed after loom is integrated. And I want to keep the difference as small as possible between mainline and loom. Leonid > > David > >> Leonid >> >>> >>> David >>> ----- >>> >>> >>>>> >>>>> The bug was introduced by >>>>> https://bugs.openjdk.java.net/browse/JDK-8241123 >>>>> >>>>> The Atomic works fine for stress test finishing sync. I just >>>>> didn't expect that tests might OOME while releasing start lock. >>>>> Verified that tests now don't fail with -Xcomp -server >>>>> -XX:-TieredCompilation -XX:-UseCompressedOops. 
>>>>> >>>>> webrev: http://cr.openjdk.java.net/~lmesnik/8241456/webrev.00/ >>>>> >>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8241456 >>>>> >>>>> >>>>> Leonid From shade at redhat.com Fri Mar 27 07:01:13 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 27 Mar 2020 08:01:13 +0100 Subject: RFR: 8241700: Shenandoah: Fold ShenandoahKeepAliveBarrier flag into ShenandoahSATBBarrier In-Reply-To: <65f50d36-95ff-e0ed-88f4-9b657f9015e5@redhat.com> References: <65f50d36-95ff-e0ed-88f4-9b657f9015e5@redhat.com> Message-ID: <1fe470e5-0f7a-57b8-a16d-d69751ca300f@redhat.com> On 3/26/20 10:01 PM, Roman Kennke wrote: > Keep-alive of weak references and similar is strongly bound with SATB. > There seems no reason to have two flags. Let's fold them and eliminate > one flag. > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8241700 > Webrev: > http://cr.openjdk.java.net/~rkennke/JDK-8241700/webrev.00/ Looks good. I am seeing double seeing double Shenandoah Shenandoah in the changeset synopsis: 8241700: Shenandoah: Shenandoah: Fold ShenandoahKeepAliveBarrier flag into ShenandoahSATBBarrier -- Thanks, -Aleksey From stefan.karlsson at oracle.com Fri Mar 27 08:56:45 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Fri, 27 Mar 2020 09:56:45 +0100 Subject: RFR: 8241361: ZGC: Implement memory related JFR events In-Reply-To: <0fe119b6-8479-2ba3-a653-a67b62f2d0a6@oracle.com> References: <40e29fc8-005a-e5d1-8bf0-816d406ee7b8@oracle.com> <4e335f6c-102e-8853-b99d-f422b259508a@oracle.com> <2e95d089-6635-ac82-d3d9-eb809730f0fc@oracle.com> <2fe4bdef-96d1-d4df-3e6a-f58381f862bb@oracle.com> <0fe119b6-8479-2ba3-a653-a67b62f2d0a6@oracle.com> Message-ID: <878ad6d8-d87d-8dc6-449a-926b035f5fdb@oracle.com> Talked to Per and here's the latest changes: https://cr.openjdk.java.net/~stefank/8241361/webrev.03.delta/ https://cr.openjdk.java.net/~stefank/8241361/webrev.03/ ZPageFlush: - Report the logical flushed value and not the transient value that occurs during overflushing. 
- Don't report "requested". - For allocations, this value is always the same as the flushed value - For uncommit, this value is usually inflated and not interesting ZRelocationSet: - Make the event span the duration of the selection - Remove the add_all function. FTR, I would have preferred this to stay instead of duplicating the summations. ZRelocationSetGroup: - Added an event to report the relocation info per group (small, medium, large [currently not used]) Thanks, StefanK On 2020-03-26 13:30, Stefan Karlsson wrote: > On 2020-03-26 12:01, Per Liden wrote: >> Hi, >> >> On 3/23/20 10:30 AM, Stefan Karlsson wrote: >> [...] >>>> >>>> * I think cl->_flushed used here: >>>> >>>>  604   event.commit(cl->_requested, cl->_flushed, for_allocation); >>>> >>>> should instead just be: >>>> >>>>  604   event.commit(cl->_requested, flushed, for_allocation); >>>> >>>> Right? >>> >>> I intentionally used cl->_flushed since that describes how much we >>> flushed including overflushed parts of pages. Maybe we should report >>> both values? Maybe also rename the local variable flushed to destroyed? >> >> Hmm, not sure I see the point of reporting anything except what was >> actually flushed. When would the other numbers be of interest? Keep in >> mind that the overflushed part of this is immediately put back into >> the cache, and is never unmapped/destroyed or anything like that. >> Outside of flush_cache() no one will know (or care) if we overflushed >> or not, right? > > Overflushing causes malloc calls for pages and bitmaps. I thought that > could be of interest when looking at latencies. If you don't want it, > I'll remove it. > >> >>> >>>> >>>> >>>> src/hotspot/share/gc/z/zPageCache.hpp >>>> ------------------------------------- >>>> >>>> Instead of: >>>> >>>>   friend class ZPageAllocator; >>>> >>>> add a getter for requested()? >>>> >>> >>> I also want _flushed, depending on the resolution of the above.
I >>> don't think it's bad to friend our closures that are pure extensions >>> to the "owning" class. I don't have a very strong opinion here, but >>> gravitated towards a friend declaration to minimize the exposure of >>> the implementation details. If you still want me to add getters, I'll >>> do it. >> >> In this case, I'd prefer getters. Assuming my comment above is >> accepted, we only need one new getter. > > OK. > >> >>> >>>> >>>> src/hotspot/share/gc/z/zRelocationSetSelector.cpp >>>> ------------------------------------------------- >>>> >>>> * Same here, instead of: >>>> >>>>   #include "jfrfiles/jfrEventClasses.hpp" >>>> >>>> I think we should do: >>>> >>>>   #include "jfr/jfrEvents.hpp" >>> >>> Yes >>> >>>> >>>> >>>> * You don't think we should use ZPageTypeType that you introduced, >>>> and send three different ZRelocationSet events, one for each page >>>> type? Shouldn't this event also be timed, and sent from within >>>> ZRelocationSetSelectorGroup::select()? >>> >>> JMC is not always great at handling normalized events. If we want >>> events per type I think we should add them in _addition_ to the event >>> I added. >> >> Ok, I'm sure you're right but still want to understand. When you say >> "normalized events", what are you thinking of in this context? > > What I mean is that if you have the minimally sufficient > information like: > - Relocated bytes in small pages: x > - Relocated bytes in medium pages: y > - Relocated bytes in large pages: z > > You need to go through extra lengths to figure out: > - Relocated total bytes: x + y + z > > and unless you want to fiddle around with JMC too much, it's better to > also (or instead) report the sum of the values. > > StefanK > >> >> cheers, >> Per >> >>> >>>> >>>> >>>> src/hotspot/share/gc/z/zTracer.cpp >>>> ---------------------------------- >>>> >>>>   43     writer.write("small"); >>>>   44     writer.write_key(ZPageTypeMedium); >>>>   45     writer.write("medium"); >>>>   46
writer.write_key(ZPageTypeLarge); >>>>   47     writer.write("large"); >>>> >>>> How about "Small", "Medium" and "Large"? I could only find one other >>>> place (in jfrStackTraceRepository.cpp) where names were given, and >>>> those start with a capital letter. >>> >>> OK >>> >>> Here's the updated webrevs with the easy fixes: >>>   https://cr.openjdk.java.net/~stefank/8241361/webrev.02.delta/ >>>   https://cr.openjdk.java.net/~stefank/8241361/webrev.02 >>> >>> Waiting for answers and comments to the rest. >>> >>> Thanks, >>> StefanK >>> >>>> >>>> cheers, >>>> Per >>>> >>>> >>>>> >>>>> Added events: >>>>> >>>>> ZAllocationStall - Record when we run out of heap memory and the >>>>> Java threads stall, waiting for the GC to free up memory. >>>>> >>>>> ZPageAllocation - Updated the existing event to also record the >>>>> duration of the event. Updated the event to only be reported if the >>>>> allocation takes longer than 1 ms. >>>>> >>>>> ZPageCacheFlush - Record when the page cache needs to be flushed. >>>>> This usually happens when we run out of a specific page size and >>>>> have to detach the physical and virtual memory to materialize a new >>>>> ZPage. We also flush pages when we uncommit memory. >>>>> >>>>> ZRelocationSet - Record information about the selected relocation set. >>>>> >>>>> ZUncommit - Record when we uncommit and hand back memory to the OS. >>>>> >>>>> The patch also contains some small cosmetic changes to existing >>>>> events, whitespace fixes.
>>> > From per.liden at oracle.com Fri Mar 27 09:24:45 2020 From: per.liden at oracle.com (Per Liden) Date: Fri, 27 Mar 2020 10:24:45 +0100 Subject: RFR: 8241361: ZGC: Implement memory related JFR events In-Reply-To: <878ad6d8-d87d-8dc6-449a-926b035f5fdb@oracle.com> References: <40e29fc8-005a-e5d1-8bf0-816d406ee7b8@oracle.com> <4e335f6c-102e-8853-b99d-f422b259508a@oracle.com> <2e95d089-6635-ac82-d3d9-eb809730f0fc@oracle.com> <2fe4bdef-96d1-d4df-3e6a-f58381f862bb@oracle.com> <0fe119b6-8479-2ba3-a653-a67b62f2d0a6@oracle.com> <878ad6d8-d87d-8dc6-449a-926b035f5fdb@oracle.com> Message-ID: On 3/27/20 9:56 AM, Stefan Karlsson wrote: > Talked to Per and here's the latest changes: >  https://cr.openjdk.java.net/~stefank/8241361/webrev.03.delta/ >  https://cr.openjdk.java.net/~stefank/8241361/webrev.03/ > > ZPageFlush: > - Report the logical flushed value and not the transient value that > occurs during overflushing. > - Don't report "requested". >  - For allocations, this value is always the same as the flushed value >  - For uncommit, this value is usually inflated and not interesting > > ZRelocationSet: > - Make the event span the duration of the selection > - Remove the add_all function. FTR, I would have preferred this to stay > instead of duplicating the summations. > > ZRelocationSetGroup: > - Added an event to report the relocation info per group (small, medium, > large [currently not used]) Thanks for doing those adjustments, looks good! cheers, Per > > Thanks, > StefanK > > On 2020-03-26 13:30, Stefan Karlsson wrote: >> On 2020-03-26 12:01, Per Liden wrote: >>> Hi, >>> >>> On 3/23/20 10:30 AM, Stefan Karlsson wrote: >>> [...] >>>>> >>>>> * I think cl->_flushed used here: >>>>> >>>>>  604   event.commit(cl->_requested, cl->_flushed, for_allocation); >>>>> >>>>> should instead just be: >>>>> >>>>>  604   event.commit(cl->_requested, flushed, for_allocation); >>>>> >>>>> Right?
>>>> >>>> I intentionally used cl->_flushed since that describes how much we >>>> flushed including overflushed parts of pages. Maybe we should report >>>> both values? Maybe also rename the local variable flushed to destroyed? >>> >>> Hmm, not sure I see the point of reporting anything except what was >>> actually flushed. When would the other numbers be of interest? Keep >>> in mind that the overflushed part of this is immediately put back >>> into the cache, and is never unmapped/destroyed or anything like >>> that. Outside of flush_cache() no one will know (or care) if we >>> overflushed or not, right? >> >> Overflushing causes malloc calls for pages and bitmaps. I thought that >> could be of interest when looking at latencies. If you don't want it, >> I'll remove it. >> >>> >>>> >>>>> >>>>> >>>>> src/hotspot/share/gc/z/zPageCache.hpp >>>>> ------------------------------------- >>>>> >>>>> Instead of: >>>>> >>>>>   friend class ZPageAllocator; >>>>> >>>>> add a getter for requested()? >>>>> >>>> >>>> I also want _flushed, depending on the resolution of the above. I >>>> don't think it's bad to friend our closures that are pure extensions >>>> to the "owning" class. I don't have a very strong opinion here, but >>>> gravitated towards a friend declaration to minimize the exposure of >>>> the implementation details. If you still want me to add getters, >>>> I'll do it. >>> >>> In this case, I'd prefer getters. Assuming my comment above is >>> accepted, we only need one new getter. >> >> OK. >> >>> >>>> >>>>> >>>>> src/hotspot/share/gc/z/zRelocationSetSelector.cpp >>>>> ------------------------------------------------- >>>>> >>>>> * Same here, instead of: >>>>> >>>>>   #include "jfrfiles/jfrEventClasses.hpp" >>>>> >>>>> I think we should do: >>>>> >>>>>
#include "jfr/jfrEvents.hpp" >>>> >>>> Yes >>>> >>>>> >>>>> >>>>> * You don't think we should use ZPageTypeType that you introduced, >>>>> and send three different ZRelocationSet events, one for each page >>>>> type? Shouldn't this event also be timed, and sent from within >>>>> ZRelocationSetSelectorGroup::select()? >>>> >>>> JMC is not always great at handling normalized events. If we want >>>> events per type I think we should add them in _addition_ to the >>>> event I added. >>> >>> Ok, I'm sure you're right but still want to understand. When you say >>> "normalized events", what are you thinking of in this context? >> >> What I'm meaning is that if you have the minimally sufficient >> information like: >> - Relocated bytes in small pages: x >> - Relocated bytes in medium pages: y >> - Relocated bytes in large pages: z >> >> You need to go through extra lengths to figure out: >> - Relocated total bytes: x + y + x >> >> and unless you want to fiddle around with JMC too much, it's better to >> also (or instead) reported the sum of the values. >> >> StefanK >> >>> >>> cheers, >>> Per >>> >>>> >>>>> >>>>> >>>>> src/hotspot/share/gc/z/zTracer.cpp >>>>> ---------------------------------- >>>>> >>>>> ? 43???? writer.write("small"); >>>>> ? 44???? writer.write_key(ZPageTypeMedium); >>>>> ? 45???? writer.write("medium"); >>>>> ? 46???? writer.write_key(ZPageTypeLarge); >>>>> ? 47???? writer.write("large"); >>>>> >>>>> How about "Small", "Medium" and "Large"? I could only find one >>>>> other place (in jfrStackTraceRepository.cpp) where names were >>>>> given, and those start with a capital letter. >>>> >>>> OK >>>> >>>> Here's the updated webrevs with the easy fixes: >>>> ??https://cr.openjdk.java.net/~stefank/8241361/webrev.02.delta/ >>>> ??https://cr.openjdk.java.net/~stefank/8241361/webrev.02 >>>> >>>> Waiting for answers and comments to the rest. 
>>>> >>>> Thanks, >>>> StefanK >>>> >>>>> >>>>> cheers, >>>>> Per >>>>> >>>>> >>>>>> >>>>>> Added events: >>>>>> >>>>>> ZAllocationStall - Record when we run out of heap memory and the >>>>>> Java threads stall, waiting for the GC to free up memory. >>>>>> >>>>>> ZPageAllocation - Updated the existing event to also record the >>>>>> duration of the event. Updated the event to only be reported if >>>>>> the allocation takes longer than 1 ms. >>>>>> >>>>>> ZPageCacheFlush - Record when the page cache needs to be flushed. >>>>>> This usually happens when we run out of a specific page size and >>>>>> have to detach the physical and virtual memory to materialize a >>>>>> new ZPage. We also flush pages when we uncommit memory. >>>>>> >>>>>> ZRelocationSet - Record information about the selected relocation >>>>>> set. >>>>>> >>>>>> ZUncommit - Record when we uncommit and hand back memory to the OS. >>>>>> >>>>>> The patch also contains some small cosmetic changes to existing >>>>>> events, whitespace fixes. >>>> >> From stefan.karlsson at oracle.com Fri Mar 27 09:29:29 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Fri, 27 Mar 2020 10:29:29 +0100 Subject: RFR: 8241361: ZGC: Implement memory related JFR events In-Reply-To: References: <40e29fc8-005a-e5d1-8bf0-816d406ee7b8@oracle.com> <4e335f6c-102e-8853-b99d-f422b259508a@oracle.com> <2e95d089-6635-ac82-d3d9-eb809730f0fc@oracle.com> <2fe4bdef-96d1-d4df-3e6a-f58381f862bb@oracle.com> <0fe119b6-8479-2ba3-a653-a67b62f2d0a6@oracle.com> <878ad6d8-d87d-8dc6-449a-926b035f5fdb@oracle.com> Message-ID: Thanks for reviewing and providing some of the changes for webrev.03. 
StefanK

On 2020-03-27 10:24, Per Liden wrote:
> On 3/27/20 9:56 AM, Stefan Karlsson wrote:
>> Talked to Per and here are the latest changes:
>>   https://cr.openjdk.java.net/~stefank/8241361/webrev.03.delta/
>>   https://cr.openjdk.java.net/~stefank/8241361/webrev.03/
>>
>> ZPageFlush:
>> - Report the logical flushed value and not the transient value that
>> occurs during overflushing.
>> - Don't report "requested".
>>   - For allocations, this value is always the same as the flushed value
>>   - For uncommit, this value is usually inflated and not interesting
>>
>> ZRelocationSet:
>> - Make the event span the duration of the selection
>> - Remove the add_all function. FTR, I would have preferred this to
>> stay instead of duplicating the summations.
>>
>> ZRelocationSetGroup:
>> - Added an event to report the relocation info per group (small,
>> medium, large [currently not used])
>
> Thanks for doing those adjustments, looks good!
>
> cheers,
> Per
>
> [...]

From erik.osterlund at oracle.com Fri Mar 27 09:47:41 2020
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Fri, 27 Mar 2020 10:47:41 +0100
Subject: RFR: 8216557 Aarch64: Add support for Concurrent Class Unloading
In-Reply-To: <520f8085-eaa0-46bc-9eb9-c1244fca2531@arm.com>
References: <520f8085-eaa0-46bc-9eb9-c1244fca2531@arm.com>
Message-ID: <8f317840-a2b2-3ccb-fbb2-a38b2ebcbf4b@oracle.com>

Hi Stuart,

Thanks for sorting this out on AArch64. It is nice to see that you can implement these barriers on platforms that do not have instruction cache coherency.

One small change request: It looks like in C1 you inject the entry barrier right after build_frame is done:

 629       build_frame();
 630       {
 631         // Insert nmethod entry barrier into frame.
 632         BarrierSetAssembler* bs = BarrierSet::barrier_set()->barrier_set_assembler();
 633         bs->nmethod_entry_barrier(_masm);
 634       }

Unfortunately, this is in the platform independent part of the LIR assembler. In the x86 version we inject it at the very end of build_frame() instead, which is a platform-specific function. The platform-specific function is in the C1 macro assembler file for that platform. We intentionally put it in the platform-specific path as it is a platform-specific feature.
Now on x86, the barrier code will be emitted twice: once in build_frame() and once after returning from build_frame(). That results in two nmethod entry barriers, and only the first one will get patched, so the second one will mostly take slow paths. That isn't necessarily wrong, but it will cause regressions.

I would propose you just move those lines into the very end of the AArch64-specific part of build_frame().

I don't need to see another webrev for that trivial code motion. This looks good to me. Again, thanks a lot for fixing this! It will allow me to go forward with concurrent stack scanning on AArch64 as well.

Thanks,
/Erik

On 2020-03-26 23:42, Stuart Monteith wrote:
> Hello,
> Please review this change to implement nmethod entry barriers on
> aarch64, and hence concurrent class unloading with ZGC. Shenandoah will
> need to be separately tested and enabled - there are problems with this
> on Shenandoah.
>
> It has been tested with JTreg, runs with SPECjbb, gcbench, and Lucene as
> well as Netbeans.
>
> In terms of interesting features:
> With nmethod entry barriers, immediate oops are removed by:
> LIR_Assembler::jobject2reg and MacroAssembler::movoop
> This is to ensure consistency with the entry barrier, as otherwise with
> an immediate we'd otherwise need an ISB.
>
> I've added "-XX:DeoptNMethodBarrierALot". I found this functionality
> useful in testing as deoptimisation is very infrequent. I've written it
> as an atomic to avoid it happening too frequently. As it is a new
> option, I'm not sure whether any more is needed than this review. A new
> test has been added
> "test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherDeoptWithZ.java" to
> test GC with that option enabled.
>
> BarrierSetAssembler::nmethod_entry_barrier
> This method emits the barrier code. In internal review it was suggested
> the "dmb( ISHLD )" should be replaced by "membar(LoadLoad)". I've not
I've not > done this as the BarrierSetNMethod code checks the exact instruction > sequence, and I prefer to be explicit. > > Benchmarking method entry shows an increase of around 6ns with the > nmethod entry barrier. > > > The deoptimisation code was contributed by Andrew Haley. > > The bug: > https://bugs.openjdk.java.net/browse/JDK-8216557 > > The webrev: > http://cr.openjdk.java.net/~smonteith/8216557/webrev.0/ > > > BR, > Stuart > IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. From per.liden at oracle.com Fri Mar 27 11:36:37 2020 From: per.liden at oracle.com (Per Liden) Date: Fri, 27 Mar 2020 12:36:37 +0100 Subject: RFR: 8216557 Aarch64: Add support for Concurrent Class Unloading In-Reply-To: <520f8085-eaa0-46bc-9eb9-c1244fca2531@arm.com> References: <520f8085-eaa0-46bc-9eb9-c1244fca2531@arm.com> Message-ID: <1dc6cf14-267a-2741-2011-3c3a1bb74a38@oracle.com> Hi Stuart, Awesome, thanks a lot for doing this! On 3/26/20 11:42 PM, Stuart Monteith wrote: > Hello, > Please review this change to implement nmethod entry barriers on > aarch64, and hence concurrent class unloading with ZGC. Shenandoah will > need to be separately tested and enabled - there are problems with this > on Shenandoah. > > It has been tested with JTreg, runs with SPECjbb, gcbench, and Lucene as > well as Netbeans. > > In terms of interesting features: > With nmethod entry barriers, immediate oops are removed by: > LIR_Assembler::jobject2reg and MacroAssembler::movoop > This is to ensure consistency with the entry barrier, as otherwise with > an immediate we'd otherwise need an ISB. > > I've added "-XX:DeoptNMethodBarrierALot". I found this functionality > useful in testing as deoptimisation is very infrequent. 
I've written it > as an atomic to avoid it happening too frequently. As it is a new > option, I'm not sure whether any more is needed than this review. A new > test has been added > "test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherDeoptWithZ.java" to > test GC with that option enabled. > > BarrierSetAssembler::nmethod_entry_barrier > This method emits the barrier code. In internal review it was suggested > the "dmb( ISHLD )" should be replaced by "membar(LoadLoad)". I've not > done this as the BarrierSetNMethod code checks the exact instruction > sequence, and I prefer to be explicit. > > Benchmarking method entry shows an increase of around 6ns with the > nmethod entry barrier. > > > The deoptimisation code was contributed by Andrew Haley. > > The bug: > https://bugs.openjdk.java.net/browse/JDK-8216557 > > The webrev: > http://cr.openjdk.java.net/~smonteith/8216557/webrev.0/ I'll leave the aarch64-specific part for others to review. I just have two minor comments on the rest. * May I suggest that we rename DeoptNMethodBarrierALot to DeoptimizeNMethodBarriersALot, to better match -XX:DeoptimizeALot and friends? * The "counter" used should probably be an unsigned type, to avoid any overflow UB. That variable could also move into the scope where it's used. Like: ---------------------------------------------------------- diff --git a/src/hotspot/share/gc/shared/barrierSetNMethod.cpp b/src/hotspot/share/gc/shared/barrierSetNMethod.cpp --- a/src/hotspot/share/gc/shared/barrierSetNMethod.cpp +++ b/src/hotspot/share/gc/shared/barrierSetNMethod.cpp @@ -50,7 +50,6 @@ int BarrierSetNMethod::nmethod_stub_entry_barrier(address* return_address_ptr) { address return_address = *return_address_ptr; CodeBlob* cb = CodeCache::find_blob(return_address); - static volatile int counter=0; assert(cb != NULL, "invariant"); @@ -67,8 +66,9 @@ // Diagnostic option to force deoptimization 1 in 3 times. It is otherwise // a very rare event. 
- if (DeoptNMethodBarrierALot) { - if (Atomic::add(&counter, 1) % 3 == 0) { + if (DeoptimizeNMethodBarriersALot) { + static volatile uint32_t counter = 0; + if (Atomic::add(&counter, 1u) % 3 == 0) { may_enter = false; } } diff --git a/src/hotspot/share/runtime/globals.hpp b/src/hotspot/share/runtime/globals.hpp --- a/src/hotspot/share/runtime/globals.hpp +++ b/src/hotspot/share/runtime/globals.hpp @@ -2489,7 +2489,7 @@ product(bool, UseEmptySlotsInSupers, true, \ "Allow allocating fields in empty slots of super-classes") \ \ - diagnostic(bool, DeoptNMethodBarrierALot, false, \ + diagnostic(bool, DeoptimizeNMethodBarriersALot, false, \ "Make nmethod barriers deoptimise a lot.") \ // Interface macros ---------------------------------------------------------- * Instead of adding a new file for the test, we could just add a new section in the existing test. * The test also needs to supply -XX:+UnlockDiagnosticVMOptions. Like: ---------------------------------------------------------- diff --git a/test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherWithZ.java b/test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherWithZ.java --- a/test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherWithZ.java +++ b/test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherWithZ.java @@ -1,5 +1,5 @@ /* - * Copyright (c) 2016, 2019, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2016, 2020, Oracle and/or its affiliates. All rights reserved. * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. 
* * This code is free software; you can redistribute it and/or modify it @@ -35,6 +35,18 @@ * @summary Stress ZGC * @run main/othervm/timeout=200 -Xlog:gc*=info -Xmx384m -server -XX:+UnlockExperimentalVMOptions -XX:+UseZGC gc.stress.gcbasher.TestGCBasherWithZ 120000 */ + +/* + * @test TestGCBasherDeoptWithZ + * @key gc stress + * @library / + * @requires vm.gc.Z + * @requires vm.flavor == "server" & !vm.emulatedClient & !vm.graal.enabled & vm.opt.ClassUnloading != false + * @summary Stress ZGC with nmethod barrier forced deoptimization enabled + * @run main/othervm/timeout=200 -Xlog:gc*,nmethod+barrier=trace -Xmx384m -XX:+UnlockExperimentalVMOptions -XX:+UseZGC + * -XX:+DeoptimizeNMethodBarriersALot -XX:-Inline gc.stress.gcbasher.TestGCBasherWithZ 120000 + */ + public class TestGCBasherWithZ { public static void main(String[] args) throws IOException { TestGCBasher.main(args); ---------------------------------------------------------- cheers, Per > > > BR, > Stuart > IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. > From per.liden at oracle.com Fri Mar 27 11:59:42 2020 From: per.liden at oracle.com (Per Liden) Date: Fri, 27 Mar 2020 12:59:42 +0100 Subject: RFR: 8216557 Aarch64: Add support for Concurrent Class Unloading In-Reply-To: <1dc6cf14-267a-2741-2011-3c3a1bb74a38@oracle.com> References: <520f8085-eaa0-46bc-9eb9-c1244fca2531@arm.com> <1dc6cf14-267a-2741-2011-3c3a1bb74a38@oracle.com> Message-ID: Hi again, On 3/27/20 12:36 PM, Per Liden wrote: > Hi Stuart, > > Awesome, thanks a lot for doing this! > > On 3/26/20 11:42 PM, Stuart Monteith wrote: >> Hello, >> ???????? Please review this change to implement nmethod entry barriers on >> aarch64, and hence concurrent class unloading with ZGC. 
Shenandoah will >> need to be separately tested and enabled - there are problems with this >> on Shenandoah. >> >> It has been tested with JTreg, runs with SPECjbb, gcbench, and Lucene as >> well as Netbeans. >> >> In terms of interesting features: >> ????????? With nmethod entry barriers,? immediate oops are removed by: >> ???????????????? LIR_Assembler::jobject2reg? and? MacroAssembler::movoop >> ???????? This is to ensure consistency with the entry barrier, as >> otherwise with >> an immediate we'd otherwise need an ISB. >> >> ???????? I've added "-XX:DeoptNMethodBarrierALot". I found this >> functionality >> useful in testing as deoptimisation is very infrequent. I've written it >> as an atomic to avoid it happening too frequently. As it is a new >> option, I'm not sure whether any more is needed than this review. A new >> test has been added >> "test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherDeoptWithZ.java" to >> test GC with that option enabled. >> >> ???????? BarrierSetAssembler::nmethod_entry_barrier >> ???????? This method emits the barrier code. In internal review it was >> suggested >> the "dmb( ISHLD )" should be replaced by "membar(LoadLoad)". I've not >> done this as the BarrierSetNMethod code checks the exact instruction >> sequence, and I prefer to be explicit. >> >> ???????? Benchmarking method entry shows an increase of around 6ns >> with the >> nmethod entry barrier. >> >> >> The deoptimisation code was contributed by Andrew Haley. >> >> The bug: >> ???????? https://bugs.openjdk.java.net/browse/JDK-8216557 >> >> The webrev: >> ???????? http://cr.openjdk.java.net/~smonteith/8216557/webrev.0/ > > I'll leave the aarch64-specific part for others to review. I just have > two minor comments on the rest. > > * May I suggest that we rename DeoptNMethodBarrierALot to > DeoptimizeNMethodBarriersALot, to better match -XX:DeoptimizeALot and > friends? > > * The "counter" used should probably be an unsigned type, to avoid any > overflow UB. 
That variable could also move into the scope where it's used. > > Like: > > ---------------------------------------------------------- > diff --git a/src/hotspot/share/gc/shared/barrierSetNMethod.cpp > b/src/hotspot/share/gc/shared/barrierSetNMethod.cpp > --- a/src/hotspot/share/gc/shared/barrierSetNMethod.cpp > +++ b/src/hotspot/share/gc/shared/barrierSetNMethod.cpp > @@ -50,7 +50,6 @@ > ?int BarrierSetNMethod::nmethod_stub_entry_barrier(address* > return_address_ptr) { > ?? address return_address = *return_address_ptr; > ?? CodeBlob* cb = CodeCache::find_blob(return_address); > -? static volatile int counter=0; > > ?? assert(cb != NULL, "invariant"); > > @@ -67,8 +66,9 @@ > > ?? // Diagnostic option to force deoptimization 1 in 3 times. It is > otherwise > ?? // a very rare event. > -? if (DeoptNMethodBarrierALot) { > -??? if (Atomic::add(&counter, 1) % 3 == 0) { > +? if (DeoptimizeNMethodBarriersALot) { > +??? static volatile uint32_t counter = 0; > +??? if (Atomic::add(&counter, 1u) % 3 == 0) { > ?????? may_enter = false; > ???? } > ?? } > diff --git a/src/hotspot/share/runtime/globals.hpp > b/src/hotspot/share/runtime/globals.hpp > --- a/src/hotspot/share/runtime/globals.hpp > +++ b/src/hotspot/share/runtime/globals.hpp > @@ -2489,7 +2489,7 @@ > ?? product(bool, UseEmptySlotsInSupers, true, ???? \ > ???????????????? "Allow allocating fields in empty slots of > super-classes")? \ > > ???? \ > -? diagnostic(bool, DeoptNMethodBarrierALot, false, ??? \ > +? diagnostic(bool, DeoptimizeNMethodBarriersALot, false, ??? \ > ???????????????? "Make nmethod barriers deoptimise a lot.") ???? \ > > ?// Interface macros > ---------------------------------------------------------- > > > * Instead of adding a new file for the test, we could just add a new > section in the existing test. > > * The test also needs to supply -XX:+UnlockDiagnosticVMOptions. Meh, forgot -XX:+UnlockDiagnosticVMOptions in my patch. 
Updated:

----------------------------------------------------------
diff --git a/test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherWithZ.java b/test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherWithZ.java
--- a/test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherWithZ.java
+++ b/test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherWithZ.java
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2016, 2019, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 2016, 2020, Oracle and/or its affiliates. All rights reserved.
  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
  *
  * This code is free software; you can redistribute it and/or modify it
@@ -35,6 +35,18 @@
  * @summary Stress ZGC
  * @run main/othervm/timeout=200 -Xlog:gc*=info -Xmx384m -server -XX:+UnlockExperimentalVMOptions -XX:+UseZGC gc.stress.gcbasher.TestGCBasherWithZ 120000
  */
+
+/*
+ * @test TestGCBasherDeoptWithZ
+ * @key gc stress
+ * @library /
+ * @requires vm.gc.Z
+ * @requires vm.flavor == "server" & !vm.emulatedClient & !vm.graal.enabled & vm.opt.ClassUnloading != false
+ * @summary Stress ZGC with nmethod barrier forced deoptimization enabled
+ * @run main/othervm/timeout=200 -Xlog:gc*,nmethod+barrier=trace -Xmx384m -XX:+UnlockExperimentalVMOptions -XX:+UseZGC
+ *                               -XX:+UnlockDiagnosticVMOptions -XX:+DeoptimizeNMethodBarriersALot -XX:-Inline gc.stress.gcbasher.TestGCBasherWithZ 120000
+ */
+
 public class TestGCBasherWithZ {
     public static void main(String[] args) throws IOException {
         TestGCBasher.main(args);
----------------------------------------------------------

cheers,
Per

>
> Like:
>
> ----------------------------------------------------------
> diff --git
> a/test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherWithZ.java
> b/test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherWithZ.java
> --- a/test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherWithZ.java
> +++ b/test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherWithZ.java
> @@ -1,5 +1,5 @@
>  /*
> - * Copyright (c) 2016, 2019, Oracle and/or its affiliates. All rights
> reserved.
> + * Copyright (c) 2016, 2020, Oracle and/or its affiliates. All rights
> reserved.
>   * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>   *
>   * This code is free software; you can redistribute it and/or modify it
> @@ -35,6 +35,18 @@
>   * @summary Stress ZGC
>   * @run main/othervm/timeout=200 -Xlog:gc*=info -Xmx384m -server
> -XX:+UnlockExperimentalVMOptions -XX:+UseZGC
> gc.stress.gcbasher.TestGCBasherWithZ 120000
>   */
> +
> +/*
> + * @test TestGCBasherDeoptWithZ
> + * @key gc stress
> + * @library /
> + * @requires vm.gc.Z
> + * @requires vm.flavor == "server" & !vm.emulatedClient &
> !vm.graal.enabled & vm.opt.ClassUnloading != false
> + * @summary Stress ZGC with nmethod barrier forced deoptimization enabled
> + * @run main/othervm/timeout=200 -Xlog:gc*,nmethod+barrier=trace
> -Xmx384m -XX:+UnlockExperimentalVMOptions -XX:+UseZGC
> + *                               -XX:+DeoptimizeNMethodBarriersALot
> -XX:-Inline gc.stress.gcbasher.TestGCBasherWithZ 120000
> + */
> +
>  public class TestGCBasherWithZ {
>      public static void main(String[] args) throws IOException {
>          TestGCBasher.main(args);
> ----------------------------------------------------------
>
> cheers,
> Per
>
>
>>
>>
>> BR,
>>          Stuart
>> IMPORTANT NOTICE: The contents of this email and any attachments are
>> confidential and may also be privileged. If you are not the intended
>> recipient, please notify the sender immediately and do not disclose
>> the contents to any other person, use it for any purpose, or store or
>> copy the information in any medium. Thank you.
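[Editorial sketch] The counter change suggested in the message above (an unsigned counter, declared in the scope where it is used) can be tried outside HotSpot with a small standalone program. This is only an illustration under stated assumptions: `std::atomic` stands in for HotSpot's `Atomic::add`, and `may_enter_sketch` and the plain global flag are hypothetical names that do not exist in the patch.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for the diagnostic flag discussed in the thread.
static bool DeoptimizeNMethodBarriersALot = true;

// Returns false for one call in three while the flag is on, mimicking the
// "force deoptimization 1 in 3 times" idea. Not the HotSpot implementation.
bool may_enter_sketch() {
  bool may_enter = true;
  if (DeoptimizeNMethodBarriersALot) {
    // Function-local static: the counter is only visible where it is used.
    // uint32_t wraps around on overflow with well-defined behaviour,
    // unlike a signed int.
    static std::atomic<uint32_t> counter{0};
    // fetch_add returns the previous value, so add 1 to obtain the
    // incremented value before taking the remainder.
    uint32_t incremented = counter.fetch_add(1u, std::memory_order_relaxed) + 1u;
    if (incremented % 3 == 0) {
      may_enter = false;
    }
  }
  return may_enter;
}
```

Calling `may_enter_sketch()` repeatedly from a single thread yields true, true, false, and then the pattern repeats.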
>> From aph at redhat.com Fri Mar 27 12:36:41 2020 From: aph at redhat.com (Andrew Haley) Date: Fri, 27 Mar 2020 12:36:41 +0000 Subject: [aarch64-port-dev ] RFR: 8216557 Aarch64: Add support for Concurrent Class Unloading In-Reply-To: <520f8085-eaa0-46bc-9eb9-c1244fca2531@arm.com> References: <520f8085-eaa0-46bc-9eb9-c1244fca2531@arm.com> Message-ID: <105c4a4a-59c9-8095-6d45-642595f65539@redhat.com> On 3/26/20 10:42 PM, Stuart Monteith wrote: > > BarrierSetAssembler::nmethod_entry_barrier > This method emits the barrier code. In internal review it was suggested > the "dmb( ISHLD )" should be replaced by "membar(LoadLoad)". I've not > done this as the BarrierSetNMethod code checks the exact instruction > sequence, and I prefer to be explicit. I understand, but LoadLoad is the semantics you need, and it's more important to say that. The mere existence of verification code shouldn't determine how you express the runtime code. I'll do a thorough review later. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From shade at redhat.com Fri Mar 27 13:38:45 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 27 Mar 2020 14:38:45 +0100 Subject: RFR (S) 8241743: Shenandoah: refactor and inline ShenandoahHeap::heap() Message-ID: <99c634dc-3150-14ec-b375-b7b6fb6e5c64@redhat.com> RFE: https://bugs.openjdk.java.net/browse/JDK-8241743 ShenandoahHeap::heap() is used on critical performance fastpaths, and should be properly inlined. Instead of going via Universe::heap(), we can just pull it off our own static field. (ZGC does the same). heap_no_check() is not needed anymore, because we don't do any additional checks that make performance worse. 
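
[Editorial sketch] The idea described above, fetching the heap from the class's own static field through an accessor that can be inlined, might look roughly like the following in isolation. This is a simplified illustration and not the code in the webrev; `SketchHeap` and its members are hypothetical names.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-in for a GC heap class.
class SketchHeap {
 public:
  SketchHeap() {
    // Assumption for this sketch: exactly one heap instance is ever created.
    assert(_heap == nullptr && "only one heap instance expected");
    _heap = this;
  }

  // Defined inside the class body, so it is implicitly inline. A caller can
  // read the static field directly instead of going through another class.
  static SketchHeap* heap() {
    return _heap;
  }

 private:
  static SketchHeap* _heap;  // set once by the constructor
};

SketchHeap* SketchHeap::_heap = nullptr;
```

After a `SketchHeap` object has been constructed, `SketchHeap::heap()` returns its address without any further checks.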
Webrev:
https://cr.openjdk.java.net/~shade/8241743/webrev.01/

Testing: hotspot_gc_shenandoah; Linux x86_64 {slowdebug, fastdebug, release} builds without PCH

--
Thanks,
-Aleksey

From shade at redhat.com Fri Mar 27 13:38:57 2020
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 27 Mar 2020 14:38:57 +0100
Subject: RFR (S) 8241740: Shenandoah: remove ShenandoahHeapRegion::_heap
Message-ID: <7b33cc94-b62a-2115-595c-253c7a768aee@redhat.com>

RFE:
https://bugs.openjdk.java.net/browse/JDK-8241740

It does not seem worth it to drag the _heap field in every region. Almost all the places where it is used are not performance-critical, and ShenandoahHeap::heap() is actually as fast, especially after JDK-8241743. Ditching this field saves 8 bytes per region.

I had to move seqnum_last_alloc_mutator to avoid including shenandoahHeap.inline.hpp in a generic header.

Webrev:
https://cr.openjdk.java.net/~shade/8241740/webrev.02/

Testing: hotspot_gc_shenandoah, eyeballing shenandoahHeapRegion.o objdumps

--
Thanks,
-Aleksey

From shade at redhat.com Fri Mar 27 13:39:07 2020
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 27 Mar 2020 14:39:07 +0100
Subject: RFR (S) 8241748: Shenandoah: inline MarkingContext TAMS methods
Message-ID: <49c1c6e4-c016-2c89-3b84-8c9569ec4949@redhat.com>

RFE:
https://bugs.openjdk.java.net/browse/JDK-8241748

ShenandoahMarkingContext methods that deal with TAMS are accessed on hot paths. These should be inlined.
Webrev: https://cr.openjdk.java.net/~shade/8241748/webrev.01/ Testing: hotspot_gc_shenandoah; Linux x86_64 {slowdebug, fastdebug, release} builds without PCH -- Thanks, -Aleksey From rkennke at redhat.com Fri Mar 27 13:43:57 2020 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 27 Mar 2020 14:43:57 +0100 Subject: RFR (S) 8241743: Shenandoah: refactor and inline ShenandoahHeap::heap() In-Reply-To: <99c634dc-3150-14ec-b375-b7b6fb6e5c64@redhat.com> References: <99c634dc-3150-14ec-b375-b7b6fb6e5c64@redhat.com> Message-ID: <3f790562-49d4-75e5-16a9-3afb97e0447d@redhat.com> > RFE: > https://bugs.openjdk.java.net/browse/JDK-8241743 > > ShenandoahHeap::heap() is used on critical performance fastpaths, and should be properly inlined. > Instead of going via Universe::heap(), we can just pull it off our own static field. (ZGC does the > same). heap_no_check() is not needed anymore, because we don't do any additional checks that make > performance worse. > > Webrev: > https://cr.openjdk.java.net/~shade/8241743/webrev.01/ > > Testing: hotspot_gc_shenandoah; Linux x86_64 {slowdebug, fastdebug, release} builds without PCH That makes sense. Patch looks good! Roman From rkennke at redhat.com Fri Mar 27 13:45:12 2020 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 27 Mar 2020 14:45:12 +0100 Subject: RFR (S) 8241740: Shenandoah: remove ShenandoahHeapRegion::_heap In-Reply-To: <7b33cc94-b62a-2115-595c-253c7a768aee@redhat.com> References: <7b33cc94-b62a-2115-595c-253c7a768aee@redhat.com> Message-ID: <4dec6cca-ea65-7f3e-4869-5f015d07ca79@redhat.com> Looks good, thanks! Roman > RFE: > https://bugs.openjdk.java.net/browse/JDK-8241740 > > It is not seem to worth it to drag the _heap field in every region. Almost all the places where it > is used is not performance-critical, and ShenandoahHeap::heap() is actually as fast, especially > after JDK-8241743. Ditching this field saves 8 bytes per region. 
> > I had to move seqnum_last_alloc_mutator to avoid including shenandoahHeap.inline.hpp in a generic > header. > > Webrev: > https://cr.openjdk.java.net/~shade/8241740/webrev.02/ > > Testing: hotspot_gc_shenandoah, eyeballing shenandoahHeapRegion.o objdumps > From rkennke at redhat.com Fri Mar 27 13:45:55 2020 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 27 Mar 2020 14:45:55 +0100 Subject: RFR (S) 8241748: Shenandoah: inline MarkingContext TAMS methods In-Reply-To: <49c1c6e4-c016-2c89-3b84-8c9569ec4949@redhat.com> References: <49c1c6e4-c016-2c89-3b84-8c9569ec4949@redhat.com> Message-ID: <39245bb4-79ae-6e16-c7a0-152704d319f9@redhat.com> ok! Thanks, Roman > RFE: > https://bugs.openjdk.java.net/browse/JDK-8241748 > > ShenandoahMarkingContext methods that deal with TAMS are accessed on hot paths. These should be inlined. > > Webrev: > https://cr.openjdk.java.net/~shade/8241748/webrev.01/ > > Testing: hotspot_gc_shenandoah; Linux x86_64 {slowdebug, fastdebug, release} builds without PCH > From zgu at redhat.com Fri Mar 27 14:01:15 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Fri, 27 Mar 2020 10:01:15 -0400 Subject: [aarch64-port-dev ] RFR: 8216557 Aarch64: Add support for Concurrent Class Unloading In-Reply-To: <520f8085-eaa0-46bc-9eb9-c1244fca2531@arm.com> References: <520f8085-eaa0-46bc-9eb9-c1244fca2531@arm.com> Message-ID: <8e4978ac-e563-56df-72b9-81f37f8adc39@redhat.com> Hi Stuart, Great work! On 3/26/20 6:42 PM, Stuart Monteith wrote: > Hello, > Please review this change to implement nmethod entry barriers on > aarch64, and hence concurrent class unloading with ZGC. Shenandoah will > need to be separately tested and enabled - there are problems with this > on Shenandoah. I identified a problem that caused TestStringDedupStress.java to fail, and I have a fix for it. Would you mind sharing what else failed with Shenandoah? Thanks, -Zhengyu > > It has been tested with JTreg, runs with SPECjbb, gcbench, and Lucene as > well as Netbeans.
> > In terms of interesting features: > With nmethod entry barriers, immediate oops are removed by: > LIR_Assembler::jobject2reg and MacroAssembler::movoop > This is to ensure consistency with the entry barrier, as otherwise with > an immediate we'd otherwise need an ISB. > > I've added "-XX:DeoptNMethodBarrierALot". I found this functionality > useful in testing as deoptimisation is very infrequent. I've written it > as an atomic to avoid it happening too frequently. As it is a new > option, I'm not sure whether any more is needed than this review. A new > test has been added > "test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherDeoptWithZ.java" to > test GC with that option enabled. > > BarrierSetAssembler::nmethod_entry_barrier > This method emits the barrier code. In internal review it was suggested > the "dmb( ISHLD )" should be replaced by "membar(LoadLoad)". I've not > done this as the BarrierSetNMethod code checks the exact instruction > sequence, and I prefer to be explicit. > > Benchmarking method entry shows an increase of around 6ns with the > nmethod entry barrier. > > > The deoptimisation code was contributed by Andrew Haley. > > The bug: > https://bugs.openjdk.java.net/browse/JDK-8216557 > > The webrev: > http://cr.openjdk.java.net/~smonteith/8216557/webrev.0/ > > > BR, > Stuart > IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. 
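The quoted description of -XX:DeoptNMethodBarrierALot says the trigger is "written as an atomic to avoid it happening too frequently". A minimal sketch of one way such a throttle could look, assuming a shared counter that fires on every Nth barrier hit. The class `EveryNth`, its period, and the helper `count_fires` are hypothetical names for illustration and are not taken from the webrev, which may do this differently.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical throttle; not the code from the webrev.
// Assumes period > 0.
class EveryNth {
  std::atomic<uint64_t> _hits{0};
  const uint64_t _period;
public:
  explicit EveryNth(uint64_t period) : _period(period) {}

  // Returns true on every _period-th call, counted across all threads.
  // fetch_add hands each caller a unique sequence number, so two threads
  // can never both observe the same "Nth" hit.
  bool should_fire() {
    uint64_t n = _hits.fetch_add(1, std::memory_order_relaxed) + 1;
    return n % _period == 0;
  }
};

// Single-threaded helper: how many of `calls` consecutive calls fire.
inline int count_fires(uint64_t period, int calls) {
  EveryNth gate(period);
  int fired = 0;
  for (int i = 0; i < calls; i++) {
    if (gate.should_fire()) {
      fired++;
    }
  }
  return fired;
}
```

Relaxed ordering is enough in this sketch because the counter only rate-limits a stress action and does not publish any other data.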
> From rkennke at redhat.com Fri Mar 27 14:13:37 2020 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 27 Mar 2020 15:13:37 +0100 Subject: RFR (S) 8241692: Shenandoah: remove ShenandoahHeapRegion::_reserved In-Reply-To: <4fefe5f0-473b-8198-5192-e1de4e597479@redhat.com> References: <4fefe5f0-473b-8198-5192-e1de4e597479@redhat.com> Message-ID: Yes, looks good! Roman > RFE: > https://bugs.openjdk.java.net/browse/JDK-8241692 > > Follow up from JDK-8241668: _reserved field is not actually needed, because we can just use bottom() > and end() available. Saves 16 bytes per region. > > Webrev: > https://cr.openjdk.java.net/~shade/8241692/webrev.01/ > > Testing: hotspot_gc_shenandoah > From per.liden at oracle.com Fri Mar 27 14:30:15 2020 From: per.liden at oracle.com (Per Liden) Date: Fri, 27 Mar 2020 15:30:15 +0100 Subject: RFR: 8240745: Implementation: JEP 377: ZGC: A Scalable Low-Latency Garbage Collector (Production) Message-ID: Please review the patch for JEP 377: ZGC: A Scalable Low-Latency Garbage Collector (Production). This patch changes the UseZGC option, some of the ZGC-specific options, as well as some of the ZGC-specific JFR events from experimental to product. It also adjusts tests using ZGC to not supply -XX:+UnlockExperimentalVMOptions. Note that this patch builds on JDK-8241361, which as of this writing, has not yet been pushed. JEP: https://openjdk.java.net/jeps/377 Bug: https://bugs.openjdk.java.net/browse/JDK-8240745 Webrev: http://cr.openjdk.java.net/~pliden/8240745/webrev.0 Testing: Passed tier 1-7 on all platforms. /Per From stefan.karlsson at oracle.com Fri Mar 27 14:33:03 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Fri, 27 Mar 2020 15:33:03 +0100 Subject: RFR: 8240745: Implementation: JEP 377: ZGC: A Scalable Low-Latency Garbage Collector (Production) In-Reply-To: References: Message-ID: Looks great! 
StefanK On 2020-03-27 15:30, Per Liden wrote: > Please review the patch for JEP 377: ZGC: A Scalable Low-Latency > Garbage Collector (Production). > > This patch changes the UseZGC option, some of the ZGC-specific > options, as well as some of the ZGC-specific JFR events from > experimental to product. It also adjusts tests using ZGC to not supply > -XX:+UnlockExperimentalVMOptions. > > Note that this patch builds on JDK-8241361, which as of this writing, > has not yet been pushed. > > JEP: https://openjdk.java.net/jeps/377 > Bug: https://bugs.openjdk.java.net/browse/JDK-8240745 > Webrev: http://cr.openjdk.java.net/~pliden/8240745/webrev.0 > > Testing: Passed tier 1-7 on all platforms. > > /Per From erik.osterlund at oracle.com Fri Mar 27 14:41:27 2020 From: erik.osterlund at oracle.com (Erik Österlund) Date: Fri, 27 Mar 2020 15:41:27 +0100 Subject: RFR: 8240745: Implementation: JEP 377: ZGC: A Scalable Low-Latency Garbage Collector (Production) In-Reply-To: References: Message-ID: <68ee2186-f1c0-8e8b-a5b1-b44e4b675f6d@oracle.com> Hi Per, Looks amazing. Thanks, /Erik On 2020-03-27 15:30, Per Liden wrote: > Please review the patch for JEP 377: ZGC: A Scalable Low-Latency > Garbage Collector (Production). > > This patch changes the UseZGC option, some of the ZGC-specific > options, as well as some of the ZGC-specific JFR events from > experimental to product. It also adjusts tests using ZGC to not supply > -XX:+UnlockExperimentalVMOptions. > > Note that this patch builds on JDK-8241361, which as of this writing, > has not yet been pushed. > > JEP: https://openjdk.java.net/jeps/377 > Bug: https://bugs.openjdk.java.net/browse/JDK-8240745 > Webrev: http://cr.openjdk.java.net/~pliden/8240745/webrev.0 > > Testing: Passed tier 1-7 on all platforms.
> > /Per From per.liden at oracle.com Fri Mar 27 14:59:46 2020 From: per.liden at oracle.com (Per Liden) Date: Fri, 27 Mar 2020 15:59:46 +0100 Subject: RFR: 8240745: Implementation: JEP 377: ZGC: A Scalable Low-Latency Garbage Collector (Production) In-Reply-To: References: Message-ID: Thanks for reviewing! /Per On 3/27/20 3:33 PM, Stefan Karlsson wrote: > Looks great! > > StefanK > > On 2020-03-27 15:30, Per Liden wrote: >> Please review the patch for JEP 377: ZGC: A Scalable Low-Latency >> Garbage Collector (Production). >> >> This patch changes the UseZGC option, some of the ZGC-specific >> options, as well as some of the ZGC-specific JFR events from >> experimental to product. It also adjusts tests using ZGC to not supply >> -XX:+UnlockExperimentalVMOptions. >> >> Note that this patch builds on JDK-8241361, which as of this writing, >> has not yet been pushed. >> >> JEP: https://openjdk.java.net/jeps/377 >> Bug: https://bugs.openjdk.java.net/browse/JDK-8240745 >> Webrev: http://cr.openjdk.java.net/~pliden/8240745/webrev.0 >> >> Testing: Passed tier 1-7 on all platforms. >> >> /Per > From per.liden at oracle.com Fri Mar 27 14:59:57 2020 From: per.liden at oracle.com (Per Liden) Date: Fri, 27 Mar 2020 15:59:57 +0100 Subject: RFR: 8240745: Implementation: JEP 377: ZGC: A Scalable Low-Latency Garbage Collector (Production) In-Reply-To: <68ee2186-f1c0-8e8b-a5b1-b44e4b675f6d@oracle.com> References: <68ee2186-f1c0-8e8b-a5b1-b44e4b675f6d@oracle.com> Message-ID: Thanks for reviewing! /Per On 3/27/20 3:41 PM, Erik Österlund wrote: > Hi Per, > > Looks amazing. > > Thanks, > /Erik > > On 2020-03-27 15:30, Per Liden wrote: >> Please review the patch for JEP 377: ZGC: A Scalable Low-Latency >> Garbage Collector (Production). >> >> This patch changes the UseZGC option, some of the ZGC-specific >> options, as well as some of the ZGC-specific JFR events from >> experimental to product. It also adjusts tests using ZGC to not supply >> -XX:+UnlockExperimentalVMOptions.
>> >> Note that this patch builds on JDK-8241361, which as of this writing, >> has not yet been pushed. >> >> JEP: https://openjdk.java.net/jeps/377 >> Bug: https://bugs.openjdk.java.net/browse/JDK-8240745 >> Webrev: http://cr.openjdk.java.net/~pliden/8240745/webrev.0 >> >> Testing: Passed tier 1-7 on all platforms. >> >> /Per >
From stumon01 at arm.com Fri Mar 27 15:28:23 2020 From: stumon01 at arm.com (Stuart Monteith) Date: Fri, 27 Mar 2020 15:28:23 +0000 Subject: [aarch64-port-dev ] RFR: 8216557 Aarch64: Add support for Concurrent Class Unloading In-Reply-To: <8e4978ac-e563-56df-72b9-81f37f8adc39@redhat.com> References: <520f8085-eaa0-46bc-9eb9-c1244fca2531@arm.com> <8e4978ac-e563-56df-72b9-81f37f8adc39@redhat.com> Message-ID: Hello Zhengyu, That is the same test I had trouble with. One of the stack traces I had is:
V [libjvm.so+0x4dd538] CompressedKlassPointers::decode_not_null(unsigned int)+0x70
V [libjvm.so+0xb87130] InterpreterRuntime::throw_ClassCastException(JavaThread*, oopDesc*)+0x148
j java.lang.invoke.LambdaForm$MH.invoke(Ljava/lang/Object;I)Ljava/lang/Object;+1 java.base@15-internal
J 426 c2 java.lang.invoke.Invokers$Holder.linkToTargetMethod(ILjava/lang/Object;)Ljava/lang/Object; java.base@15-internal (9 bytes) @ 0x0000ffff978ecc24 [0x0000ffff978ecbc0+0x0000000000000064]
j TestStringDedupStress.main([Ljava/lang/String;)V+162
v ~StubRoutines::call_stub
V [libjvm.so+0xb95328] JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x6f8
V [libjvm.so+0x12531d4] invoke(InstanceKlass*, methodHandle const&, Handle, bool, objArrayHandle, BasicType, objArrayHandle, bool, Thread*) [clone .isra.138]+0xd74
V [libjvm.so+0x125380c] Reflection::invoke_method(oop, Handle, objArrayHandle, Thread*)+0x1a4
V [libjvm.so+0xd05280] JVM_InvokeMethod+0x210
j jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+0 java.base@15-internal
j jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+100 java.base@15-internal
j jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+6 java.base@15-internal
j java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+59 java.base@15-internal
j com.sun.javatest.regtest.agent.MainWrapper$MainThread.run()V+172
j java.lang.Thread.run()V+11 java.base@15-internal
v ~StubRoutines::call_stub
V [libjvm.so+0xb95328] JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x6f8
V [libjvm.so+0xb95784] JavaCalls::call_virtual(JavaValue*, Klass*, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x2ac
V [libjvm.so+0xb95974] JavaCalls::call_virtual(JavaValue*, Handle, Klass*, Symbol*, Symbol*, Thread*)+0xac
V [libjvm.so+0xcefce8] thread_entry(JavaThread*, Thread*)+0x98
V [libjvm.so+0x1507cc8] JavaThread::thread_main_inner()+0x258
V [libjvm.so+0x150fdac] JavaThread::run()+0x27c
V [libjvm.so+0x150d4a4] Thread::call_run()+0x10c
V [libjvm.so+0x115ff70] thread_native_entry(Thread*)+0x120
C [libpthread.so.0+0x8880] start_thread+0x1a0
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j java.lang.invoke.LambdaForm$MH.invoke(Ljava/lang/Object;I)Ljava/lang/Object;+1 java.base@15-internal
J 426 c2 java.lang.invoke.Invokers$Holder.linkToTargetMethod(ILjava/lang/Object;)Ljava/lang/Object; java.base@15-internal (9 bytes) @ 0x0000ffff978ecc24 [0x0000ffff978ecbc0+0x0000000000000064]
j TestStringDedupStress.main([Ljava/lang/String;)V+162
v ~StubRoutines::call_stub
j jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+0 java.base@15-internal
j jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+100 java.base@15-internal
j jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+6 java.base@15-internal
j java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+59 java.base@15-internal
j com.sun.javatest.regtest.agent.MainWrapper$MainThread.run()V+172
j java.lang.Thread.run()V+11 java.base@15-internal
v ~StubRoutines::call_stub
There are variations on that theme, but that was one of the more common ones. Thanks, Stuart On 27/03/2020 14:01, Zhengyu Gu wrote: > Hi Stuart, > > Great work! > > On 3/26/20 6:42 PM, Stuart Monteith wrote: >> Hello, >> Please review this change to implement nmethod entry barriers on >> aarch64, and hence concurrent class unloading with ZGC. Shenandoah will >> need to be separately tested and enabled - there are problems with this >> on Shenandoah. > > I identified a problem that failed TestStringDedupStress.java, I have fix for it. > > Would you mind to share what else failed with Shenandoah? > > Thanks, > > -Zhengyu > > >> >> It has been tested with JTreg, runs with SPECjbb, gcbench, and Lucene as >> well as Netbeans. >> >> In terms of interesting features: >> With nmethod entry barriers, immediate oops are removed by: >> LIR_Assembler::jobject2reg and MacroAssembler::movoop >> This is to ensure consistency with the entry barrier, as otherwise with >> an immediate we'd otherwise need an ISB. >> >> I've added "-XX:DeoptNMethodBarrierALot". I found this functionality >> useful in testing as deoptimisation is very infrequent. I've written it >> as an atomic to avoid it happening too frequently. As it is a new >> option, I'm not sure whether any more is needed than this review. A new >> test has been added >> "test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherDeoptWithZ.java" to >> test GC with that option enabled. >> >> BarrierSetAssembler::nmethod_entry_barrier >> This method emits the barrier code.
In internal review it was suggested >> the "dmb( ISHLD )" should be replaced by "membar(LoadLoad)". I've not >> done this as the BarrierSetNMethod code checks the exact instruction >> sequence, and I prefer to be explicit. >> >> Benchmarking method entry shows an increase of around 6ns with the >> nmethod entry barrier. >> >> >> The deoptimisation code was contributed by Andrew Haley. >> >> The bug: >> https://bugs.openjdk.java.net/browse/JDK-8216557 >> >> The webrev: >> http://cr.openjdk.java.net/~smonteith/8216557/webrev.0/ >> >> >> BR, >> Stuart
From zgu at redhat.com Fri Mar 27 15:30:31 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Fri, 27 Mar 2020 11:30:31 -0400 Subject: [aarch64-port-dev ] RFR: 8216557 Aarch64: Add support for Concurrent Class Unloading In-Reply-To: References: <520f8085-eaa0-46bc-9eb9-c1244fca2531@arm.com> <8e4978ac-e563-56df-72b9-81f37f8adc39@redhat.com> Message-ID: <55a800b2-1564-1162-7177-485e674dc618@redhat.com> Hi Stuart, Yes, this is the same problem I saw. I filed JDK-8241765, will RFR once it passes all tests. Thanks, -Zhengyu On 3/27/20 11:28 AM, Stuart Monteith wrote: > Hello Zhengyu, > That is the same test I had trouble with.
From stumon01 at arm.com Fri Mar 27 15:32:53 2020 From: stumon01 at arm.com (Stuart Monteith) Date: Fri, 27 Mar 2020 15:32:53 +0000 Subject: [aarch64-port-dev ] RFR: 8216557 Aarch64: Add support for Concurrent Class Unloading In-Reply-To: <55a800b2-1564-1162-7177-485e674dc618@redhat.com> References: <520f8085-eaa0-46bc-9eb9-c1244fca2531@arm.com> <8e4978ac-e563-56df-72b9-81f37f8adc39@redhat.com> <55a800b2-1564-1162-7177-485e674dc618@redhat.com> Message-ID: <40210d3e-15f5-02b9-9197-d35549d40cf7@arm.com> Thanks, that's great. It's great we have two GCs able to exercise the new barrier. On 27/03/2020 15:30, Zhengyu Gu wrote: > Hi Stuart, > > Yes, this is the same problem I saw. I filed JDK-8241765, will RFR once it passes all tests. > > Thanks, > > -Zhengyu
From stumon01 at arm.com Fri Mar 27 15:35:30 2020 From: stumon01 at arm.com (Stuart Monteith) Date: Fri, 27 Mar 2020 15:35:30 +0000 Subject: [aarch64-port-dev ] RFR: 8216557 Aarch64: Add support for Concurrent Class Unloading In-Reply-To: <105c4a4a-59c9-8095-6d45-642595f65539@redhat.com> References: <520f8085-eaa0-46bc-9eb9-c1244fca2531@arm.com> <105c4a4a-59c9-8095-6d45-642595f65539@redhat.com> Message-ID: <9bd573db-1983-1dd3-dd11-f7803de3f851@arm.com> Thanks Andrew, I'll change that round. The code verifying the barrier would catch any change there anyway. On 27/03/2020 12:36, Andrew Haley wrote: > On 3/26/20 10:42 PM, Stuart Monteith wrote: >> >> BarrierSetAssembler::nmethod_entry_barrier >> This method emits the barrier code.
In internal review it was suggested >> the "dmb( ISHLD )" should be replaced by "membar(LoadLoad)". I've not >> done this as the BarrierSetNMethod code checks the exact instruction >> sequence, and I prefer to be explicit. > > I understand, but LoadLoad is the semantics you need, and it's more important > to say that. The mere existence of verification code shouldn't determine > how you express the runtime code. > > I'll do a thorough review later. > From zgu at redhat.com Fri Mar 27 18:55:24 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Fri, 27 Mar 2020 14:55:24 -0400 Subject: [15] RFR 8241765: Shenandoah: AARCH64 need to save/restore call clobbered registers before calling keepalive barrier Message-ID: <85e6e3c1-64fd-4873-5a5f-492171b8f9eb@redhat.com> This bug was discovered while testing aarch64 nmethod entry barrier patch posted by Stuart Monteith [1]. We had the same issues on x86 platforms, and fixed by JDK-8233500 and JDK-8237776, but never fixed aarch64.
Bug: https://bugs.openjdk.java.net/browse/JDK-8241765 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8241765/webrev.00/ Test: hotspot_gc_shenandoah Thanks, -Zhengyu [1] https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-March/028998.html From rkennke at redhat.com Fri Mar 27 19:05:23 2020 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 27 Mar 2020 20:05:23 +0100 Subject: [aarch64-port-dev ] [15] RFR 8241765: Shenandoah: AARCH64 need to save/restore call clobbered registers before calling keepalive barrier In-Reply-To: <85e6e3c1-64fd-4873-5a5f-492171b8f9eb@redhat.com> References: <85e6e3c1-64fd-4873-5a5f-492171b8f9eb@redhat.com> Message-ID: <460ce94f-f4eb-95ca-5be1-2a19f4853bc1@redhat.com> Ok. Thank you! Roman > This bug was discovered while testing aarch64 nmethod entry barrier > patch posted by Stuart Monteith [1]. > > We had the same issues on x86 platforms, and fixed by JDK-8233500 and > JDK-8237776, but never fixed aarch64. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8241765 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8241765/webrev.00/ > > Test: > hotspot_gc_shenandoah > > Thanks, > > -Zhengyu > > [1] > https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-March/028998.html > > From claes.redestad at oracle.com Fri Mar 27 19:39:05 2020 From: claes.redestad at oracle.com (Claes Redestad) Date: Fri, 27 Mar 2020 20:39:05 +0100 Subject: RFR: 8241771: Remove dead code in SparsePRT Message-ID: Hi, some profiles had me take a look at SparsePRT, where I found some dead code I think we can remove. Webrev: http://cr.openjdk.java.net/~redestad/8241771/open.00/ Testing: tier1+2 (Will update copyright headers before push) Thanks!
/Claes From stumon01 at arm.com Fri Mar 27 22:53:46 2020 From: stumon01 at arm.com (Stuart Monteith) Date: Fri, 27 Mar 2020 22:53:46 +0000 Subject: [aarch64-port-dev ] [15] RFR 8241765: Shenandoah: AARCH64 need to save/restore call clobbered registers before calling keepalive barrier In-Reply-To: <460ce94f-f4eb-95ca-5be1-2a19f4853bc1@redhat.com> References: <85e6e3c1-64fd-4873-5a5f-492171b8f9eb@redhat.com> <460ce94f-f4eb-95ca-5be1-2a19f4853bc1@redhat.com> Message-ID: <9b30fa8e-5906-b811-3e24-5bc77d1c3638@arm.com> That looks good to me, thanks. On 27/03/2020 19:05, Roman Kennke wrote: > Ok. Thank you! > > Roman > > >> This bug was discovered while testing aarch64 nmethod entry barrier >> patch posted by Stuart Monteith [1]. >> >> We had the same issues on x86 platforms, and fixed by JDK-8233500 and >> JDK-8237776, but never fixed aarch64. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8241765 >> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8241765/webrev.00/ >> >> Test: >> hotspot_gc_shenandoah >> >> Thanks, >> >> -Zhengyu >> >> [1] >> https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-March/028998.html >> >> > From stumon01 at arm.com Fri Mar 27 23:12:14 2020 From: stumon01 at arm.com (Stuart Monteith) Date: Fri, 27 Mar 2020 23:12:14 +0000 Subject: RFR: 8216557 Aarch64: Add support for Concurrent Class Unloading In-Reply-To: <1dc6cf14-267a-2741-2011-3c3a1bb74a38@oracle.com> References: <520f8085-eaa0-46bc-9eb9-c1244fca2531@arm.com> <1dc6cf14-267a-2741-2011-3c3a1bb74a38@oracle.com> Message-ID: Thanks Per, That all makes sense - I've made those changes, they'll appear in the next patch set.
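[Archive note] The counter change agreed to here (Per's review, quoted below) is small enough to sketch outside HotSpot. The following is only an illustration in standard C++, assuming std::atomic in place of HotSpot's Atomic::add; the function name should_force_deopt is hypothetical and does not appear in the patch.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for the diagnostic check discussed in the review;
// the real code uses HotSpot's Atomic::add on a function-local static.
inline bool should_force_deopt() {
  // Declared in the scope where it is used, as the review suggests.
  static std::atomic<uint32_t> counter{0};
  // fetch_add returns the previous value, so add 1 to get the new count.
  // Unsigned arithmetic wraps modulo 2^32, so overflow is well defined,
  // unlike signed int overflow.
  uint32_t n = counter.fetch_add(1u, std::memory_order_relaxed) + 1u;
  // Roughly one call in three requests deoptimization.
  return n % 3 == 0;
}
```

Since 2^32 is not a multiple of 3, the one-in-three pattern shifts by one step when the counter wraps; for a diagnostic option that is probably harmless, and the point of the unsigned type is only that the wrap is not undefined behaviour.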
On 27/03/2020 11:36, Per Liden wrote: > Hi Stuart, > > Awesome, thanks a lot for doing this! > > On 3/26/20 11:42 PM, Stuart Monteith wrote: >> Hello, >> Please review this change to implement nmethod entry barriers on >> aarch64, and hence concurrent class unloading with ZGC. Shenandoah will >> need to be separately tested and enabled - there are problems with this >> on Shenandoah. >> >> It has been tested with JTreg, runs with SPECjbb, gcbench, and Lucene as >> well as Netbeans. >> >> In terms of interesting features: >> With nmethod entry barriers, immediate oops are removed by: >> LIR_Assembler::jobject2reg and MacroAssembler::movoop >> This is to ensure consistency with the entry barrier, as otherwise with >> an immediate we'd otherwise need an ISB. >> >> I've added "-XX:DeoptNMethodBarrierALot". I found this functionality >> useful in testing as deoptimisation is very infrequent. I've written it >> as an atomic to avoid it happening too frequently. As it is a new >> option, I'm not sure whether any more is needed than this review. A new >> test has been added >> "test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherDeoptWithZ.java" to >> test GC with that option enabled. >> >> BarrierSetAssembler::nmethod_entry_barrier >> This method emits the barrier code. In internal review it was suggested >> the "dmb( ISHLD )" should be replaced by "membar(LoadLoad)". I've not >> done this as the BarrierSetNMethod code checks the exact instruction >> sequence, and I prefer to be explicit. >> >> Benchmarking method entry shows an increase of around 6ns with the >> nmethod entry barrier. >> >> >> The deoptimisation code was contributed by Andrew Haley. >> >> The bug: >> https://bugs.openjdk.java.net/browse/JDK-8216557 >> >> The webrev: >> http://cr.openjdk.java.net/~smonteith/8216557/webrev.0/ > > I'll leave the aarch64-specific part for others to review. I just have two minor comments on the rest. 
> > * May I suggest that we rename DeoptNMethodBarrierALot to DeoptimizeNMethodBarriersALot, to better match > -XX:DeoptimizeALot and friends? > > * The "counter" used should probably be an unsigned type, to avoid any overflow UB. That variable could also move into > the scope where it's used. > > Like: > > ---------------------------------------------------------- > diff --git a/src/hotspot/share/gc/shared/barrierSetNMethod.cpp b/src/hotspot/share/gc/shared/barrierSetNMethod.cpp > --- a/src/hotspot/share/gc/shared/barrierSetNMethod.cpp > +++ b/src/hotspot/share/gc/shared/barrierSetNMethod.cpp > @@ -50,7 +50,6 @@ > int BarrierSetNMethod::nmethod_stub_entry_barrier(address* return_address_ptr) { > address return_address = *return_address_ptr; > CodeBlob* cb = CodeCache::find_blob(return_address); > - static volatile int counter=0; > > assert(cb != NULL, "invariant"); > > @@ -67,8 +66,9 @@ > > // Diagnostic option to force deoptimization 1 in 3 times. It is otherwise > // a very rare event. > - if (DeoptNMethodBarrierALot) { > - if (Atomic::add(&counter, 1) % 3 == 0) { > + if (DeoptimizeNMethodBarriersALot) { > + static volatile uint32_t counter = 0; > + if (Atomic::add(&counter, 1u) % 3 == 0) { > may_enter = false; > } > } > diff --git a/src/hotspot/share/runtime/globals.hpp b/src/hotspot/share/runtime/globals.hpp > --- a/src/hotspot/share/runtime/globals.hpp > +++ b/src/hotspot/share/runtime/globals.hpp > @@ -2489,7 +2489,7 @@ > product(bool, UseEmptySlotsInSupers, true, \ > "Allow allocating fields in empty slots of super-classes") \ > > \ > - diagnostic(bool, DeoptNMethodBarrierALot, false, \ > + diagnostic(bool, DeoptimizeNMethodBarriersALot, false, \ > "Make nmethod barriers deoptimise a lot.") \ > > // Interface macros > ---------------------------------------------------------- > > > * Instead of adding a new file for the test, we could just add a new section in the existing test. > > * The test also needs to supply -XX:+UnlockDiagnosticVMOptions. 
> > Like: > > ---------------------------------------------------------- > diff --git a/test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherWithZ.java > b/test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherWithZ.java > --- a/test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherWithZ.java > +++ b/test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherWithZ.java > @@ -1,5 +1,5 @@ > /* > - * Copyright (c) 2016, 2019, Oracle and/or its affiliates. All rights reserved. > + * Copyright (c) 2016, 2020, Oracle and/or its affiliates. All rights reserved. > * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. > * > * This code is free software; you can redistribute it and/or modify it > @@ -35,6 +35,18 @@ > * @summary Stress ZGC > * @run main/othervm/timeout=200 -Xlog:gc*=info -Xmx384m -server -XX:+UnlockExperimentalVMOptions -XX:+UseZGC > gc.stress.gcbasher.TestGCBasherWithZ 120000 > */ > + > +/* > + * @test TestGCBasherDeoptWithZ > + * @key gc stress > + * @library / > + * @requires vm.gc.Z > + * @requires vm.flavor == "server" & !vm.emulatedClient & !vm.graal.enabled & vm.opt.ClassUnloading != false > + * @summary Stress ZGC with nmethod barrier forced deoptimization enabled > + * @run main/othervm/timeout=200 -Xlog:gc*,nmethod+barrier=trace -Xmx384m -XX:+UnlockExperimentalVMOptions -XX:+UseZGC > + * -XX:+DeoptimizeNMethodBarriersALot -XX:-Inline gc.stress.gcbasher.TestGCBasherWithZ > 120000 > + */ > + > public class TestGCBasherWithZ { > public static void main(String[] args) throws IOException { > TestGCBasher.main(args); > ---------------------------------------------------------- > > cheers, > Per > > >> >> >> BR, >> Stuart >> IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you >> are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other >> person, use it for any purpose, or store or copy the information in any medium. Thank you. 
>> From stumon01 at arm.com Fri Mar 27 23:42:52 2020 From: stumon01 at arm.com (Stuart Monteith) Date: Fri, 27 Mar 2020 23:42:52 +0000 Subject: RFR: 8216557 Aarch64: Add support for Concurrent Class Unloading In-Reply-To: <8f317840-a2b2-3ccb-fbb2-a38b2ebcbf4b@oracle.com> References: <520f8085-eaa0-46bc-9eb9-c1244fca2531@arm.com> <8f317840-a2b2-3ccb-fbb2-a38b2ebcbf4b@oracle.com> Message-ID: <64351542-2e88-b918-025d-74456d507d1a@arm.com> Hi Erik, I'm scratching my head a little as to why I ventured into platform independent code. Anyhow, I've moved the code back to where it belongs, and that'll be in my next webrev. Thanks, Stuart On 27/03/2020 09:47, Erik Österlund wrote: > Hi Stuart, > > Thanks for sorting this out on AArch64. It is nice to see that you can implement these > barriers on platforms that do not have instruction cache coherency. > > One small change request: > It looks like in C1 you inject the entry barrier right after build_frame is done: > > build_frame(); > { > // Insert nmethod entry barrier into frame. > BarrierSetAssembler* bs = BarrierSet::barrier_set()->barrier_set_assembler(); > bs->nmethod_entry_barrier(_masm); > } > > Unfortunately, this is in the platform independent part of the LIR assembler. In the x86 version > we inject it at the very end of build_frame() instead, which is a platform-specific function. > The platform-specific function is in the C1 macro assembler file for that platform. > > We intentionally put it in the platform-specific path as it is a platform-specific feature.
> Now on x86, the barrier code will be emitted once in build_frame() and once after returning > from build_frame, resulting in two nmethod entry barriers, and only the first one will get > patched, causing the second one to mostly take slow paths, which isn't necessarily wrong, > but will cause regressions. > > I would propose you just move those lines into the very end of the AArch64-specific part of > build_frame(). > > I don't need to see another webrev for that trivial code motion. This looks good to me. > Again, thanks a lot for fixing this! It will allow me to go forward with concurrent stack > scanning on AArch64 as well. > > Thanks, > /Erik > > > On 2020-03-26 23:42, Stuart Monteith wrote: >> Hello, >> Please review this change to implement nmethod entry barriers on >> aarch64, and hence concurrent class unloading with ZGC. Shenandoah will >> need to be separately tested and enabled - there are problems with this >> on Shenandoah. >> >> It has been tested with JTreg, runs with SPECjbb, gcbench, and Lucene as >> well as Netbeans. >> >> In terms of interesting features: >> With nmethod entry barriers, immediate oops are removed by: >> LIR_Assembler::jobject2reg and MacroAssembler::movoop >> This is to ensure consistency with the entry barrier, as otherwise with >> an immediate we'd otherwise need an ISB. >> >> I've added "-XX:DeoptNMethodBarrierALot". I found this functionality >> useful in testing as deoptimisation is very infrequent. I've written it >> as an atomic to avoid it happening too frequently. As it is a new >> option, I'm not sure whether any more is needed than this review. A new >> test has been added >> "test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherDeoptWithZ.java" to >> test GC with that option enabled. >> >> BarrierSetAssembler::nmethod_entry_barrier >> This method emits the barrier code. In internal review it was suggested >> the "dmb( ISHLD )" should be replaced by "membar(LoadLoad)".
I've not >> done this as the BarrierSetNMethod code checks the exact instruction >> sequence, and I prefer to be explicit. >> >> Benchmarking method entry shows an increase of around 6ns with the >> nmethod entry barrier. >> >> >> The deoptimisation code was contributed by Andrew Haley. >> >> The bug: >> https://bugs.openjdk.java.net/browse/JDK-8216557 >> >> The webrev: >> http://cr.openjdk.java.net/~smonteith/8216557/webrev.0/ >> >> >> BR, >> Stuart > From erik.gahlin at oracle.com Sat Mar 28 10:54:10 2020 From: erik.gahlin at oracle.com (Erik Gahlin) Date: Sat, 28 Mar 2020 11:54:10 +0100 Subject: RFR: 8240745: Implementation: JEP 377: ZGC: A Scalable Low-Latency Garbage Collector (Production) In-Reply-To: References: Message-ID: <1A1B5498-3090-4287-A4CC-8EC37FA33BF8@oracle.com> Hi Per, I couldn't see any unit tests for the JFR events. All supported events should have that. I also wonder about the ZStatisticsCounter event. JFR events should provide metadata (label, name, units etc) describing values. Either refactor it into well-formed events, or keep the event experimental. Thanks Erik > On 27 Mar 2020, at 15:30, Per Liden wrote: > > Please review the patch for JEP 377: ZGC: A Scalable Low-Latency Garbage Collector (Production).
> > This patch changes the UseZGC option, some of the ZGC-specific options, as well as some of the ZGC-specific JFR events from experimental to product. It also adjusts tests using ZGC to not supply -XX:+UnlockExperimentalVMOptions. > > Note that this patch builds on JDK-8241361, which as of this writing, has not yet been pushed. > > JEP: https://openjdk.java.net/jeps/377 > Bug: https://bugs.openjdk.java.net/browse/JDK-8240745 > Webrev: http://cr.openjdk.java.net/~pliden/8240745/webrev.0 > > Testing: Passed tier 1-7 on all platforms. > > /Per From stefan.johansson at oracle.com Sat Mar 28 11:41:49 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Sat, 28 Mar 2020 12:41:49 +0100 Subject: RFR: 8241771: Remove dead code in SparsePRT In-Reply-To: References: Message-ID: <11fcbfe3-fd02-a49d-8eb8-9b413a332819@oracle.com> Hi Claes, On 2020-03-27 20:39, Claes Redestad wrote: > Hi, > > some profiles had me take a look SparsePRT, where I found some dead code > I think we can remove. > > Webrev: http://cr.openjdk.java.net/~redestad/8241771/open.00/ Looks good, StefanJ > > Testing: tier1+2 > > (Will update copyright headers before push) > > Thanks! > > /Claes From aph at redhat.com Sat Mar 28 12:23:50 2020 From: aph at redhat.com (Andrew Haley) Date: Sat, 28 Mar 2020 12:23:50 +0000 Subject: [15] RFR 8241765: Shenandoah: AARCH64 need to save/restore call clobbered registers before calling keepalive barrier In-Reply-To: <85e6e3c1-64fd-4873-5a5f-492171b8f9eb@redhat.com> References: <85e6e3c1-64fd-4873-5a5f-492171b8f9eb@redhat.com> Message-ID: On 3/27/20 6:55 PM, Zhengyu Gu wrote: > This bug was discovered while testing aarch64 nmethod entry barrier > patch posted by Stuart Monteith [1]. > > We had the same issues on x86 platforms, and fixed by JDK-8233500 and > JDK-8237776, but never fixed aarch64. The patch looks OK. It's a bit odd that it wasn't applied to AArch64 at the time, but never mind. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From thomas.schatzl at oracle.com Sat Mar 28 12:50:52 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Sat, 28 Mar 2020 13:50:52 +0100 Subject: RFR: 8241771: Remove dead code in SparsePRT In-Reply-To: <11fcbfe3-fd02-a49d-8eb8-9b413a332819@oracle.com> References: <11fcbfe3-fd02-a49d-8eb8-9b413a332819@oracle.com> Message-ID: Hi, On 28.03.20 12:41, Stefan Johansson wrote: > Hi Claes, > > On 2020-03-27 20:39, Claes Redestad wrote: >> Hi, >> >> some profiles had me take a look SparsePRT, where I found some dead code >> I think we can remove. >> >> Webrev: http://cr.openjdk.java.net/~redestad/8241771/open.00/ > Looks good, > StefanJ looks good. Thomas From per.liden at oracle.com Sat Mar 28 13:02:30 2020 From: per.liden at oracle.com (Per Liden) Date: Sat, 28 Mar 2020 14:02:30 +0100 Subject: RFR: 8240745: Implementation: JEP 377: ZGC: A Scalable Low-Latency Garbage Collector (Production) In-Reply-To: <1A1B5498-3090-4287-A4CC-8EC37FA33BF8@oracle.com> References: <1A1B5498-3090-4287-A4CC-8EC37FA33BF8@oracle.com> Message-ID: <1dbc41dc-19b1-8a2d-e0e9-172c20928b4c@oracle.com> Hi Erik, On 3/28/20 11:54 AM, Erik Gahlin wrote: > Hi Per, > > I couldn't see any unit tests for the JFR events. All supported events should have that. Check, I'll look into adding that. > > I also wonder about the ZStatisticsCounter event. JFR events should provide metadata (label, name, units etc) describing values. Either refactor it into well-formed events, or keep the event experimental. Yes, we intend to keep ZStatisticsCounter and ZStatisticsSampler experimental at this time. cheers, Per > > Thanks > Erik > >> On 27 Mar 2020, at 15:30, Per Liden wrote: >> >> Please review the patch for JEP 377: ZGC: A Scalable Low-Latency Garbage Collector (Production).
>> >> This patch changes the UseZGC option, some of the ZGC-specific options, as well as some of the ZGC-specific JFR events from experimental to product. It also adjusts tests using ZGC to not supply -XX:+UnlockExperimentalVMOptions. >> >> Note that this patch builds on JDK-8241361, which as of this writing, has not yet been pushed. >> >> JEP: https://openjdk.java.net/jeps/377 >> Bug: https://bugs.openjdk.java.net/browse/JDK-8240745 >> Webrev: http://cr.openjdk.java.net/~pliden/8240745/webrev.0 >> >> Testing: Passed tier 1-7 on all platforms. >> >> /Per > From claes.redestad at oracle.com Sat Mar 28 13:30:33 2020 From: claes.redestad at oracle.com (Claes Redestad) Date: Sat, 28 Mar 2020 14:30:33 +0100 Subject: RFR: 8241771: Remove dead code in SparsePRT In-Reply-To: References: <11fcbfe3-fd02-a49d-8eb8-9b413a332819@oracle.com> Message-ID: <13f414bf-fbea-e4d5-2b26-4a1b5d832066@oracle.com> Stefan, Thomas, thank you for reviewing! /Claes On 2020-03-28 13:50, Thomas Schatzl wrote: > Hi, > > On 28.03.20 12:41, Stefan Johansson wrote: >> Hi Claes, >> >> On 2020-03-27 20:39, Claes Redestad wrote: >>> Hi, >>> >>> some profiles had me take a look SparsePRT, where I found some dead code >>> I think we can remove. >>> >>> Webrev: http://cr.openjdk.java.net/~redestad/8241771/open.00/ >> Looks good, >> StefanJ > > looks good. > > Thomas From igor.ignatyev at oracle.com Sun Mar 29 16:07:35 2020 From: igor.ignatyev at oracle.com (Igor Ignatev) Date: Sun, 29 Mar 2020 09:07:35 -0700 Subject: RFR(S) : 8203238: [TESTBUG] rewrite MemOptions shell test in Java In-Reply-To: <6B89C20B-36D8-4743-979B-56DDF8ADCE64@oracle.com> References: <6B89C20B-36D8-4743-979B-56DDF8ADCE64@oracle.com> Message-ID: Ping? -- Igor > On Mar 25, 2020, at 10:42 AM, Igor Ignatyev wrote: > > http://cr.openjdk.java.net/~iignatyev//8203238/webrev.00 >> 330 lines changed: 91 ins; 236 del; 3 mod; > > Hi all, > > could you please review this small patch which rewrites MemOptions shell test?
> > while porting the test, I noticed that available memory checks aren't required, and the test successfully passes even w/o them, so the java version of the test doesn't check available memory and only @requires 64 bits vm. given the test doesn't require lots of time/resources to execute, I've also removed it from exclusiveAccess. MemStat class was made static inner class of MemOptionsTest for the sake of readability and brevity. > > webrev: http://cr.openjdk.java.net/~iignatyev//8203238/webrev.00 > testing: the changed tests multiple times on {linux, windows, mac} w/ {SerialGC,ZGC,G1GC,ParallelGC} > JBS: https://bugs.openjdk.java.net/browse/JDK-8203238 > > NB the shell version of the test had a bug which prevented its execution. an incorrect operator (:=) was used at L#23,23, which led to bogus 'java' variable at L#44 and non zero exit code at L#48, so the test passes w/ 'Skipping the test; a 64-bit VM is required.' message on all platforms. so this patch effectively resurrects the test. > > Thanks, > -- Igor > From stefan.karlsson at oracle.com Mon Mar 30 11:12:03 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Mon, 30 Mar 2020 13:12:03 +0200 Subject: RFR: 8241160: Concurrent class unloading reports GCTraceTime events as JFR pause sub-phase events In-Reply-To: References: <8dd49f70-19f3-cc6f-58fc-99af87cc31f5@oracle.com> <31bcbdf5-6b20-502d-5f91-8bd18962985d@redhat.com> Message-ID: Hi again, Did you find the time to take a look at this? I'd like to propose that we go with the current solution to disable the incorrect reporting of the events for now, until you find the time to look at this. The effect of this will be that you won't get this sub-phase reported, but at the same time it does remove the bug that a pause was reported. Thanks, StefanK On 2020-03-26 10:55, Stefan Karlsson wrote: > On 2020-03-26 10:54, Roman Kennke wrote: >> Hey Stefan, >> >> Sorry, this went under my radar. Give us half a day or so, yes? > Sure.
> > StefanK > >> >> Thanks, >> Roman >> >>> Shenandoah devs, any comments w.r.t. to the Shenandoah section below? >>> >>> Thanks, >>> StefanK >>> >>> On 2020-03-19 10:44, Stefan Karlsson wrote: >>>> Hi all, >>>> >>>> Please review this patch to rewrite the GCTimer, and associated >>>> classes, to not allow nested phases of different types (pause or >>>> concurrent). >>>> >>>> https://cr.openjdk.java.net/~stefank/8241160/webrev.01/ >>>> https://bugs.openjdk.java.net/browse/JDK-8241160 >>>> >>>> A bug was found when I was looking at JFR events from ZGC. A >>>> GCPhasePauseLevel1 event was nested within a GCPhaseConcurrent. The >>>> only valid parent is a GCPhasePause event. The reason why this >>>> happened was that we use a GCTraceTime class inside the class >>>> unloading code. Previously, we only used GCTraceTimes inside pauses, >>>> but ever since class unloading was moved out to a concurrent phase, >>>> this isn't true anymore. GCTraceTime used >>>> GCTimer::register_gc_phase_start(name, Ticks, phase? = ), and >>>> therefore always reported pauses and pause sub-phases. >>>> >>>> With this patch, I suggest that we become stricter in our usages of >>>> the GCTimer. The effects of the patch are: >>>> >>>> 1) When a top-level pause (or concurrent) phase is created, the code >>>> must be explicit about what type of phase is created. The code will >>>> now assert if this is abused. Most places were already explicit, but I >>>> had to change two places: >>>> >>>> a) Shenandoah type-erased ConcurrentGCTimer and therefore didn't have >>>> access to register_gc_pause_start. I made that function public, >>>> instead of protected, so that we didn't have to deal with that >>>> problem. >>>> >>>> b) G1 used GCTraceTime to note the Remark/Cleanup pauses (in >>>> VM_G1Concurrent). This is the only place that uses GCTraceTime to >>>> start a pause. All other places use GCTraceTime to create sub-phases.
>>>> I could have copy-n-pasted the entire >>>> GCTraceTime/GCTraceTimeWrapper/GCTraceTimeWrapper implementation and >>>> create a version that calls register_gc_pause_start instead of >>>> register_gc_phase_start. Instead of doing that I opted for creating a >>>> system where the code can register a set of callbacks to be called >>>> when the start and end time is registered. This is used in the backend >>>> of GCTraceTime, but then also used by G1 to allow us to not have to >>>> copy-n-paste a lot of the code. >>>> >>>> I would have liked to make GCTraceTimeImpl/GCTraceTimeWrapper agnostic >>>> to the default callbacks (unified logging and GCTimer) but couldn't >>>> find a nice way to express that, because of the way we macro-expand >>>> the UL tags. Maybe something we can consider for a future >>>> investigation. >>>> >>>> 2) sub-phases now inherit the type from the parent phase, and there's >>>> no possibility to incorrectly nest phases anymore. This also removed >>>> the need for ConcurrentGCTimer::_is_concurrent_phase_active. >>>> >>>> 3) This allows (and encourages) concurrent sub-phases. When the JFR >>>> events were ported to HotSpot, only pauses got sub-phases, because >>>> there wasn't a big need for concurrent sub-phases. In this patch I >>>> added level of sub-phases to JFR. Maybe it would be better to add more >>>> right away? (I'm not a fan of having the explicit sub-phase level >>>> events, instead of a counter in *the* phase event, but the JMC team at >>>> that time needed it to be logged as separate events. Maybe something >>>> that could be reconsidered some time) >>>> >>>> 4) The different consumers of the timestamps are separated into their >>>> own classes. >>>> >>>> 5) Shenandoah devs need to consider what to do about this change: >>>> >>>> - unloading_occurred = >>>> SystemDictionary::do_unloading(heap->gc_timer()); >>>> + // FIXME: This turns off the previously broken JFR events.
If we >>>> want to keep reporting them, >>>> + // but with the correct type (Concurrent) then a top-level >>>> concurrent phase is required. >>>> + unloading_occurred = SystemDictionary::do_unloading(NULL /* gc_timer >>>> */); >>>> >>>> Where this code caused GCPhasePauseLevel1 events for ZGC, this used to >>>> create GCPhasePause events for Shenandoah. It uses GCTraceTime to log >>>> sub-phases, but the current Shenandoah code hasn't registered a >>>> top-level phase at this point. Either we keep this code with the >>>> removal of the gc_timer argument, or we add a top-level phase >>>> somewhere. If we want the latter, then I need suggestions on where to >>>> add them. Or maybe push the current code, and fix it as a follow-up >>>> patch? >>>> >>>> What do you think? An alternative is to (continue?) completely forbid >>>> concurrent sub-phases, and remove the gc_timers passed to GCTraceTimes >>>> during concurrent phases. Even if we decide to do that, I think >>>> there's some merit to the stricter GCTimer code, and the slight >>>> separation of concern in GCTraceTime. >>>> >>>> Tested tier1-3 >>>> >>>> Thanks, >>>> StefanK > From shade at redhat.com Mon Mar 30 12:23:11 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 30 Mar 2020 14:23:11 +0200 Subject: RFR (XS) 8241838: Shenandoah: no need to trash cset during final mark Message-ID: <582542fb-34aa-50fe-fbb6-fe4e88ed6931@redhat.com> RFE: https://bugs.openjdk.java.net/browse/JDK-8241838 Follow up from CM-with-UR removal (JDK-8240868): we do not ever see cset during final mark now, so trashing the cset is effectively noop. Ditching this saves about 2-3 us during pause. trash_cset_regions() is still used during final-UR pause. 
Fix: diff -r e2418ac6ab12 src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp --- a/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp Mon Mar 30 13:31:43 2020 +0200 +++ b/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp Mon Mar 30 14:21:08 2020 +0200 @@ -1503,10 +1503,4 @@ } - // Trash the collection set left over from previous cycle, if any. - { - ShenandoahGCPhase phase(ShenandoahPhaseTimings::trash_cset); - trash_cset_regions(); - } - { ShenandoahGCPhase phase(ShenandoahPhaseTimings::prepare_evac); diff -r e2418ac6ab12 src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.hpp --- a/src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.hpp Mon Mar 30 13:31:43 2020 +0200 +++ b/src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.hpp Mon Mar 30 14:21:08 2020 +0200 @@ -83,5 +83,4 @@ f(retire_tlabs, " Retire TLABs") \ f(sync_pinned, " Sync Pinned") \ - f(trash_cset, " Trash CSet") \ f(prepare_evac, " Prepare Evacuation") \ f(init_evac, " Initial Evacuation") \ Testing: hotspot_gc_shenandoah -- Thanks, -Aleksey From rkennke at redhat.com Mon Mar 30 12:39:42 2020 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 30 Mar 2020 14:39:42 +0200 Subject: RFR (XS) 8241838: Shenandoah: no need to trash cset during final mark In-Reply-To: <582542fb-34aa-50fe-fbb6-fe4e88ed6931@redhat.com> References: <582542fb-34aa-50fe-fbb6-fe4e88ed6931@redhat.com> Message-ID: Looks good, thank you! Roman > RFE: > https://bugs.openjdk.java.net/browse/JDK-8241838 > > Follow up from CM-with-UR removal (JDK-8240868): we do not ever see cset during final mark now, so > trashing the cset is effectively noop. Ditching this saves about 2-3 us during pause. > trash_cset_regions() is still used during final-UR pause. 
> > Fix: > > diff -r e2418ac6ab12 src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp > --- a/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp Mon Mar 30 13:31:43 2020 +0200 > +++ b/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp Mon Mar 30 14:21:08 2020 +0200 > @@ -1503,10 +1503,4 @@ > } > > - // Trash the collection set left over from previous cycle, if any. > - { > - ShenandoahGCPhase phase(ShenandoahPhaseTimings::trash_cset); > - trash_cset_regions(); > - } > - > { > ShenandoahGCPhase phase(ShenandoahPhaseTimings::prepare_evac); > diff -r e2418ac6ab12 src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.hpp > --- a/src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.hpp Mon Mar 30 13:31:43 2020 +0200 > +++ b/src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.hpp Mon Mar 30 14:21:08 2020 +0200 > @@ -83,5 +83,4 @@ > f(retire_tlabs, " Retire TLABs") \ > f(sync_pinned, " Sync Pinned") \ > - f(trash_cset, " Trash CSet") \ > f(prepare_evac, " Prepare Evacuation") \ > f(init_evac, " Initial Evacuation") \ > > Testing: hotspot_gc_shenandoah > From shade at redhat.com Mon Mar 30 12:54:52 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 30 Mar 2020 14:54:52 +0200 Subject: RFR (S) 8241841: Shenandoah: ditch one of allocation type counters in ShenandoahHeapRegion Message-ID: <904a50a9-db5c-16f8-d766-dd4e36034b7e@redhat.com> RFE: https://bugs.openjdk.java.net/browse/JDK-8241841 We currently count the allocation by type: TLAB, GCLAB, shared allocs. All together, they should add up to the "used" space in the region. That means we can ditch one of the counters, and infer it from the already tracked "used" size. "Shared" counter seems to be the most profitable to go: it usually means either a small allocation that does not need another small roadbump on allocation path, or the humongous allocation that does increments for every region in the humongous chain. 
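[Archive note] The accounting idea in the message above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the actual ShenandoahHeapRegion code: the class RegionAccounting and its member names are hypothetical, and sizes are plain byte counts.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-in for per-region allocation bookkeeping.
class RegionAccounting {
  size_t _used = 0;          // total bytes allocated in the region
  size_t _tlab_allocs = 0;   // bytes handed out as TLABs
  size_t _gclab_allocs = 0;  // bytes handed out as GCLABs
  // No counter for shared allocations: it is derived on demand.

public:
  enum AllocType { TLAB, GCLAB, SHARED };

  void record_alloc(AllocType type, size_t bytes) {
    _used += bytes;
    switch (type) {
      case TLAB:   _tlab_allocs  += bytes; break;
      case GCLAB:  _gclab_allocs += bytes; break;
      case SHARED: break;  // nothing extra to update on this path
    }
  }

  size_t used() const { return _used; }

  // Shared allocations are whatever "used" space is not TLAB or GCLAB.
  size_t shared_allocs() const {
    assert(_used >= _tlab_allocs + _gclab_allocs);
    return _used - (_tlab_allocs + _gclab_allocs);
  }
};
```

In this sketch the shared path only touches the "used" total, which is the saving the message describes for small shared allocations and for each region of a humongous chain.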
This saves 4..8 bytes per region, and drops x86_32 size to 64 bytes (1 cache-line) without padding. Fix: https://cr.openjdk.java.net/~shade/8241841/webrev.01/ Testing: hotspot_gc_shenandoah -- Thanks, -Aleksey From shade at redhat.com Mon Mar 30 12:58:28 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 30 Mar 2020 14:58:28 +0200 Subject: RFR (XS) 8241842: Shenandoah: inline ShenandoahHeapRegion::region_number Message-ID: <69c9a46c-3079-7a87-d019-730c99c6c069@redhat.com> RFE: https://bugs.openjdk.java.net/browse/JDK-8241842 ShenandoahHeapRegion::region_number is used on a few hotpaths, and should be inlined. Fix: diff -r 8aa307793ffe src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp --- a/src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp Mon Mar 30 14:49:17 2020 +0200 +++ b/src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp Mon Mar 30 14:57:23 2020 +0200 @@ -78,8 +78,4 @@ } -size_t ShenandoahHeapRegion::region_number() const { - return _region_number; -} - void ShenandoahHeapRegion::report_illegal_transition(const char *method) { ResourceMark rm; diff -r 8aa307793ffe src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.hpp --- a/src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.hpp Mon Mar 30 14:49:17 2020 +0200 +++ b/src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.hpp Mon Mar 30 14:57:23 2020 +0200 @@ -354,5 +354,7 @@ } - size_t region_number() const; + inline size_t region_number() const { + return _region_number; + } // Allocation (return NULL if full) Testing: hotspot_gc_shenandoah -- Thanks, -Aleksey From shade at redhat.com Mon Mar 30 13:10:03 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 30 Mar 2020 15:10:03 +0200 Subject: RFR (S) 8241844: Shenandoah: rename ShenandoahHeapRegion::region_number Message-ID: RFR: https://bugs.openjdk.java.net/browse/JDK-8241844 ShenandoahHeapRegion::region_number is too verbose of the name, plus most of the code already treats it as "index" in the local variable. 
Can rename it for consistency. I am not sure whether it should be "idx" or "index". Candidate fix: https://cr.openjdk.java.net/~shade/8241844/webrev.01/ Testing: hotspot_gc_shenandoah -- Thanks, -Aleksey From rkennke at redhat.com Mon Mar 30 13:25:10 2020 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 30 Mar 2020 15:25:10 +0200 Subject: RFR (S) 8241841: Shenandoah: ditch one of allocation type counters in ShenandoahHeapRegion In-Reply-To: <904a50a9-db5c-16f8-d766-dd4e36034b7e@redhat.com> References: <904a50a9-db5c-16f8-d766-dd4e36034b7e@redhat.com> Message-ID: <46f1314c-eaba-2b7f-a9d6-d5701125f305@redhat.com> Looks good. Thanks, Roman > RFE: > https://bugs.openjdk.java.net/browse/JDK-8241841 > > We currently count the allocation by type: TLAB, GCLAB, shared allocs. All together, they should add > up to the "used" space in the region. That means we can ditch one of the counters, and infer it from > the already tracked "used" size. > > "Shared" counter seems to be the most profitable to go: it usually means either a small allocation > that does not need another small roadbump on allocation path, or the humongous allocation that does > increments for every region in the humongous chain. > > This saves 4..8 bytes per region, and drops x86_32 size to 64 bytes (1 cache-line) without padding. > > Fix: > https://cr.openjdk.java.net/~shade/8241841/webrev.01/ > > Testing: hotspot_gc_shenandoah > From rkennke at redhat.com Mon Mar 30 13:25:31 2020 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 30 Mar 2020 15:25:31 +0200 Subject: RFR (XS) 8241842: Shenandoah: inline ShenandoahHeapRegion::region_number In-Reply-To: <69c9a46c-3079-7a87-d019-730c99c6c069@redhat.com> References: <69c9a46c-3079-7a87-d019-730c99c6c069@redhat.com> Message-ID: Yup! Thanks, Roman > RFE: > https://bugs.openjdk.java.net/browse/JDK-8241842 > > ShenandoahHeapRegion::region_number is used on a few hotpaths, and should be inlined. 
> > Fix: > > diff -r 8aa307793ffe src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp > --- a/src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp Mon Mar 30 14:49:17 2020 +0200 > +++ b/src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp Mon Mar 30 14:57:23 2020 +0200 > @@ -78,8 +78,4 @@ > } > > -size_t ShenandoahHeapRegion::region_number() const { > - return _region_number; > -} > - > void ShenandoahHeapRegion::report_illegal_transition(const char *method) { > ResourceMark rm; > diff -r 8aa307793ffe src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.hpp > --- a/src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.hpp Mon Mar 30 14:49:17 2020 +0200 > +++ b/src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.hpp Mon Mar 30 14:57:23 2020 +0200 > @@ -354,5 +354,7 @@ > } > > - size_t region_number() const; > + inline size_t region_number() const { > + return _region_number; > + } > > // Allocation (return NULL if full) > > Testing: hotspot_gc_shenandoah > From rkennke at redhat.com Mon Mar 30 13:26:28 2020 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 30 Mar 2020 15:26:28 +0200 Subject: RFR (S) 8241844: Shenandoah: rename ShenandoahHeapRegion::region_number In-Reply-To: References: Message-ID: <1daa1c9e-1f34-2698-01a2-fbc048ed1b78@redhat.com> > RFR: > https://bugs.openjdk.java.net/browse/JDK-8241844 > > ShenandoahHeapRegion::region_number is too verbose of the name, plus most of the code already treats > it as "index" in the local variable. Can rename it for consistency. I am not sure whether it should > be "idx" or "index". > > Candidate fix: > https://cr.openjdk.java.net/~shade/8241844/webrev.01/ > > Testing: hotspot_gc_shenandoah I'd prefer index, but if it's too much trouble to rename it, then leave it as idx. 
Roman From shade at redhat.com Mon Mar 30 13:38:43 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 30 Mar 2020 15:38:43 +0200 Subject: RFR (S) 8241844: Shenandoah: rename ShenandoahHeapRegion::region_number In-Reply-To: <1daa1c9e-1f34-2698-01a2-fbc048ed1b78@redhat.com> References: <1daa1c9e-1f34-2698-01a2-fbc048ed1b78@redhat.com> Message-ID: <3394c605-c9a8-5af2-184c-619acc0dae29@redhat.com> On 3/30/20 3:26 PM, Roman Kennke wrote: >> RFR: >> https://bugs.openjdk.java.net/browse/JDK-8241844 >> >> ShenandoahHeapRegion::region_number is too verbose of the name, plus most of the code already treats >> it as "index" in the local variable. Can rename it for consistency. I am not sure whether it should >> be "idx" or "index". >> >> Candidate fix: >> https://cr.openjdk.java.net/~shade/8241844/webrev.01/ >> >> Testing: hotspot_gc_shenandoah > > I'd prefer index, but if it's too much trouble to rename it, then leave > it as idx. No bother until this is committed. Renamed to "index" like this: https://cr.openjdk.java.net/~shade/8241844/webrev.01/ Better? -- Thanks, -Aleksey From rkennke at redhat.com Mon Mar 30 13:40:22 2020 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 30 Mar 2020 15:40:22 +0200 Subject: RFR (S) 8241844: Shenandoah: rename ShenandoahHeapRegion::region_number In-Reply-To: <3394c605-c9a8-5af2-184c-619acc0dae29@redhat.com> References: <1daa1c9e-1f34-2698-01a2-fbc048ed1b78@redhat.com> <3394c605-c9a8-5af2-184c-619acc0dae29@redhat.com> Message-ID: >>> RFR: >>> https://bugs.openjdk.java.net/browse/JDK-8241844 >>> >>> ShenandoahHeapRegion::region_number is too verbose of the name, plus most of the code already treats >>> it as "index" in the local variable. Can rename it for consistency. I am not sure whether it should >>> be "idx" or "index". 
>>> >>> Candidate fix: >>> https://cr.openjdk.java.net/~shade/8241844/webrev.01/ >>> >>> Testing: hotspot_gc_shenandoah >> >> I'd prefer index, but if it's too much trouble to rename it, then leave >> it as idx. > > No bother until this is committed. Renamed to "index" like this: > https://cr.openjdk.java.net/~shade/8241844/webrev.01/ > > Better? It's: https://cr.openjdk.java.net/~shade/8241844/webrev.02/ Yes, this looks better to me. Thank you! Roman From claes.redestad at oracle.com Mon Mar 30 14:23:35 2020 From: claes.redestad at oracle.com (Claes Redestad) Date: Mon, 30 Mar 2020 16:23:35 +0200 Subject: RFR: 8241830: Simplify commit error messages in G1PageBasedVirtualSpace Message-ID: Hi, when committing memory for virtual space, we eagerly generate an error message using err_msg, which ends up malloc'ing some memory. As this is done for each potential heap region, this turns out to have a significant cost on startup when ergonomics decide we should run with many regions. Since the address range information is redundant (a warning will be printed along with the hs_err file with similar detail), I propose replacing with a static error message. This aligns with other call sites. For unrecoverable mmap failures, the message will be ignored and replaced by "committing reserved memory", meaning the extra information is unlikely to actually manifest. Webrev: http://cr.openjdk.java.net/~redestad/8241830/open.00/ Bug: https://bugs.openjdk.java.net/browse/JDK-8241830 Thanks! 
/Claes From stefan.johansson at oracle.com Mon Mar 30 14:26:09 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Mon, 30 Mar 2020 16:26:09 +0200 Subject: RFR: 8241830: Simplify commit error messages in G1PageBasedVirtualSpace In-Reply-To: References: Message-ID: <3a5a0a42-9760-5cbc-bc60-95c9ab9d69a8@oracle.com> Hi, On 2020-03-30 16:23, Claes Redestad wrote: > Hi, > > when committing memory for virtual space, we eagerly generate an error > message using err_msg, which ends up malloc'ing some memory. As this is > done for each potential heap region, this turns out to have a > significant cost on startup when ergonomics decide we should run with > many regions. > > Since the address range information is redundant (a warning will be > printed along with the hs_err file with similar detail), I propose > replacing with a static error message. This aligns with other call > sites. For unrecoverable mmap failures, the message will be ignored and > replaced by "committing reserved memory", meaning the extra information > is unlikely to actually manifest. > > Webrev: http://cr.openjdk.java.net/~redestad/8241830/open.00/ This looks good to me, thanks for fixing. StefanJ > Bug: https://bugs.openjdk.java.net/browse/JDK-8241830 > > Thanks! > > /Claes From claes.redestad at oracle.com Mon Mar 30 15:03:05 2020 From: claes.redestad at oracle.com (Claes Redestad) Date: Mon, 30 Mar 2020 17:03:05 +0200 Subject: RFR: 8241830: Simplify commit error messages in G1PageBasedVirtualSpace In-Reply-To: <3a5a0a42-9760-5cbc-bc60-95c9ab9d69a8@oracle.com> References: <3a5a0a42-9760-5cbc-bc60-95c9ab9d69a8@oracle.com> Message-ID: <2bd7df92-b3ed-8864-0e4f-c211426471dc@oracle.com> On 2020-03-30 16:26, Stefan Johansson wrote: > Hi, > > On 2020-03-30 16:23, Claes Redestad wrote: >> >> Webrev: http://cr.openjdk.java.net/~redestad/8241830/open.00/ > This looks good to me, thanks for fixing. > > StefanJ > Thanks!
/Claes From stefan.karlsson at oracle.com Mon Mar 30 15:32:28 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Mon, 30 Mar 2020 17:32:28 +0200 Subject: RFR: 8241361: ZGC: Implement memory related JFR events In-Reply-To: <40e29fc8-005a-e5d1-8bf0-816d406ee7b8@oracle.com> References: <40e29fc8-005a-e5d1-8bf0-816d406ee7b8@oracle.com> Message-ID: <66f71efc-f9e1-0c62-f915-7dbc8beae263@oracle.com> Updated webrevs: https://cr.openjdk.java.net/~stefank/8241361/webrev.04.delta/ https://cr.openjdk.java.net/~stefank/8241361/webrev.04/ Changes after some testing and discussions with Per: 1) For some reason it's required by JFR testing that the .jfc files list the stackTrace value. Fixed that. 2) Only generate ZUncommit when we actually have uncommitted memory. 3) Send relocation set group events for large pages. 4) Turn off both logging and event generation when medium pages have been disabled. Thanks, StefanK On 2020-03-20 14:43, Stefan Karlsson wrote: > Hi all, > > Please review this patch to add some memory related JFR events to ZGC. > > https://cr.openjdk.java.net/~stefank/8241361/webrev.01/ > https://bugs.openjdk.java.net/browse/JDK-8241361 > > Added events: > > ZAllocationStall - Record when we run out of heap memory and the Java > threads stall, waiting for the GC to free up memory. > > ZPageAllocation - Updated the existing event to also record the duration > of the event. Updated the event to only be reported if the allocation > takes longer than 1 ms. > > ZPageCacheFlush - Record when the page cache needs to be flushed. This > usually happens when we run out of a specific page size and have to > detach the physical and virtual memory to materialize a new ZPage. We > also flush pages when we uncommit memory. > > ZRelocationSet - Record information about the selected relocation set. > > ZUncommit - Record when we uncommit and hand back memory to the OS. > > The patch also contains some small cosmetic changes to existing events, > whitespace fixes. 
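Two of the reporting rules discussed in the message above (emit ZUncommit only when memory was actually uncommitted, and report ZPageAllocation only when the allocation took longer than 1 ms) can be illustrated with a minimal C++ sketch. This is not the code from the webrev: `EventLog`, `report_uncommit` and `report_page_allocation` are hypothetical names invented for the illustration, the log format is made up, and in HotSpot the duration threshold may well be configured outside the code rather than hard-coded as it is here.

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory sink standing in for a real event stream.
struct EventLog {
  std::vector<std::string> entries;
};

// Emit an uncommit record only when memory was actually uncommitted.
inline bool report_uncommit(EventLog& log, std::size_t uncommitted_bytes) {
  if (uncommitted_bytes == 0) {
    return false;  // nothing was uncommitted, so no event is recorded
  }
  log.entries.push_back("ZUncommit bytes=" + std::to_string(uncommitted_bytes));
  return true;
}

// Report a page allocation only when it took longer than the threshold.
// The 1 ms default mirrors the figure quoted in the thread (assumed to be
// a strict "longer than" comparison).
inline bool report_page_allocation(
    EventLog& log,
    std::chrono::microseconds duration,
    std::chrono::microseconds threshold = std::chrono::milliseconds(1)) {
  if (duration <= threshold) {
    return false;  // fast allocation, not reported
  }
  log.entries.push_back("ZPageAllocation us=" + std::to_string(duration.count()));
  return true;
}
```

Both helpers return whether a record was written, which keeps the "nothing happened, nothing reported" rule easy to check from a caller.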
From leo.korinth at oracle.com Mon Mar 30 15:50:33 2020 From: leo.korinth at oracle.com (Leo Korinth) Date: Mon, 30 Mar 2020 17:50:33 +0200 Subject: RFR: 8241830: Simplify commit error messages in G1PageBasedVirtualSpace In-Reply-To: References: Message-ID: <918d6a47-2cb6-00ad-1400-e289b78614b3@oracle.com> On 30/03/2020 16:23, Claes Redestad wrote: > Hi, > > when committing memory for virtual space, we eagerly generate an error > message using err_msg, which ends up malloc'ing some memory. As this is > done for each potential heap region, this turns out to have a > significant cost on startup when ergonomics decide we should run with > many regions. Nice to get a measurable improvement even though no malloc actually seems to be called. The code also got a bit simpler! > Since the address range information is redundant (a warning will be > printed along with the hs_err file with similar detail), I propose > replacing with a static error message. This aligns with other call > sites. For unrecoverable mmap failures, the message will be ignored and > replaced by "committing reserved memory", meaning the extra information > is unlikely to actually manifest. Maybe you can differentiate the static string so that we can see which of the two functions failed? Either way, the change looks good to me and if you choose to change the static strings, I need no new webrev. Thanks, Leo > Webrev: http://cr.openjdk.java.net/~redestad/8241830/open.00/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8241830 > > Thanks!
> > /Claes From per.liden at oracle.com Mon Mar 30 15:55:54 2020 From: per.liden at oracle.com (Per Liden) Date: Mon, 30 Mar 2020 17:55:54 +0200 Subject: RFR: 8241361: ZGC: Implement memory related JFR events In-Reply-To: <66f71efc-f9e1-0c62-f915-7dbc8beae263@oracle.com> References: <40e29fc8-005a-e5d1-8bf0-816d406ee7b8@oracle.com> <66f71efc-f9e1-0c62-f915-7dbc8beae263@oracle.com> Message-ID: On 3/30/20 5:32 PM, Stefan Karlsson wrote: > Updated webrevs: > https://cr.openjdk.java.net/~stefank/8241361/webrev.04.delta/ > https://cr.openjdk.java.net/~stefank/8241361/webrev.04/ > > Changes after some testing and discussions with Per: > > 1) For some reason it's required by JFR testing that the .jfc files list > the stackTrace value. Fixed that. > 2) Only generate ZUncommit when we actually have uncommitted memory. > 3) Send relocation set group events for large pages. > 4) Turn off both logging and event generation when medium pages have > been disabled. Looks good! /Per > > Thanks, > StefanK > > On 2020-03-20 14:43, Stefan Karlsson wrote: >> Hi all, >> >> Please review this patch to add some memory related JFR events to ZGC. >> >> https://cr.openjdk.java.net/~stefank/8241361/webrev.01/ >> https://bugs.openjdk.java.net/browse/JDK-8241361 >> >> Added events: >> >> ZAllocationStall - Record when we run out of heap memory and the Java >> threads stall, waiting for the GC to free up memory. >> >> ZPageAllocation - Updated the existing event to also record the >> duration of the event. Updated the event to only be reported if the >> allocation takes longer than 1 ms. >> >> ZPageCacheFlush - Record when the page cache needs to be flushed. This >> usually happens when we run out of a specific page size and have to >> detach the physical and virtual memory to materialize a new ZPage. We >> also flush pages when we uncommit memory. >> >> ZRelocationSet - Record information about the selected relocation set.
>> >> ZUncommit - Record when we uncommit and hand back memory to the OS. >> >> The patch also contains some small cosmetic changes to existing >> events, whitespace fixes. From claes.redestad at oracle.com Mon Mar 30 16:02:03 2020 From: claes.redestad at oracle.com (Claes Redestad) Date: Mon, 30 Mar 2020 18:02:03 +0200 Subject: RFR: 8241830: Simplify commit error messages in G1PageBasedVirtualSpace In-Reply-To: <918d6a47-2cb6-00ad-1400-e289b78614b3@oracle.com> References: <918d6a47-2cb6-00ad-1400-e289b78614b3@oracle.com> Message-ID: <07150ddd-5584-3104-a6f1-269845bd136e@oracle.com> On 2020-03-30 17:50, Leo Korinth wrote: > > > On 30/03/2020 16:23, Claes Redestad wrote: >> Hi, >> >> when committing memory for virtual space, we eagerly generate an error >> message using err_msg, which ends up malloc'ing some memory. As this is >> done for each potential heap region, this turns out to have a >> significant cost on startup when ergonomics decide we should run with >> many regions. > > Nice to get a measurable improvement even though no malloc actually > seems to be called. The code also got a bit simpler! Right, I was about to send out a correction that FormatBuffer doesn't malloc but stack allocates a (large) fixed buffer. I confused the impl details with stringStream (which I'm looking at for a few other, unrelated cleanups..). > >> Since the address range information is redundant (a warning will be >> printed along with the hs_err file with similar detail), I propose >> replacing with a static error message. This aligns with other call >> sites. For unrecoverable mmap failures, the message will be ignored and >> replaced by "committing reserved memory", meaning the extra information >> is unlikely to actually manifest. > > Maybe you can differentiate the static string so that we can see > which of the two functions failed? Either way, the change looks good to > me and if you choose to change the static strings, I need no new webrev. Ok, thanks!
/Claes From erik.osterlund at oracle.com Mon Mar 30 16:30:01 2020 From: erik.osterlund at oracle.com (Erik Österlund) Date: Mon, 30 Mar 2020 18:30:01 +0200 Subject: RFR: 8241361: ZGC: Implement memory related JFR events In-Reply-To: <66f71efc-f9e1-0c62-f915-7dbc8beae263@oracle.com> References: <40e29fc8-005a-e5d1-8bf0-816d406ee7b8@oracle.com> <66f71efc-f9e1-0c62-f915-7dbc8beae263@oracle.com> Message-ID: <0702ca36-9d5c-b2ff-dd0c-abba92998fc3@oracle.com> Hi Stefan, Still good. Thanks, /Erik On 2020-03-30 17:32, Stefan Karlsson wrote: > Updated webrevs: > https://cr.openjdk.java.net/~stefank/8241361/webrev.04.delta/ > https://cr.openjdk.java.net/~stefank/8241361/webrev.04/ > > Changes after some testing and discussions with Per: > > 1) For some reason it's required by JFR testing that the .jfc files > list the stackTrace value. Fixed that. > 2) Only generate ZUncommit when we actually have uncommitted memory. > 3) Send relocation set group events for large pages. > 4) Turn off both logging and event generation when medium pages have > been disabled. > > Thanks, > StefanK > > On 2020-03-20 14:43, Stefan Karlsson wrote: >> Hi all, >> >> Please review this patch to add some memory related JFR events to ZGC. >> >> https://cr.openjdk.java.net/~stefank/8241361/webrev.01/ >> https://bugs.openjdk.java.net/browse/JDK-8241361 >> >> Added events: >> >> ZAllocationStall - Record when we run out of heap memory and the Java >> threads stall, waiting for the GC to free up memory. >> >> ZPageAllocation - Updated the existing event to also record the >> duration of the event. Updated the event to only be reported if the >> allocation takes longer than 1 ms. >> >> ZPageCacheFlush - Record when the page cache needs to be flushed. >> This usually happens when we run out of a specific page size and have >> to detach the physical and virtual memory to materialize a new ZPage. >> We also flush pages when we uncommit memory.
>> >> ZRelocationSet - Record information about the selected relocation set. >> >> ZUncommit - Record when we uncommit and hand back memory to the OS. >> >> The patch also contains some small cosmetic changes to existing >> events, whitespace fixes. From mikhailo.seledtsov at oracle.com Mon Mar 30 18:10:37 2020 From: mikhailo.seledtsov at oracle.com (mikhailo.seledtsov at oracle.com) Date: Mon, 30 Mar 2020 11:10:37 -0700 Subject: RFR(S) : 8203238: [TESTBUG] rewrite MemOptions shell test in Java In-Reply-To: References: <6B89C20B-36D8-4743-979B-56DDF8ADCE64@oracle.com> Message-ID: <79ddf648-59b6-641d-ade3-5c64eef162e9@oracle.com> Looks good to me, with one comment: The comment to the test states: "It is intended to be run on machines with more than 4G available memory". I would then recommend using "@requires os.maxMemory > 4G" The rest looks good to me. However, I am not an expert in GC. Perhaps, someone from GC team could review it as well. Misha On 3/29/20 9:07 AM, Igor Ignatev wrote: > Ping? > > -- Igor > >> On Mar 25, 2020, at 10:42 AM, Igor Ignatyev wrote: >> >> http://cr.openjdk.java.net/~iignatyev//8203238/webrev.00 >>> 330 lines changed: 91 ins; 236 del; 3 mod; >> Hi all, >> >> could you please review this small patch which rewrites MemOptions shell test? >> >> while porting the test, I noticed that available memory checks aren't required, and the test successfully passes even w/o them, so the java version of the test doesn't check available memory and only @requires 64 bits vm. given the test doesn't require lots of time/resources to execute, I've also removed it from exclusiveAccess. MemStat class was made static inner class of MemOptionsTest for the sake of readability and brevity.
>> >> webrev: http://cr.openjdk.java.net/~iignatyev//8203238/webrev.00 >> testing: the changed tests multiple tests on {linux, windows, mac} w/ {SerialGC,ZGC,G1GC,ParallelGC} >> JBS: https://bugs.openjdk.java.net/browse/JDK-8203238 >> >> NB the shell version of the test had a bug which prevented its execution. an incorrect operator (:=) was used at L#23,23, which led to bogus 'java' variable at L#44 and non zero exit code at L#48, so the test passes w/ 'Skipping the test; a 64-bit VM is required.' message on all platforms. so this patch effectively resurrects the test. >> >> Thanks, >> -- Igor >> From per.liden at oracle.com Mon Mar 30 20:38:52 2020 From: per.liden at oracle.com (Per Liden) Date: Mon, 30 Mar 2020 22:38:52 +0200 Subject: RFR: 8241881: ZGC: Add tests for JFR events Message-ID: <215b1b2f-4195-e883-0243-ac444e3d7525@oracle.com> Add tests for the newly added ZGC-specific JFR events that we intend to make non-experimental as part of JEP 377. These events are: ZAllocationStall, ZPageAllocation, ZPageCacheFlush, ZRelocationSet, ZRelocationSetGroup, ZUncommit. Bug: https://bugs.openjdk.java.net/browse/JDK-8241881 Webrev: http://cr.openjdk.java.net/~pliden/8241881/webrev.0 /Per From stefan.karlsson at oracle.com Tue Mar 31 09:22:40 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Tue, 31 Mar 2020 11:22:40 +0200 Subject: RFR: 8241160: Concurrent class unloading reports GCTraceTime events as JFR pause sub-phase events In-Reply-To: References: <8dd49f70-19f3-cc6f-58fc-99af87cc31f5@oracle.com> <31bcbdf5-6b20-502d-5f91-8bd18962985d@redhat.com> Message-ID: No answers from Shenandoah dev. I'll go ahead with the proposed change for Shenandoah. It's easy enough to change, and it doesn't make the situation worse than it is today. StefanK On 2020-03-30 13:12, Stefan Karlsson wrote: > Hi again, > > Did you find the time to take a look at this?
> > I'd like to propose that we go with the current solution to disable the > incorrect reporting of the events for now, until you find the time to > look at this. The effect of this will be that you won't get this > sub-phase reported, but at the same time it does remove the bug that a > pause was reported. > > Thanks, > StefanK > > On 2020-03-26 10:55, Stefan Karlsson wrote: >> On 2020-03-26 10:54, Roman Kennke wrote: >>> Hey Stefan, >>> >>> Sorry, this went under my radar. Give us half a day or so, yes? >> Sure. >> >> StefanK >> >>> >>> Thanks, >>> Roman >>> >>>> Shenandoah devs, any comments w.r.t. the Shenandoah section below? >>>> >>>> Thanks, >>>> StefanK >>>> >>>> On 2020-03-19 10:44, Stefan Karlsson wrote: >>>>> Hi all, >>>>> >>>>> Please review this patch to rewrite the GCTimer, and associated >>>>> classes, to not allow nested phases of different types (pause or >>>>> concurrent). >>>>> >>>>> https://cr.openjdk.java.net/~stefank/8241160/webrev.01/ >>>>> https://bugs.openjdk.java.net/browse/JDK-8241160 >>>>> >>>>> A bug was found when I was looking at JFR events from ZGC. A >>>>> GCPhasePauseLevel1 event was nested within a GCPhaseConcurrent. The >>>>> only valid parent is a GCPhasePause event. The reason why this >>>>> happened was that we use a GCTraceTime class inside the class >>>>> unloading code. Previously, we only used GCTraceTimes inside pauses, >>>>> but ever since class unloading was moved out to a concurrent phase, >>>>> this isn't true anymore. GCTraceTime used >>>>> GCTimer::register_gc_phase_start(name, Ticks, phase? = ), and >>>>> therefore always reported pauses and pause sub-phases. >>>>> >>>>> With this patch, I suggest that we become stricter in our usages of >>>>> the GCTimer. The effects of the patch are: >>>>> >>>>> 1) When a top-level pause (or concurrent) phase is created, the code >>>>> must be explicit about what type of phase is created. The code will >>>>> now assert if this is abused.
Most places were already explicit, but I >>>>> had to change two places: >>>>> >>>>> a) Shenandoah type-erased ConcurrentGCTimer and therefore didn't have >>>>> access to register_gc_pause_start. I made that function public, >>>>> instead of protected, so that we didn't have to deal with that >>>>> problem. >>>>> >>>>> b) G1 used GCTraceTime to note the Remark/Cleanup pauses (in >>>>> VM_G1Concurrent). This is the only place that uses GCTraceTime to >>>>> start a pause. All other places use GCTraceTime to create sub-phases. >>>>> I could have copy-n-pasted the entire >>>>> GCTraceTime/GCTraceTimeWrapper/GCTraceTimeWrapper implementation and >>>>> created a version that calls register_gc_pause_start instead of >>>>> register_gc_phase_start. Instead of doing that I opted for creating a >>>>> system where the code could register a set of callbacks to be called >>>>> when the start and end time is registered. This is used in the backend >>>>> of GCTraceTime, but then also used by G1 to allow us to not have to >>>>> copy-n-paste a lot of the code. >>>>> >>>>> I would have liked to make GCTraceTimeImpl/GCTraceTimeWrapper agnostic >>>>> to the default callbacks (unified logging and GCTimer) but couldn't >>>>> find a nice way to express that, because of the way we macro-expand >>>>> the UL tags. Maybe something we can consider for a future >>>>> investigation. >>>>> >>>>> 2) sub-phases now inherit the type from the parent phase, and there's >>>>> no possibility to incorrectly nest phases anymore. This also removed >>>>> the need for ConcurrentGCTimer::_is_concurrent_phase_active. >>>>> >>>>> 3) This allows (and encourages) concurrent sub-phases. When the JFR >>>>> events were ported to HotSpot, only pauses got sub-phases, because >>>>> there wasn't a big need for concurrent sub-phases. In this patch I >>>>> added level of sub-phases to JFR. Maybe it would be better to add more >>>>> right away?
(I'm not a fan of having the explicit sub-phase level >>>>> events, instead of a counter in *the* phase event, but the JMC team at >>>>> that time needed it to be logged as separate events. Maybe something >>>>> that could be reconsidered some time) >>>>> >>>>> 4) The different consumers of the timestamps are separated into their >>>>> own classes. >>>>> >>>>> 5) Shenandoah devs need to consider what to do about this change: >>>>> >>>>> - unloading_occurred = >>>>> SystemDictionary::do_unloading(heap->gc_timer()); >>>>> + // FIXME: This turns off the previously broken JFR events. If we >>>>> want to keep reporting them, >>>>> + // but with the correct type (Concurrent) then a top-level >>>>> concurrent phase is required. >>>>> + unloading_occurred = SystemDictionary::do_unloading(NULL /* gc_timer >>>>> */); >>>>> >>>>> Where this code caused GCPhasePauseLevel1 events for ZGC, this used to >>>>> create GCPhasePause events for Shenandoah. It uses GCTraceTime to log >>>>> sub-phases, but the current Shenandoah code hasn't registered a >>>>> top-level phase at this point. Either we keep this code with the >>>>> removal of the gc_timer argument, or we add a top-level phase >>>>> somewhere. If we want the latter, then I need suggestions on where to >>>>> add them. Or maybe push the current code, and fix it as a follow-up >>>>> patch? >>>>> >>>>> What do you think? An alternative is to (continue?) completely forbid >>>>> concurrent sub-phases, and remove the gc_timers passed to GCTraceTimes >>>>> during concurrent phases. Even if we decide to do that, I think >>>>> there's some merit to the stricter GCTimer code, and the slight >>>>> separation of concern in GCTraceTime. 
>>>>> >>>>> Tested tier1-3 >>>>> >>>>> Thanks, >>>>> StefanK >> > From stefan.johansson at oracle.com Tue Mar 31 09:42:28 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Tue, 31 Mar 2020 11:42:28 +0200 Subject: RFR: 8241670: Enhance heap region size ergonomics to improve OOTB performance Message-ID: <76b10f27-6ac3-8adc-84c7-d71eda2a112c@oracle.com> Hi, Please review this enhancement to improve the out of the box performance of G1. Webrev: http://cr.openjdk.java.net/~sjohanss/8241670/00/index.html JBS: https://bugs.openjdk.java.net/browse/JDK-8241670 Summary The default heap region size determined at startup used the initial and max heap size to calculate a region size so that the heap would have at least 2048 regions (if possible). This proposed patch will change this to: 1) Only consider Max to make it easy to explain and avoid strange situations where -Xms or the lack of it will cause different region size for the same max heap size. 2) Round up the region size to next power of 2, since we've seen many cases where a larger region size is beneficial. 3) Keep the 2048 target for now since the other two changes, will have good effect on choosing a larger region size for heaps above 2G. Testing Mach5 tier1-4, aurora performance run for sanity and manual performance testing to verify results. Thanks, Stefan From claes.redestad at oracle.com Tue Mar 31 10:04:48 2020 From: claes.redestad at oracle.com (Claes Redestad) Date: Tue, 31 Mar 2020 12:04:48 +0200 Subject: RFR: 8241670: Enhance heap region size ergonomics to improve OOTB performance In-Reply-To: <76b10f27-6ac3-8adc-84c7-d71eda2a112c@oracle.com> References: <76b10f27-6ac3-8adc-84c7-d71eda2a112c@oracle.com> Message-ID: <973ec8e3-0b33-c14c-482a-c9a37cfce25d@oracle.com> Hi, looks great - both the patch and the out-of-the-box performance improvements. Thanks! 
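The region-size ergonomics summarized in the RFR for 8241670 above (use only the max heap size, aim for 2048 regions, round the result up to the next power of 2) can be sketched in a few lines of C++. This is an illustration under stated assumptions, not the code from the webrev: `ergonomic_region_size` and `round_up_pow2` are hypothetical names, and the 1 MiB lower and 32 MiB upper bounds are assumptions that the thread itself does not state.

```cpp
#include <algorithm>
#include <cstdint>

// Target number of regions, as stated in the RFR summary.
constexpr std::uint64_t kTargetRegions = 2048;
// Assumed bounds for this sketch only; the thread does not give them.
constexpr std::uint64_t kMinRegionSize = 1ull << 20;   // 1 MiB (assumption)
constexpr std::uint64_t kMaxRegionSize = 32ull << 20;  // 32 MiB (assumption)

// Round v (v > 0) up to the next power of two; powers of two map to themselves.
inline std::uint64_t round_up_pow2(std::uint64_t v) {
  std::uint64_t p = 1;
  while (p < v) {
    p <<= 1;
  }
  return p;
}

// Hypothetical helper: derive a region size from the max heap size only.
inline std::uint64_t ergonomic_region_size(std::uint64_t max_heap_bytes) {
  // 1) Only the max heap size is considered, divided by the region target.
  std::uint64_t size = std::max(max_heap_bytes / kTargetRegions, kMinRegionSize);
  // 2) Round up to a power of two, favouring larger regions.
  size = round_up_pow2(size);
  // 3) Keep the result within the assumed bounds.
  return std::min(std::max(size, kMinRegionSize), kMaxRegionSize);
}
```

Under these assumptions a 2 GiB max heap still gets 1 MiB regions, while a 3 GiB max heap (1.5 MiB per region before rounding) gets 2 MiB regions, which matches the remark that rounding up helps pick larger regions for heaps above 2G.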
/Claes On 2020-03-31 11:42, Stefan Johansson wrote: > Hi, > > Please review this enhancement to improve the out of the box performance > of G1. > > Webrev: http://cr.openjdk.java.net/~sjohanss/8241670/00/index.html > JBS: https://bugs.openjdk.java.net/browse/JDK-8241670 > > Summary > The default heap region size determined at startup used the initial and > max heap size to calculate a region size so that the heap would have at > least 2048 regions (if possible). This proposed patch will change this to: > 1) Only consider Max to make it easy to explain and avoid strange > situations where -Xms or the lack of it will cause different region size > for the same max heap size. > 2) Round up the region size to next power of 2, since we've seen many > cases where a larger region size is beneficial. > 3) Keep the 2048 target for now since the other two changes, will have > good effect on choosing a larger region size for heaps above 2G. > > Testing > Mach5 tier1-4, aurora performance run for sanity and manual performance > testing to verify results. > > Thanks, > Stefan From thomas.schatzl at oracle.com Tue Mar 31 10:49:28 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 31 Mar 2020 12:49:28 +0200 Subject: RFR: 8241670: Enhance heap region size ergonomics to improve OOTB performance In-Reply-To: <76b10f27-6ac3-8adc-84c7-d71eda2a112c@oracle.com> References: <76b10f27-6ac3-8adc-84c7-d71eda2a112c@oracle.com> Message-ID: <230f4f30-cf37-1d93-467e-2b6ff4e1696b@oracle.com> Hi, On 31.03.20 11:42, Stefan Johansson wrote: > Hi, > > Please review this enhancement to improve the out of the box performance > of G1. > > Webrev: http://cr.openjdk.java.net/~sjohanss/8241670/00/index.html > JBS: https://bugs.openjdk.java.net/browse/JDK-8241670 > > Summary > The default heap region size determined at startup used the initial and > max heap size to calculate a region size so that the heap would have at > least 2048 regions (if possible). 
This proposed patch will change this to: > 1) Only consider Max to make it easy to explain and avoid strange > situations where -Xms or the lack of it will cause different region size > for the same max heap size. > 2) Round up the region size to next power of 2, since we've seen many > cases where a larger region size is beneficial. > 3) Keep the 2048 target for now since the other two changes, will have > good effect on choosing a larger region size for heaps above 2G. > > Testing > Mach5 tier1-4, aurora performance run for sanity and manual performance > testing to verify results. > - heapRegion.cpp: s/benificial/beneficial - heapRegionBounds.hpp: Maybe remove the "(based on the max heap size)." comment part. Apparently we forgot to update that last time, so we probably will next time too. The code is simple enough too. - maybe update copyrights while you are at it. No need to re-review these comment updates. Looks good. Thanks, Thomas From stefan.johansson at oracle.com Tue Mar 31 13:57:12 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Tue, 31 Mar 2020 15:57:12 +0200 Subject: RFR: 8241670: Enhance heap region size ergonomics to improve OOTB performance In-Reply-To: <230f4f30-cf37-1d93-467e-2b6ff4e1696b@oracle.com> References: <76b10f27-6ac3-8adc-84c7-d71eda2a112c@oracle.com> <230f4f30-cf37-1d93-467e-2b6ff4e1696b@oracle.com> Message-ID: Thanks for the reviews, Will update per your suggestions before pushing. Cheers, Stefan On 2020-03-31 12:49, Thomas Schatzl wrote: > Hi, > > On 31.03.20 11:42, Stefan Johansson wrote: >> Hi, >> >> Please review this enhancement to improve the out of the box >> performance of G1. >> >> Webrev: http://cr.openjdk.java.net/~sjohanss/8241670/00/index.html >> JBS: https://bugs.openjdk.java.net/browse/JDK-8241670 >> >> Summary >> The default heap region size determined at startup used the initial >> and max heap size to calculate a region size so that the heap would >> have at least 2048 regions (if possible). 
This proposed patch will >> change this to: >> 1) Only consider Max to make it easy to explain and avoid strange >> situations where -Xms or the lack of it will cause different region >> size for the same max heap size. >> 2) Round up the region size to next power of 2, since we've seen many >> cases where a larger region size is beneficial. >> 3) Keep the 2048 target for now since the other two changes, will have >> good effect on choosing a larger region size for heaps above 2G. >> >> Testing >> Mach5 tier1-4, aurora performance run for sanity and manual >> performance testing to verify results. >> > > - heapRegion.cpp: s/benificial/beneficial > > - heapRegionBounds.hpp: Maybe remove the "(based on the max heap size)." > comment part. Apparently we forgot to update that last time, so we > probably will next time too. The code is simple enough too. > > - maybe update copyrights while you are at it. > > No need to re-review these comment updates. > > Looks good. > > Thanks, > Thomas From stefan.karlsson at oracle.com Tue Mar 31 14:32:51 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Tue, 31 Mar 2020 16:32:51 +0200 Subject: RFR: 8241881: ZGC: Add tests for JFR events In-Reply-To: <215b1b2f-4195-e883-0243-ac444e3d7525@oracle.com> References: <215b1b2f-4195-e883-0243-ac444e3d7525@oracle.com> Message-ID: <217eb50e-c5be-aa42-f73a-ad55e66722e5@oracle.com> Looks good. StefanK On 2020-03-30 22:38, Per Liden wrote: > Add tests for the newly added ZGC-specific JFR events that we intend to > make non-experimental as part of JEP 377.
These events are: > > ZAllocationStall > ZPageAllocation > ZPageCacheFlush > ZRelocationSet > ZRelocationSetGroup > ZUncommit > > Bug: https://bugs.openjdk.java.net/browse/JDK-8241881 > Webrev: http://cr.openjdk.java.net/~pliden/8241881/webrev.0 > > /Per From per.liden at oracle.com Tue Mar 31 14:33:24 2020 From: per.liden at oracle.com (Per Liden) Date: Tue, 31 Mar 2020 16:33:24 +0200 Subject: RFR: 8241881: ZGC: Add tests for JFR events In-Reply-To: <217eb50e-c5be-aa42-f73a-ad55e66722e5@oracle.com> References: <215b1b2f-4195-e883-0243-ac444e3d7525@oracle.com> <217eb50e-c5be-aa42-f73a-ad55e66722e5@oracle.com> Message-ID: <301dad81-8075-6a1f-8010-532fa7e34512@oracle.com> Thanks Stefan! /Per On 3/31/20 4:32 PM, Stefan Karlsson wrote: > Looks good. > > StefanK > > On 2020-03-30 22:38, Per Liden wrote: >> Add tests for the newly added ZGC-specific JFR events that we intend >> to make non-experimental as part of JEP 377. These events are: >> >> ZAllocationStall >> ZPageAllocation >> ZPageCacheFlush >> ZRelocationSet >> ZRelocationSetGroup >> ZUncommit >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8241881 >> Webrev: http://cr.openjdk.java.net/~pliden/8241881/webrev.0 >> >> /Per From erik.osterlund at oracle.com Tue Mar 31 14:38:26 2020 From: erik.osterlund at oracle.com (Erik Österlund) Date: Tue, 31 Mar 2020 16:38:26 +0200 Subject: RFR: 8241881: ZGC: Add tests for JFR events In-Reply-To: <215b1b2f-4195-e883-0243-ac444e3d7525@oracle.com> References: <215b1b2f-4195-e883-0243-ac444e3d7525@oracle.com> Message-ID: Hi, Looks good. Thanks, /Erik On 2020-03-30 22:38, Per Liden wrote: > Add tests for the newly added ZGC-specific JFR events that we intend > to make non-experimental as part of JEP 377.
These events are: > > ZAllocationStall > ZPageAllocation > ZPageCacheFlush > ZRelocationSet > ZRelocationSetGroup > ZUncommit > > Bug: https://bugs.openjdk.java.net/browse/JDK-8241881 > Webrev: http://cr.openjdk.java.net/~pliden/8241881/webrev.0 > > /Per From erik.gahlin at oracle.com Tue Mar 31 14:46:04 2020 From: erik.gahlin at oracle.com (Erik Gahlin) Date: Tue, 31 Mar 2020 16:46:04 +0200 Subject: RFR: 8241881: ZGC: Add tests for JFR events In-Reply-To: <215b1b2f-4195-e883-0243-ac444e3d7525@oracle.com> References: <215b1b2f-4195-e883-0243-ac444e3d7525@oracle.com> Message-ID: Hi Per, Would it be possible to verify the values in the event? I don't understand the semantics of the events, so it is hard for me to suggest what should actually be verified, but some sort of sanity check seems reasonable. For example, can capacityBefore be less than capacityAfter, or vice versa, in ZUncommit. Thanks Erik > On 30 Mar 2020, at 22:38, Per Liden wrote: > > Add tests for the newly added ZGC-specific JFR events that we intend to make non-experimental as part of JEP 377. These events are: > > ZAllocationStall > ZPageAllocation > ZPageCacheFlush > ZRelocationSet > ZRelocationSetGroup > ZUncommit > > Bug: https://bugs.openjdk.java.net/browse/JDK-8241881 > Webrev: http://cr.openjdk.java.net/~pliden/8241881/webrev.0 > > /Per From per.liden at oracle.com Tue Mar 31 20:31:49 2020 From: per.liden at oracle.com (Per Liden) Date: Tue, 31 Mar 2020 22:31:49 +0200 Subject: RFR: 8241881: ZGC: Add tests for JFR events In-Reply-To: References: <215b1b2f-4195-e883-0243-ac444e3d7525@oracle.com> Message-ID: Thanks Erik! /Per On 3/31/20 4:38 PM, Erik Österlund wrote: > Hi, > > Looks good. > > Thanks, > /Erik > > On 2020-03-30 22:38, Per Liden wrote: >> Add tests for the newly added ZGC-specific JFR events that we intend >> to make non-experimental as part of JEP 377.
These events are: >> >> ZAllocationStall >> ZPageAllocation >> ZPageCacheFlush >> ZRelocationSet >> ZRelocationSetGroup >> ZUncommit >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8241881 >> Webrev: http://cr.openjdk.java.net/~pliden/8241881/webrev.0 >> >> /Per >