From per.liden at oracle.com Tue Oct 1 09:10:21 2019 From: per.liden at oracle.com (Per Liden) Date: Tue, 1 Oct 2019 11:10:21 +0200 Subject: Number of concurrent threads with ZGC In-Reply-To: References: Message-ID: Hi, The number of ZWorkers started is max(ParallelGCThreads, ConcGCThreads). This set of worker threads is then used for both stop-the-world operations and for concurrent operations. But if ConcGCThreads < ParallelGCThreads, then only a subset of the ZWorkers are used in concurrent GC operations. The number of ZRuntimeWorkers started is ParallelGCThreads. These threads are not really involved in GC work. Instead they help out doing various safepoint cleanup tasks (deflate monitors, cleans various data structures, etc). If you use -Xlog:gc+init (or -Xlog:gc*), then ZGC will at startup print number of threads configured for various tasks. cheers, Per On 9/30/19 8:12 PM, Sundara Mohan M wrote: > Hi, > When i configure my concurrent gc thread count as 5 using > -XX:ConcGCThreads=5. > I still see 40 threads (thread named ZWorkers, 40 is my cpu count) are > running in jvm. > Also i see 40 RuntimeWorkers thread running (assuming this is for > concurrent processing). > > Assuming "ZWorkers" thread for concurrent processing and "RuntimeWorkers" > for parallel processing or stw phase threads. Looks this assumption is not > correct. > > Can you help me understand why we create more than 5 "ZWorkers" if i > configure my concurrent thread count is 5. > > TIA > Sundar > From m.sundar85 at gmail.com Tue Oct 1 19:39:18 2019 From: m.sundar85 at gmail.com (Sundara Mohan M) Date: Tue, 1 Oct 2019 12:39:18 -0700 Subject: Number of concurrent threads with ZGC In-Reply-To: References: Message-ID: Thank you for the clarification. Regards, Sundar On Tue, Oct 1, 2019 at 2:10 AM Per Liden wrote: > Hi, > > The number of ZWorkers started is max(ParallelGCThreads, ConcGCThreads). > This set of worker threads is then used for both stop-the-world > operations and for concurrent operations. But if ConcGCThreads < > ParallelGCThreads, then only a subset of the ZWorkers are used in > concurrent GC operations. > > The number of ZRuntimeWorkers started is ParallelGCThreads. These > threads are not really involved in GC work. Instead they help out doing > various safepoint cleanup tasks (deflate monitors, cleans various data > structures, etc). > > If you use -Xlog:gc+init (or -Xlog:gc*), then ZGC will at startup print > number of threads configured for various tasks. > > cheers, > Per > > On 9/30/19 8:12 PM, Sundara Mohan M wrote: > > Hi, > > When i configure my concurrent gc thread count as 5 using > > -XX:ConcGCThreads=5. > > I still see 40 threads (thread named ZWorkers, 40 is my cpu count) are > > running in jvm. > > Also i see 40 RuntimeWorkers thread running (assuming this is for > > concurrent processing). > > > > Assuming "ZWorkers" thread for concurrent processing and "RuntimeWorkers" > > for parallel processing or stw phase threads. Looks this assumption is > not > > correct. > > > > Can you help me understand why we create more than 5 "ZWorkers" if i > > configure my concurrent thread count is 5. > > > > TIA > > Sundar > > > From stefan.reich.maker.of.eye at googlemail.com Thu Oct 3 10:19:06 2019 From: stefan.reich.maker.of.eye at googlemail.com (Stefan Reich) Date: Thu, 3 Oct 2019 12:19:06 +0200 Subject: Possible working method to get actual process size on Linux Message-ID: I'm still trying to get Linux to report a correct process size when using ZGC (the memory multi-mapping issue). 
My idea is to parse /proc/pid/smaps. Sadly, I can't see physical addresses there, only virtual ones. So I group by virtual address range. Here's what I got for an example process with is reported by top as 2.7 GB: Address range 000000xxxxxxxxxx: 6 MB Address range 000004xxxxxxxxxx: 38 MB Address range 000007xxxxxxxxxx: 646 MB Address range 000008xxxxxxxxxx: 38 MB Address range 00000bxxxxxxxxxx: 631 MB Address range 000010xxxxxxxxxx: 38 MB Address range 000013xxxxxxxxxx: 690 MB Address range 00007fxxxxxxxxxx: 726 MB It appears I have to discount the 07, 0b and 13 ranges to get to a reasonable actual process size of 844 MB. Question: Are these address ranges fixed or does ZGC choose different ones depending on heap size? Where exactly do the address ranges begin and end? As soon as this reporting method works, let's publish it as a standard tool for anyone using ZGC on Linux. I can't be the only one who's driven nuts by not knowing how big my processes are. Greetings, Stefan -- Stefan Reich BotCompany.de // Java-based operating systems From stefan.reich.maker.of.eye at googlemail.com Thu Oct 3 10:21:24 2019 From: stefan.reich.maker.of.eye at googlemail.com (Stefan Reich) Date: Thu, 3 Oct 2019 12:21:24 +0200 Subject: Possible working method to get actual process size on Linux In-Reply-To: References: Message-ID: Addendum - ok, quite obviously the ranges are 000004-000007 etc. (see those suspicious repeating 38 MB value). So the only question remains if ZGC always uses these ranges. (What if the heap gets bigger than that?) On Thu, 3 Oct 2019 at 12:19, Stefan Reich < stefan.reich.maker.of.eye at googlemail.com> wrote: > I'm still trying to get Linux to report a correct process size when using > ZGC (the memory multi-mapping issue). > > My idea is to parse /proc/pid/smaps. Sadly, I can't see physical addresses > there, only virtual ones. So I group by virtual address range. Here's what > I got for an example process with is reported by top as 2.7 GB: > > Address range 000000xxxxxxxxxx: 6 MB > Address range 000004xxxxxxxxxx: 38 MB > Address range 000007xxxxxxxxxx: 646 MB > Address range 000008xxxxxxxxxx: 38 MB > Address range 00000bxxxxxxxxxx: 631 MB > Address range 000010xxxxxxxxxx: 38 MB > Address range 000013xxxxxxxxxx: 690 MB > Address range 00007fxxxxxxxxxx: 726 MB > > It appears I have to discount the 07, 0b and 13 ranges to get to a > reasonable actual process size of 844 MB. > > Question: Are these address ranges fixed or does ZGC choose different ones > depending on heap size? Where exactly do the address ranges begin and end? > > As soon as this reporting method works, let's publish it as a standard > tool for anyone using ZGC on Linux. I can't be the only one who's driven > nuts by not knowing how big my processes are. > > Greetings, > Stefan > > -- > Stefan Reich > BotCompany.de // Java-based operating systems > -- Stefan Reich BotCompany.de // Java-based operating systems From stefan.reich.maker.of.eye at googlemail.com Thu Oct 3 10:30:06 2019 From: stefan.reich.maker.of.eye at googlemail.com (Stefan Reich) Date: Thu, 3 Oct 2019 12:30:06 +0200 Subject: Possible working method to get actual process size on Linux In-Reply-To: References: Message-ID: Cleaned up address ranges: Address range 000000xxxxxxxxxx: 6 MB Address range 000004xxxxxxxxxx: 701 MB Address range 000008xxxxxxxxxx: 691 MB Address range 000010xxxxxxxxxx: 744 MB Address range 00007cxxxxxxxxxx: 720 MB So the final question is which 3 values of the 4 to discard - they're all slightly different as you can see... 
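(In case anyone wants to reproduce the numbers above: the grouping itself is only a few
lines of Java - roughly the sketch below, class name made up, and it assumes the usual
smaps layout of a mapping header line followed by attribute lines like "Rss: ... kB".
Run it as "java SmapsByRange <pid>".)

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: sum Rss per virtual address prefix (first 6 of 16 hex digits),
// i.e. the "Address range 0000xxxxxxxxxxxx" grouping used in this thread.
public class SmapsByRange {
    public static void main(String[] args) throws IOException {
        Map<String, Long> rssKb = new TreeMap<>();
        String range = null;
        for (String line : Files.readAllLines(Paths.get("/proc/" + args[0] + "/smaps"))) {
            if (line.matches("[0-9a-f]+-[0-9a-f]+ .*")) {                 // mapping header line
                String start = line.substring(0, line.indexOf('-'));
                start = "0000000000000000".substring(start.length()) + start;  // pad to 16 digits
                range = start.substring(0, 6) + "xxxxxxxxxx";
            } else if (line.startsWith("Rss:") && range != null) {        // attribute line
                long kb = Long.parseLong(line.replaceAll("[^0-9]", ""));
                rssKb.merge(range, kb, Long::sum);
            }
        }
        rssKb.forEach((r, kb) ->
            System.out.println("Address range " + r + ": " + (kb / 1024) + " MB"));
    }
}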
On Thu, 3 Oct 2019 at 12:21, Stefan Reich < stefan.reich.maker.of.eye at googlemail.com> wrote: > Addendum - ok, quite obviously the ranges are 000004-000007 etc. (see > those suspicious repeating 38 MB value). > > So the only question remains if ZGC always uses these ranges. (What if the > heap gets bigger than that?) > > On Thu, 3 Oct 2019 at 12:19, Stefan Reich < > stefan.reich.maker.of.eye at googlemail.com> wrote: > >> I'm still trying to get Linux to report a correct process size when using >> ZGC (the memory multi-mapping issue). >> >> My idea is to parse /proc/pid/smaps. Sadly, I can't see physical >> addresses there, only virtual ones. So I group by virtual address range. >> Here's what I got for an example process with is reported by top as 2.7 GB: >> >> Address range 000000xxxxxxxxxx: 6 MB >> Address range 000004xxxxxxxxxx: 38 MB >> Address range 000007xxxxxxxxxx: 646 MB >> Address range 000008xxxxxxxxxx: 38 MB >> Address range 00000bxxxxxxxxxx: 631 MB >> Address range 000010xxxxxxxxxx: 38 MB >> Address range 000013xxxxxxxxxx: 690 MB >> Address range 00007fxxxxxxxxxx: 726 MB >> >> It appears I have to discount the 07, 0b and 13 ranges to get to a >> reasonable actual process size of 844 MB. >> >> Question: Are these address ranges fixed or does ZGC choose >> different ones depending on heap size? Where exactly do the address ranges >> begin and end? >> >> As soon as this reporting method works, let's publish it as a standard >> tool for anyone using ZGC on Linux. I can't be the only one who's driven >> nuts by not knowing how big my processes are. >> >> Greetings, >> Stefan >> >> -- >> Stefan Reich >> BotCompany.de // Java-based operating systems >> > > > -- > Stefan Reich > BotCompany.de // Java-based operating systems > -- Stefan Reich BotCompany.de // Java-based operating systems From fw at deneb.enyo.de Thu Oct 3 11:24:32 2019 From: fw at deneb.enyo.de (Florian Weimer) Date: Thu, 03 Oct 2019 13:24:32 +0200 Subject: Possible working method to get actual process size on Linux In-Reply-To: (Stefan Reich's message of "Thu, 3 Oct 2019 12:19:06 +0200") References: Message-ID: <87o8yy6phr.fsf@mid.deneb.enyo.de> * Stefan Reich: > I'm still trying to get Linux to report a correct process size when using > ZGC (the memory multi-mapping issue). What's your kernel version? Kernel capabilities in this area vary somewhat. From stefan.reich.maker.of.eye at googlemail.com Thu Oct 3 11:50:16 2019 From: stefan.reich.maker.of.eye at googlemail.com (Stefan Reich) Date: Thu, 3 Oct 2019 13:50:16 +0200 Subject: Possible working method to get actual process size on Linux In-Reply-To: <87o8yy6phr.fsf@mid.deneb.enyo.de> References: <87o8yy6phr.fsf@mid.deneb.enyo.de> Message-ID: OK, here is the code: https://github.com/stefan-reich/LinuxProcessSizeDetector Can this be linked somewhere? I believe it to be useful. Seems to work on the machines I tested it on, even though there is mild guesswork involved. Kernel versions I tested: 4.4.0-51-generic 5.0.0-25-generic 4.15.0-64-generic Greetings, Stefan On Thu, 3 Oct 2019 at 13:24, Florian Weimer wrote: > * Stefan Reich: > > > I'm still trying to get Linux to report a correct process size when using > > ZGC (the memory multi-mapping issue). > > What's your kernel version? Kernel capabilities in this area vary > somewhat. 
> -- Stefan Reich BotCompany.de // Java-based operating systems From stefan.reich.maker.of.eye at googlemail.com Thu Oct 3 12:07:27 2019 From: stefan.reich.maker.of.eye at googlemail.com (Stefan Reich) Date: Thu, 3 Oct 2019 14:07:27 +0200 Subject: Uncommitting memory Message-ID: In which JDK version does this feature ship? Greetings, Stefan -- Stefan Reich BotCompany.de // Java-based operating systems From per.liden at oracle.com Thu Oct 3 12:16:36 2019 From: per.liden at oracle.com (Per Liden) Date: Thu, 3 Oct 2019 14:16:36 +0200 Subject: Possible working method to get actual process size on Linux In-Reply-To: References: Message-ID: <33e994ee-20db-cc5a-7c97-e9242622463a@oracle.com> Hi, On 10/3/19 12:19 PM, Stefan Reich wrote: > I'm still trying to get Linux to report a correct process size when using > ZGC (the memory multi-mapping issue). > > My idea is to parse /proc/pid/smaps. Sadly, I can't see physical addresses > there, only virtual ones. So I group by virtual address range. Here's what > I got for an example process with is reported by top as 2.7 GB: > > Address range 000000xxxxxxxxxx: 6 MB > Address range 000004xxxxxxxxxx: 38 MB > Address range 000007xxxxxxxxxx: 646 MB > Address range 000008xxxxxxxxxx: 38 MB > Address range 00000bxxxxxxxxxx: 631 MB > Address range 000010xxxxxxxxxx: 38 MB > Address range 000013xxxxxxxxxx: 690 MB > Address range 00007fxxxxxxxxxx: 726 MB > > It appears I have to discount the 07, 0b and 13 ranges to get to a > reasonable actual process size of 844 MB. > > Question: Are these address ranges fixed or does ZGC choose different ones > depending on heap size? Where exactly do the address ranges begin and end? ZGC in JDK 11 & 12 always used a fixed size and location, but that's not the case in later JDK versions, where this can vary depending on configuration and runtime conditions. > > As soon as this reporting method works, let's publish it as a standard tool > for anyone using ZGC on Linux. I can't be the only one who's driven nuts by > not knowing how big my processes are. Did you see my reply to your previous question on this topic? Tools to extract this data (PSS) exist. Are they not doing what you want? cheers, Per From per.liden at oracle.com Thu Oct 3 12:16:56 2019 From: per.liden at oracle.com (Per Liden) Date: Thu, 3 Oct 2019 14:16:56 +0200 Subject: Uncommitting memory In-Reply-To: References: Message-ID: JDK 13. /Per On 10/3/19 2:07 PM, Stefan Reich wrote: > In which JDK version does this feature ship? > > Greetings, > Stefan > From stefan.reich.maker.of.eye at googlemail.com Thu Oct 3 12:18:11 2019 From: stefan.reich.maker.of.eye at googlemail.com (Stefan Reich) Date: Thu, 3 Oct 2019 14:18:11 +0200 Subject: Uncommitting memory In-Reply-To: References: Message-ID: So EA 34? On Thu, 3 Oct 2019 at 14:17, Per Liden wrote: > JDK 13. > > /Per > > On 10/3/19 2:07 PM, Stefan Reich wrote: > > In which JDK version does this feature ship? > > > > Greetings, > > Stefan > > > -- Stefan Reich BotCompany.de // Java-based operating systems From stefan.reich.maker.of.eye at googlemail.com Thu Oct 3 12:23:29 2019 From: stefan.reich.maker.of.eye at googlemail.com (Stefan Reich) Date: Thu, 3 Oct 2019 14:23:29 +0200 Subject: Possible working method to get actual process size on Linux In-Reply-To: <33e994ee-20db-cc5a-7c97-e9242622463a@oracle.com> References: <33e994ee-20db-cc5a-7c97-e9242622463a@oracle.com> Message-ID: Hi Per! Yes, I saw, sorry for not responding the other time. 
This problem is, /proc/*/smaps_rollup doesn't exist on one of my machines (the one with the oldest kernel). On the newer machines, yeah, it may be an option to use PSS from smaps_rollup. Not sure if there are any tools which would help here. Greetings, Stefan On Thu, 3 Oct 2019 at 14:16, Per Liden wrote: > > Did you see my reply to your previous question on this topic? Tools to > extract this data (PSS) exist. Are they not doing what you want? > > cheers, > Per > -- Stefan Reich BotCompany.de // Java-based operating systems From per.liden at oracle.com Thu Oct 3 12:25:40 2019 From: per.liden at oracle.com (Per Liden) Date: Thu, 3 Oct 2019 14:25:40 +0200 Subject: Uncommitting memory In-Reply-To: References: Message-ID: <3d925852-e501-2fbd-0714-b5476ce70cad@oracle.com> The JDK 13 GA is build 33. No need to use EA (Early Access) builds. Grab it here: http://jdk.java.net/13/ /Per On 10/3/19 2:18 PM, Stefan Reich wrote: > So EA 34? > > On Thu, 3 Oct 2019 at 14:17, Per Liden > wrote: > > JDK 13. > > /Per > > On 10/3/19 2:07 PM, Stefan Reich wrote: > > In which JDK version does this feature ship? > > > > Greetings, > > Stefan > > > > > > -- > Stefan Reich > BotCompany.de // Java-based operating systems From stefan.reich.maker.of.eye at googlemail.com Thu Oct 3 12:28:42 2019 From: stefan.reich.maker.of.eye at googlemail.com (Stefan Reich) Date: Thu, 3 Oct 2019 14:28:42 +0200 Subject: Uncommitting memory In-Reply-To: <3d925852-e501-2fbd-0714-b5476ce70cad@oracle.com> References: <3d925852-e501-2fbd-0714-b5476ce70cad@oracle.com> Message-ID: OK, thanks On Thu, 3 Oct 2019 at 14:25, Per Liden wrote: > The JDK 13 GA is build 33. No need to use EA (Early Access) builds. > > Grab it here: http://jdk.java.net/13/ > > /Per > > On 10/3/19 2:18 PM, Stefan Reich wrote: > > So EA 34? > > > > On Thu, 3 Oct 2019 at 14:17, Per Liden > > wrote: > > > > JDK 13. > > > > /Per > > > > On 10/3/19 2:07 PM, Stefan Reich wrote: > > > In which JDK version does this feature ship? > > > > > > Greetings, > > > Stefan > > > > > > > > > > > -- > > Stefan Reich > > BotCompany.de // Java-based operating systems > -- Stefan Reich BotCompany.de // Java-based operating systems From per.liden at oracle.com Thu Oct 3 12:32:03 2019 From: per.liden at oracle.com (Per Liden) Date: Thu, 3 Oct 2019 14:32:03 +0200 Subject: Possible working method to get actual process size on Linux In-Reply-To: References: <33e994ee-20db-cc5a-7c97-e9242622463a@oracle.com> Message-ID: <6ce13779-8717-05c3-bfd4-796e7c1baa90@oracle.com> On 10/3/19 2:23 PM, Stefan Reich wrote: > Hi Per! > > Yes, I saw, sorry for not responding the other time. > > This problem is,?/proc/*/smaps_rollup doesn't exist on one of my > machines (the one with the oldest kernel). On the newer machines, yeah, > it may be an option to use PSS from smaps_rollup. > > Not sure if there are any tools which would help here. I know some of them (e.g. ps_mem.py), works on older kernels that doesn't have /proc//smaps_rollup. cheers, Per > > Greetings, > Stefan > > On Thu, 3 Oct 2019 at 14:16, Per Liden > wrote: > > > Did you see my reply to your previous question on this topic? Tools to > extract this data (PSS) exist. Are they not doing what you want? 
> > cheers, > Per > > > > -- > Stefan Reich > BotCompany.de // Java-based operating systems From stefan.reich.maker.of.eye at googlemail.com Thu Oct 3 12:47:12 2019 From: stefan.reich.maker.of.eye at googlemail.com (Stefan Reich) Date: Thu, 3 Oct 2019 14:47:12 +0200 Subject: Possible working method to get actual process size on Linux In-Reply-To: <6ce13779-8717-05c3-bfd4-796e7c1baa90@oracle.com> References: <33e994ee-20db-cc5a-7c97-e9242622463a@oracle.com> <6ce13779-8717-05c3-bfd4-796e7c1baa90@oracle.com> Message-ID: The situation is still confusing. My process has: Runtime.totalMemory() = 2.7 GB Runtime.usedMemory() =~ 1 GB ps_mem.py says: root at smartbot:~/bin# ps_mem.py -p 4837 Private + Shared = RAM used Program 745.5 MiB + 2.4 GiB = 3.1 GiB java --------------------------------- 3.1 GiB Is the heap counted as shared memory here? The shared memory value seems way too large. My own tool reports < 1 GB as RSS which seems way too low... On Thu, 3 Oct 2019 at 14:32, Per Liden wrote: > > On 10/3/19 2:23 PM, Stefan Reich wrote: > > Hi Per! > > > > Yes, I saw, sorry for not responding the other time. > > > > This problem is, /proc/*/smaps_rollup doesn't exist on one of my > > machines (the one with the oldest kernel). On the newer machines, yeah, > > it may be an option to use PSS from smaps_rollup. > > > > Not sure if there are any tools which would help here. > > I know some of them (e.g. ps_mem.py), works on older kernels that > doesn't have /proc//smaps_rollup. > > cheers, > Per > > > > > Greetings, > > Stefan > > > > On Thu, 3 Oct 2019 at 14:16, Per Liden > > wrote: > > > > > > Did you see my reply to your previous question on this topic? Tools > to > > extract this data (PSS) exist. Are they not doing what you want? > > > > cheers, > > Per > > > > > > > > -- > > Stefan Reich > > BotCompany.de // Java-based operating systems > -- Stefan Reich BotCompany.de // Java-based operating systems From stefan.reich.maker.of.eye at googlemail.com Thu Oct 3 12:52:53 2019 From: stefan.reich.maker.of.eye at googlemail.com (Stefan Reich) Date: Thu, 3 Oct 2019 14:52:53 +0200 Subject: Possible working method to get actual process size on Linux In-Reply-To: References: <33e994ee-20db-cc5a-7c97-e9242622463a@oracle.com> <6ce13779-8717-05c3-bfd4-796e7c1baa90@oracle.com> Message-ID: Breakdown of the process's pages by RSS: Address range 000000xxxxxxxxxx: 1 MB Address range 000004xxxxxxxxxx: 2461 MB Address range 000008xxxxxxxxxx: 2476 MB Address range 000010xxxxxxxxxx: 2662 MB Address range 00007cxxxxxxxxxx: 648 MB This is really getting confusing... On Thu, 3 Oct 2019 at 14:47, Stefan Reich < stefan.reich.maker.of.eye at googlemail.com> wrote: > The situation is still confusing. My process has: > > Runtime.totalMemory() = 2.7 GB > Runtime.usedMemory() =~ 1 GB > > ps_mem.py says: > > root at smartbot:~/bin# ps_mem.py -p 4837 > Private + Shared = RAM used Program > > 745.5 MiB + 2.4 GiB = 3.1 GiB java > --------------------------------- > 3.1 GiB > > Is the heap counted as shared memory here? The shared memory value seems > way too large. > > My own tool reports < 1 GB as RSS which seems way too low... > > > On Thu, 3 Oct 2019 at 14:32, Per Liden wrote: > >> >> On 10/3/19 2:23 PM, Stefan Reich wrote: >> > Hi Per! >> > >> > Yes, I saw, sorry for not responding the other time. >> > >> > This problem is, /proc/*/smaps_rollup doesn't exist on one of my >> > machines (the one with the oldest kernel). On the newer machines, yeah, >> > it may be an option to use PSS from smaps_rollup. 
>> > >> > Not sure if there are any tools which would help here. >> >> I know some of them (e.g. ps_mem.py), works on older kernels that >> doesn't have /proc//smaps_rollup. >> >> cheers, >> Per >> >> > >> > Greetings, >> > Stefan >> > >> > On Thu, 3 Oct 2019 at 14:16, Per Liden > > > wrote: >> > >> > >> > Did you see my reply to your previous question on this topic? Tools >> to >> > extract this data (PSS) exist. Are they not doing what you want? >> > >> > cheers, >> > Per >> > >> > >> > >> > -- >> > Stefan Reich >> > BotCompany.de // Java-based operating systems >> > > > -- > Stefan Reich > BotCompany.de // Java-based operating systems > -- Stefan Reich BotCompany.de // Java-based operating systems From per.liden at oracle.com Thu Oct 3 12:59:20 2019 From: per.liden at oracle.com (Per Liden) Date: Thu, 3 Oct 2019 14:59:20 +0200 Subject: Possible working method to get actual process size on Linux In-Reply-To: References: <33e994ee-20db-cc5a-7c97-e9242622463a@oracle.com> <6ce13779-8717-05c3-bfd4-796e7c1baa90@oracle.com> Message-ID: On 10/3/19 2:47 PM, Stefan Reich wrote: > The situation is still confusing. My process has: > > Runtime.totalMemory() = 2.7 GB This is the current Java heap capacity (some of it may be free/available for new allocations). > Runtime.usedMemory() =~ 1 GB There is no Runtime.usedMemory(), so I don't know where this number comes from. > > ps_mem.py says: > > root at smartbot:~/bin# ps_mem.py -p 4837 > ?Private ?+ ? Shared ?= ?RAM used Program > > 745.5 MiB + ? 2.4 GiB = ? 3.1 GiB java > --------------------------------- > ? ? ? ? ? ? ? ? ? ? ? ? ? 3.1 GiB This is the total process size, i.e. Java heap, jitted code, VM data structures, etc. > > Is the heap counted as shared memory here? The shared memory value seems > way too large. Yes, the ZGC heap is mapped as shared memory. /Per > > My own tool reports < 1 GB as RSS which seems way too low... > > > On Thu, 3 Oct 2019 at 14:32, Per Liden > wrote: > > > On 10/3/19 2:23 PM, Stefan Reich wrote: > > Hi Per! > > > > Yes, I saw, sorry for not responding the other time. > > > > This problem is,?/proc/*/smaps_rollup doesn't exist on one of my > > machines (the one with the oldest kernel). On the newer machines, > yeah, > > it may be an option to use PSS from smaps_rollup. > > > > Not sure if there are any tools which would help here. > > I know some of them (e.g. ps_mem.py), works on older kernels that > doesn't have /proc//smaps_rollup. > > cheers, > Per > > > > > Greetings, > > Stefan > > > > On Thu, 3 Oct 2019 at 14:16, Per Liden > > >> wrote: > > > > > >? ? ?Did you see my reply to your previous question on this topic? > Tools to > >? ? ?extract this data (PSS) exist. Are they not doing what you want? > > > >? ? ?cheers, > >? ? ?Per > > > > > > > > -- > > Stefan Reich > > BotCompany.de // Java-based operating systems > > > > -- > Stefan Reich > BotCompany.de // Java-based operating systems From stefan.karlsson at oracle.com Thu Oct 3 13:01:01 2019 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Thu, 3 Oct 2019 15:01:01 +0200 Subject: Possible working method to get actual process size on Linux In-Reply-To: References: Message-ID: <9e5cb5b3-40b6-844d-d47e-80a5243a0299@oracle.com> Hi Stefan, On 2019-10-03 12:19, Stefan Reich wrote: > I'm still trying to get Linux to report a correct process size when using > ZGC (the memory multi-mapping issue). > > My idea is to parse /proc/pid/smaps. I'd like to point out that reading this file will most likely block the entire process from making progress. 
We've seen instances where reading this file caused the application and JVM to stand still for multiple seconds. If your applications are latency sensitive you'd probably want to avoid doing this. Cheers, StefanK Sadly, I can't see physical addresses > there, only virtual ones. So I group by virtual address range. Here's what > I got for an example process with is reported by top as 2.7 GB: > > Address range 000000xxxxxxxxxx: 6 MB > Address range 000004xxxxxxxxxx: 38 MB > Address range 000007xxxxxxxxxx: 646 MB > Address range 000008xxxxxxxxxx: 38 MB > Address range 00000bxxxxxxxxxx: 631 MB > Address range 000010xxxxxxxxxx: 38 MB > Address range 000013xxxxxxxxxx: 690 MB > Address range 00007fxxxxxxxxxx: 726 MB > > It appears I have to discount the 07, 0b and 13 ranges to get to a > reasonable actual process size of 844 MB. > > Question: Are these address ranges fixed or does ZGC choose different ones > depending on heap size? Where exactly do the address ranges begin and end? > > As soon as this reporting method works, let's publish it as a standard tool > for anyone using ZGC on Linux. I can't be the only one who's driven nuts by > not knowing how big my processes are. > > Greetings, > Stefan > From per.liden at oracle.com Thu Oct 3 13:06:27 2019 From: per.liden at oracle.com (Per Liden) Date: Thu, 3 Oct 2019 15:06:27 +0200 Subject: Possible working method to get actual process size on Linux In-Reply-To: References: <33e994ee-20db-cc5a-7c97-e9242622463a@oracle.com> <6ce13779-8717-05c3-bfd4-796e7c1baa90@oracle.com> Message-ID: On 10/3/19 2:52 PM, Stefan Reich wrote: > Breakdown of the process's pages by RSS: > > Address range 000000xxxxxxxxxx: 1 MB > Address range 000004xxxxxxxxxx: 2461 MB > Address range 000008xxxxxxxxxx: 2476 MB > Address range 000010xxxxxxxxxx: 2662 MB > Address range 00007cxxxxxxxxxx: 648 MB > > This is really getting confusing... You have three Java heap mappings (~2.5G) + non-Java heap stuff (648M), which gives you ~3.1G, which is similar to what ps_mem reports. /Per > > On Thu, 3 Oct 2019 at 14:47, Stefan Reich > > wrote: > > The situation is still confusing. My process has: > > Runtime.totalMemory() = 2.7 GB > Runtime.usedMemory() =~ 1 GB > > ps_mem.py says: > > root at smartbot:~/bin# ps_mem.py -p 4837 > ?Private ?+ ? Shared ?= ?RAM used Program > > 745.5 MiB + ? 2.4 GiB = ? 3.1 GiB java > --------------------------------- > ? ? ? ? ? ? ? ? ? ? ? ? ? 3.1 GiB > > Is the heap counted as shared memory here? The shared memory value > seems way too large. > > My own tool reports < 1 GB as RSS which seems way too low... > > > On Thu, 3 Oct 2019 at 14:32, Per Liden > wrote: > > > On 10/3/19 2:23 PM, Stefan Reich wrote: > > Hi Per! > > > > Yes, I saw, sorry for not responding the other time. > > > > This problem is,?/proc/*/smaps_rollup doesn't exist on one of my > > machines (the one with the oldest kernel). On the newer > machines, yeah, > > it may be an option to use PSS from smaps_rollup. > > > > Not sure if there are any tools which would help here. > > I know some of them (e.g. ps_mem.py), works on older kernels that > doesn't have /proc//smaps_rollup. > > cheers, > Per > > > > > Greetings, > > Stefan > > > > On Thu, 3 Oct 2019 at 14:16, Per Liden > > >> > wrote: > > > > > >? ? ?Did you see my reply to your previous question on this > topic? Tools to > >? ? ?extract this data (PSS) exist. Are they not doing what > you want? > > > >? ? ?cheers, > >? ? 
?Per > > > > > > > > -- > > Stefan Reich > > BotCompany.de // Java-based operating systems > > > > -- > Stefan Reich > BotCompany.de // Java-based operating systems > > > > -- > Stefan Reich > BotCompany.de // Java-based operating systems From stefan.reich.maker.of.eye at googlemail.com Thu Oct 3 13:07:57 2019 From: stefan.reich.maker.of.eye at googlemail.com (Stefan Reich) Date: Thu, 3 Oct 2019 15:07:57 +0200 Subject: Possible working method to get actual process size on Linux In-Reply-To: References: <33e994ee-20db-cc5a-7c97-e9242622463a@oracle.com> <6ce13779-8717-05c3-bfd4-796e7c1baa90@oracle.com> Message-ID: My bad, I have a function called usedMemory() which is totalMemory()-freeMemory(). On Thu, 3 Oct 2019 at 14:59, Per Liden wrote: > On 10/3/19 2:47 PM, Stefan Reich wrote: > > The situation is still confusing. My process has: > > > > Runtime.totalMemory() = 2.7 GB > > This is the current Java heap capacity (some of it may be free/available > for new allocations). > > > Runtime.usedMemory() =~ 1 GB > > There is no Runtime.usedMemory(), so I don't know where this number > comes from. > > > > > ps_mem.py says: > > > > root at smartbot:~/bin# ps_mem.py -p 4837 > > Private + Shared = RAM used Program > > > > 745.5 MiB + 2.4 GiB = 3.1 GiB java > > --------------------------------- > > 3.1 GiB > > This is the total process size, i.e. Java heap, jitted code, VM data > structures, etc. > > > > > Is the heap counted as shared memory here? The shared memory value seems > > way too large. > > Yes, the ZGC heap is mapped as shared memory. > > /Per > > > > > My own tool reports < 1 GB as RSS which seems way too low... > > > > > > On Thu, 3 Oct 2019 at 14:32, Per Liden > > wrote: > > > > > > On 10/3/19 2:23 PM, Stefan Reich wrote: > > > Hi Per! > > > > > > Yes, I saw, sorry for not responding the other time. > > > > > > This problem is, /proc/*/smaps_rollup doesn't exist on one of my > > > machines (the one with the oldest kernel). On the newer machines, > > yeah, > > > it may be an option to use PSS from smaps_rollup. > > > > > > Not sure if there are any tools which would help here. > > > > I know some of them (e.g. ps_mem.py), works on older kernels that > > doesn't have /proc//smaps_rollup. > > > > cheers, > > Per > > > > > > > > Greetings, > > > Stefan > > > > > > On Thu, 3 Oct 2019 at 14:16, Per Liden > > > > >> > wrote: > > > > > > > > > Did you see my reply to your previous question on this topic? > > Tools to > > > extract this data (PSS) exist. Are they not doing what you > want? > > > > > > cheers, > > > Per > > > > > > > > > > > > -- > > > Stefan Reich > > > BotCompany.de // Java-based operating systems > > > > > > > > -- > > Stefan Reich > > BotCompany.de // Java-based operating systems > -- Stefan Reich BotCompany.de // Java-based operating systems From stefan.reich.maker.of.eye at googlemail.com Thu Oct 3 13:15:23 2019 From: stefan.reich.maker.of.eye at googlemail.com (Stefan Reich) Date: Thu, 3 Oct 2019 15:15:23 +0200 Subject: Possible working method to get actual process size on Linux In-Reply-To: References: <33e994ee-20db-cc5a-7c97-e9242622463a@oracle.com> <6ce13779-8717-05c3-bfd4-796e7c1baa90@oracle.com> Message-ID: So ZGC maps the heap 3 times, not 4? I realize this is not a good assumption to rely on, but I still like to know... 
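(And if it is three, the numbers do roughly add up: the three big ranges - ~2461, ~2476
and ~2662 MB - would be the same physical heap pages counted once per mapping, so the real
footprint would be about one of them, ~2.5 GB, plus the ~648 MB of non-heap mappings,
i.e. the ~3.1 GB that ps_mem.py reported.)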
On Thu, 3 Oct 2019 at 15:06, Per Liden wrote: > > On 10/3/19 2:52 PM, Stefan Reich wrote: > > Breakdown of the process's pages by RSS: > > > > Address range 000000xxxxxxxxxx: 1 MB > > Address range 000004xxxxxxxxxx: 2461 MB > > Address range 000008xxxxxxxxxx: 2476 MB > > Address range 000010xxxxxxxxxx: 2662 MB > > Address range 00007cxxxxxxxxxx: 648 MB > > > > This is really getting confusing... > > You have three Java heap mappings (~2.5G) + non-Java heap stuff (648M), > which gives you ~3.1G, which is similar to what ps_mem reports. > > /Per > > > > > On Thu, 3 Oct 2019 at 14:47, Stefan Reich > > > > wrote: > > > > The situation is still confusing. My process has: > > > > Runtime.totalMemory() = 2.7 GB > > Runtime.usedMemory() =~ 1 GB > > > > ps_mem.py says: > > > > root at smartbot:~/bin# ps_mem.py -p 4837 > > Private + Shared = RAM used Program > > > > 745.5 MiB + 2.4 GiB = 3.1 GiB java > > --------------------------------- > > 3.1 GiB > > > > Is the heap counted as shared memory here? The shared memory value > > seems way too large. > > > > My own tool reports < 1 GB as RSS which seems way too low... > > > > > > On Thu, 3 Oct 2019 at 14:32, Per Liden > > wrote: > > > > > > On 10/3/19 2:23 PM, Stefan Reich wrote: > > > Hi Per! > > > > > > Yes, I saw, sorry for not responding the other time. > > > > > > This problem is, /proc/*/smaps_rollup doesn't exist on one of > my > > > machines (the one with the oldest kernel). On the newer > > machines, yeah, > > > it may be an option to use PSS from smaps_rollup. > > > > > > Not sure if there are any tools which would help here. > > > > I know some of them (e.g. ps_mem.py), works on older kernels that > > doesn't have /proc//smaps_rollup. > > > > cheers, > > Per > > > > > > > > Greetings, > > > Stefan > > > > > > On Thu, 3 Oct 2019 at 14:16, Per Liden > > > > >> > > wrote: > > > > > > > > > Did you see my reply to your previous question on this > > topic? Tools to > > > extract this data (PSS) exist. Are they not doing what > > you want? > > > > > > cheers, > > > Per > > > > > > > > > > > > -- > > > Stefan Reich > > > BotCompany.de // Java-based operating systems > > > > > > > > -- > > Stefan Reich > > BotCompany.de // Java-based operating systems > > > > > > > > -- > > Stefan Reich > > BotCompany.de // Java-based operating systems > -- Stefan Reich BotCompany.de // Java-based operating systems From stefan.reich.maker.of.eye at googlemail.com Thu Oct 3 13:24:46 2019 From: stefan.reich.maker.of.eye at googlemail.com (Stefan Reich) Date: Thu, 3 Oct 2019 15:24:46 +0200 Subject: Possible working method to get actual process size on Linux In-Reply-To: <9e5cb5b3-40b6-844d-d47e-80a5243a0299@oracle.com> References: <9e5cb5b3-40b6-844d-d47e-80a5243a0299@oracle.com> Message-ID: Hi Stefan, thanks for the hint. It hopefully won't block longer than it takes to read the file, no? It does appear to take a bit to do that, I'm seeing ~ .25 s for cat /proc/.../smaps > dev/null Greetings, Stefan On Thu, 3 Oct 2019 at 15:01, Stefan Karlsson wrote: > Hi Stefan, > > On 2019-10-03 12:19, Stefan Reich wrote: > > I'm still trying to get Linux to report a correct process size when using > > ZGC (the memory multi-mapping issue). > > > > My idea is to parse /proc/pid/smaps. > > I'd like to point out that reading this file will most likely block the > entire process from making progress. We've seen instances where reading > this file caused the application and JVM to stand still for multiple > seconds. 
If your applications are latency sensitive you'd probably want > to avoid doing this. > > Cheers, > StefanK > > > Sadly, I can't see physical addresses > > there, only virtual ones. So I group by virtual address range. Here's > what > > I got for an example process with is reported by top as 2.7 GB: > > > > Address range 000000xxxxxxxxxx: 6 MB > > Address range 000004xxxxxxxxxx: 38 MB > > Address range 000007xxxxxxxxxx: 646 MB > > Address range 000008xxxxxxxxxx: 38 MB > > Address range 00000bxxxxxxxxxx: 631 MB > > Address range 000010xxxxxxxxxx: 38 MB > > Address range 000013xxxxxxxxxx: 690 MB > > Address range 00007fxxxxxxxxxx: 726 MB > > > > It appears I have to discount the 07, 0b and 13 ranges to get to a > > reasonable actual process size of 844 MB. > > > > Question: Are these address ranges fixed or does ZGC choose different > ones > > depending on heap size? Where exactly do the address ranges begin and > end? > > > > As soon as this reporting method works, let's publish it as a standard > tool > > for anyone using ZGC on Linux. I can't be the only one who's driven nuts > by > > not knowing how big my processes are. > > > > Greetings, > > Stefan > > > -- Stefan Reich BotCompany.de // Java-based operating systems From per.liden at oracle.com Thu Oct 3 13:23:43 2019 From: per.liden at oracle.com (Per Liden) Date: Thu, 3 Oct 2019 15:23:43 +0200 Subject: Possible working method to get actual process size on Linux In-Reply-To: References: <33e994ee-20db-cc5a-7c97-e9242622463a@oracle.com> <6ce13779-8717-05c3-bfd4-796e7c1baa90@oracle.com> Message-ID: Yes, 3 times. /Per On 10/3/19 3:15 PM, Stefan Reich wrote: > So ZGC maps the heap 3 times, not 4? > > I realize this is not a good assumption to rely on, but I still like to > know... > > On Thu, 3 Oct 2019 at 15:06, Per Liden > wrote: > > > On 10/3/19 2:52 PM, Stefan Reich wrote: > > Breakdown of the process's pages by RSS: > > > > Address range 000000xxxxxxxxxx: 1 MB > > Address range 000004xxxxxxxxxx: 2461 MB > > Address range 000008xxxxxxxxxx: 2476 MB > > Address range 000010xxxxxxxxxx: 2662 MB > > Address range 00007cxxxxxxxxxx: 648 MB > > > > This is really getting confusing... > > You have three Java heap mappings (~2.5G) + non-Java heap stuff (648M), > which gives you ~3.1G, which is similar to what ps_mem reports. > > /Per > > > > > On Thu, 3 Oct 2019 at 14:47, Stefan Reich > > > > >> wrote: > > > >? ? ?The situation is still confusing. My process has: > > > >? ? ?Runtime.totalMemory() = 2.7 GB > >? ? ?Runtime.usedMemory() =~ 1 GB > > > >? ? ?ps_mem.py says: > > > >? ? ?root at smartbot:~/bin# ps_mem.py -p 4837 > >? ? ? ?Private ?+ ? Shared ?= ?RAM used Program > > > >? ? ?745.5 MiB + ? 2.4 GiB = ? 3.1 GiB java > >? ? ?--------------------------------- > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 3.1 GiB > > > >? ? ?Is the heap counted as shared memory here? The shared memory > value > >? ? ?seems way too large. > > > >? ? ?My own tool reports < 1 GB as RSS which seems way too low... > > > > > >? ? ?On Thu, 3 Oct 2019 at 14:32, Per Liden > >? ? ?>> > wrote: > > > > > >? ? ? ? ?On 10/3/19 2:23 PM, Stefan Reich wrote: > >? ? ? ? ? > Hi Per! > >? ? ? ? ? > > >? ? ? ? ? > Yes, I saw, sorry for not responding the other time. > >? ? ? ? ? > > >? ? ? ? ? > This problem is,?/proc/*/smaps_rollup doesn't exist on > one of my > >? ? ? ? ? > machines (the one with the oldest kernel). On the newer > >? ? ? ? ?machines, yeah, > >? ? ? ? ? > it may be an option to use PSS from smaps_rollup. > >? ? ? ? ? > > >? ? ? ? ? 
> Not sure if there are any tools which would help here. > > > >? ? ? ? ?I know some of them (e.g. ps_mem.py), works on older > kernels that > >? ? ? ? ?doesn't have /proc//smaps_rollup. > > > >? ? ? ? ?cheers, > >? ? ? ? ?Per > > > >? ? ? ? ? > > >? ? ? ? ? > Greetings, > >? ? ? ? ? > Stefan > >? ? ? ? ? > > >? ? ? ? ? > On Thu, 3 Oct 2019 at 14:16, Per Liden > > >? ? ? ? ?> > >? ? ? ? ? > >>> > >? ? ? ? ?wrote: > >? ? ? ? ? > > >? ? ? ? ? > > >? ? ? ? ? >? ? ?Did you see my reply to your previous question on this > >? ? ? ? ?topic? Tools to > >? ? ? ? ? >? ? ?extract this data (PSS) exist. Are they not doing what > >? ? ? ? ?you want? > >? ? ? ? ? > > >? ? ? ? ? >? ? ?cheers, > >? ? ? ? ? >? ? ?Per > >? ? ? ? ? > > >? ? ? ? ? > > >? ? ? ? ? > > >? ? ? ? ? > -- > >? ? ? ? ? > Stefan Reich > >? ? ? ? ? > BotCompany.de // Java-based operating systems > > > > > > > >? ? ?-- > >? ? ?Stefan Reich > >? ? ?BotCompany.de // Java-based operating systems > > > > > > > > -- > > Stefan Reich > > BotCompany.de // Java-based operating systems > > > > -- > Stefan Reich > BotCompany.de // Java-based operating systems From stefan.reich.maker.of.eye at googlemail.com Thu Oct 3 13:30:27 2019 From: stefan.reich.maker.of.eye at googlemail.com (Stefan Reich) Date: Thu, 3 Oct 2019 15:30:27 +0200 Subject: Possible working method to get actual process size on Linux In-Reply-To: References: <9e5cb5b3-40b6-844d-d47e-80a5243a0299@oracle.com> Message-ID: Not sure if there is a way to avoid /smaps at all, all the tools I've seen seem to rely one it. On Thu, 3 Oct 2019 at 15:24, Stefan Reich < stefan.reich.maker.of.eye at googlemail.com> wrote: > Hi Stefan, > > thanks for the hint. It hopefully won't block longer than it takes to read > the file, no? > > It does appear to take a bit to do that, I'm seeing ~ .25 s for cat > /proc/.../smaps > dev/null > > Greetings, > Stefan > > On Thu, 3 Oct 2019 at 15:01, Stefan Karlsson > wrote: > >> Hi Stefan, >> >> On 2019-10-03 12:19, Stefan Reich wrote: >> > I'm still trying to get Linux to report a correct process size when >> using >> > ZGC (the memory multi-mapping issue). >> > >> > My idea is to parse /proc/pid/smaps. >> >> I'd like to point out that reading this file will most likely block the >> entire process from making progress. We've seen instances where reading >> this file caused the application and JVM to stand still for multiple >> seconds. If your applications are latency sensitive you'd probably want >> to avoid doing this. >> >> Cheers, >> StefanK >> >> >> Sadly, I can't see physical addresses >> > there, only virtual ones. So I group by virtual address range. Here's >> what >> > I got for an example process with is reported by top as 2.7 GB: >> > >> > Address range 000000xxxxxxxxxx: 6 MB >> > Address range 000004xxxxxxxxxx: 38 MB >> > Address range 000007xxxxxxxxxx: 646 MB >> > Address range 000008xxxxxxxxxx: 38 MB >> > Address range 00000bxxxxxxxxxx: 631 MB >> > Address range 000010xxxxxxxxxx: 38 MB >> > Address range 000013xxxxxxxxxx: 690 MB >> > Address range 00007fxxxxxxxxxx: 726 MB >> > >> > It appears I have to discount the 07, 0b and 13 ranges to get to a >> > reasonable actual process size of 844 MB. >> > >> > Question: Are these address ranges fixed or does ZGC choose different >> ones >> > depending on heap size? Where exactly do the address ranges begin and >> end? >> > >> > As soon as this reporting method works, let's publish it as a standard >> tool >> > for anyone using ZGC on Linux. 
I can't be the only one who's driven >> nuts by >> > not knowing how big my processes are. >> > >> > Greetings, >> > Stefan >> > >> > > > -- > Stefan Reich > BotCompany.de // Java-based operating systems > -- Stefan Reich BotCompany.de // Java-based operating systems From fw at deneb.enyo.de Thu Oct 3 14:15:26 2019 From: fw at deneb.enyo.de (Florian Weimer) Date: Thu, 03 Oct 2019 16:15:26 +0200 Subject: Possible working method to get actual process size on Linux In-Reply-To: (Stefan Reich's message of "Thu, 3 Oct 2019 13:50:16 +0200") References: <87o8yy6phr.fsf@mid.deneb.enyo.de> Message-ID: <87eezt7w5d.fsf@mid.deneb.enyo.de> * Stefan Reich: > OK, here is the code: > https://github.com/stefan-reich/LinuxProcessSizeDetector > > Can this be linked somewhere? I believe it to be useful. Seems to work on > the machines I tested it on, even though there is mild guesswork involved. With a recent-enough kernel, you will see this: $ grep ^[^A-Z] /proc/21679/smaps | grep memfd | sort -k3 40000000000-40001000000 rw-s 00000000 00:05 253242 /memfd:java_heap (deleted) 80000000000-80001000000 rw-s 00000000 00:05 253242 /memfd:java_heap (deleted) 100000000000-100001000000 rw-s 00000000 00:05 253242 /memfd:java_heap (deleted) 7fffe000000-80000000000 rw-s 01000000 00:05 253242 /memfd:java_heap (deleted) bfffe000000-c0000000000 rw-s 01000000 00:05 253242 /memfd:java_heap (deleted) 13fffe000000-140000000000 rw-s 01000000 00:05 253242 /memfd:java_heap (deleted) 40001000000-40002000000 rw-s 03000000 00:05 253242 /memfd:java_heap (deleted) 80001000000-80002000000 rw-s 03000000 00:05 253242 /memfd:java_heap (deleted) 100001000000-100002000000 rw-s 03000000 00:05 253242 /memfd:java_heap (deleted) 7fffc000000-7fffe000000 rw-s 04000000 00:05 253242 /memfd:java_heap (deleted) bfffc000000-bfffe000000 rw-s 04000000 00:05 253242 /memfd:java_heap (deleted) 13fffc000000-13fffe000000 rw-s 04000000 00:05 253242 /memfd:java_heap (deleted) 40002000000-4000ec00000 rw-s 06000000 00:05 253242 /memfd:java_heap (deleted) 80002000000-8000ec00000 rw-s 06000000 00:05 253242 /memfd:java_heap (deleted) 100002000000-10000ec00000 rw-s 06000000 00:05 253242 /memfd:java_heap (deleted) That is, you can recover the information which mapping aliases which other mapping by looking at device/inode combination (column 4 and 5) and the mapping offset (column 3). I believe there is a tool called smem which does exactly that. From stefan.reich.maker.of.eye at googlemail.com Thu Oct 3 14:20:41 2019 From: stefan.reich.maker.of.eye at googlemail.com (Stefan Reich) Date: Thu, 3 Oct 2019 16:20:41 +0200 Subject: Possible working method to get actual process size on Linux In-Reply-To: <87eezt7w5d.fsf@mid.deneb.enyo.de> References: <87o8yy6phr.fsf@mid.deneb.enyo.de> <87eezt7w5d.fsf@mid.deneb.enyo.de> Message-ID: Yes, that sounds like a cleaner way. I do see these fields even on my oldest kernel. Let me guess, smem probably reads from /smaps too... On Thu, 3 Oct 2019 at 16:15, Florian Weimer wrote: > * Stefan Reich: > > > OK, here is the code: > > https://github.com/stefan-reich/LinuxProcessSizeDetector > > > > Can this be linked somewhere? I believe it to be useful. Seems to work on > > the machines I tested it on, even though there is mild guesswork > involved. 
> > With a recent-enough kernel, you will see this: > > $ grep ^[^A-Z] /proc/21679/smaps | grep memfd | sort -k3 > 40000000000-40001000000 rw-s 00000000 00:05 253242 > /memfd:java_heap (deleted) > 80000000000-80001000000 rw-s 00000000 00:05 253242 > /memfd:java_heap (deleted) > 100000000000-100001000000 rw-s 00000000 00:05 253242 > /memfd:java_heap (deleted) > 7fffe000000-80000000000 rw-s 01000000 00:05 253242 > /memfd:java_heap (deleted) > bfffe000000-c0000000000 rw-s 01000000 00:05 253242 > /memfd:java_heap (deleted) > 13fffe000000-140000000000 rw-s 01000000 00:05 253242 > /memfd:java_heap (deleted) > 40001000000-40002000000 rw-s 03000000 00:05 253242 > /memfd:java_heap (deleted) > 80001000000-80002000000 rw-s 03000000 00:05 253242 > /memfd:java_heap (deleted) > 100001000000-100002000000 rw-s 03000000 00:05 253242 > /memfd:java_heap (deleted) > 7fffc000000-7fffe000000 rw-s 04000000 00:05 253242 > /memfd:java_heap (deleted) > bfffc000000-bfffe000000 rw-s 04000000 00:05 253242 > /memfd:java_heap (deleted) > 13fffc000000-13fffe000000 rw-s 04000000 00:05 253242 > /memfd:java_heap (deleted) > 40002000000-4000ec00000 rw-s 06000000 00:05 253242 > /memfd:java_heap (deleted) > 80002000000-8000ec00000 rw-s 06000000 00:05 253242 > /memfd:java_heap (deleted) > 100002000000-10000ec00000 rw-s 06000000 00:05 253242 > /memfd:java_heap (deleted) > > That is, you can recover the information which mapping aliases which > other mapping by looking at device/inode combination (column 4 and 5) > and the mapping offset (column 3). > > I believe there is a tool called smem which does exactly that. > -- Stefan Reich BotCompany.de // Java-based operating systems From stefan.reich.maker.of.eye at googlemail.com Thu Oct 3 17:01:58 2019 From: stefan.reich.maker.of.eye at googlemail.com (Stefan Reich) Date: Thu, 3 Oct 2019 19:01:58 +0200 Subject: Possible working method to get actual process size on Linux In-Reply-To: References: <87o8yy6phr.fsf@mid.deneb.enyo.de> <87eezt7w5d.fsf@mid.deneb.enyo.de> Message-ID: So basically, I need to port ps_mem.py to Java. I really can't ship Python code. On Thu, Oct 3, 2019, 16:20 Stefan Reich < stefan.reich.maker.of.eye at googlemail.com> wrote: > Yes, that sounds like a cleaner way. I do see these fields even on my > oldest kernel. > > Let me guess, smem probably reads from /smaps too... > > On Thu, 3 Oct 2019 at 16:15, Florian Weimer wrote: > >> * Stefan Reich: >> >> > OK, here is the code: >> > https://github.com/stefan-reich/LinuxProcessSizeDetector >> > >> > Can this be linked somewhere? I believe it to be useful. Seems to work >> on >> > the machines I tested it on, even though there is mild guesswork >> involved. 
>> >> With a recent-enough kernel, you will see this: >> >> $ grep ^[^A-Z] /proc/21679/smaps | grep memfd | sort -k3 >> 40000000000-40001000000 rw-s 00000000 00:05 253242 >> /memfd:java_heap (deleted) >> 80000000000-80001000000 rw-s 00000000 00:05 253242 >> /memfd:java_heap (deleted) >> 100000000000-100001000000 rw-s 00000000 00:05 253242 >> /memfd:java_heap (deleted) >> 7fffe000000-80000000000 rw-s 01000000 00:05 253242 >> /memfd:java_heap (deleted) >> bfffe000000-c0000000000 rw-s 01000000 00:05 253242 >> /memfd:java_heap (deleted) >> 13fffe000000-140000000000 rw-s 01000000 00:05 253242 >> /memfd:java_heap (deleted) >> 40001000000-40002000000 rw-s 03000000 00:05 253242 >> /memfd:java_heap (deleted) >> 80001000000-80002000000 rw-s 03000000 00:05 253242 >> /memfd:java_heap (deleted) >> 100001000000-100002000000 rw-s 03000000 00:05 253242 >> /memfd:java_heap (deleted) >> 7fffc000000-7fffe000000 rw-s 04000000 00:05 253242 >> /memfd:java_heap (deleted) >> bfffc000000-bfffe000000 rw-s 04000000 00:05 253242 >> /memfd:java_heap (deleted) >> 13fffc000000-13fffe000000 rw-s 04000000 00:05 253242 >> /memfd:java_heap (deleted) >> 40002000000-4000ec00000 rw-s 06000000 00:05 253242 >> /memfd:java_heap (deleted) >> 80002000000-8000ec00000 rw-s 06000000 00:05 253242 >> /memfd:java_heap (deleted) >> 100002000000-10000ec00000 rw-s 06000000 00:05 253242 >> /memfd:java_heap (deleted) >> >> That is, you can recover the information which mapping aliases which >> other mapping by looking at device/inode combination (column 4 and 5) >> and the mapping offset (column 3). >> >> I believe there is a tool called smem which does exactly that. >> > > > -- > Stefan Reich > BotCompany.de // Java-based operating systems > From stefan.reich.maker.of.eye at googlemail.com Thu Oct 3 17:13:58 2019 From: stefan.reich.maker.of.eye at googlemail.com (Stefan Reich) Date: Thu, 3 Oct 2019 19:13:58 +0200 Subject: Do you say Zee GC or Zed GC? Message-ID: Just curious :D From per.liden at oracle.com Fri Oct 4 06:33:31 2019 From: per.liden at oracle.com (Per Liden) Date: Fri, 4 Oct 2019 08:33:31 +0200 Subject: Do you say Zee GC or Zed GC? In-Reply-To: References: Message-ID: I'm guessing it's 50/50, but I personally say "Zed" most of the time. /Per On 10/3/19 7:13 PM, Stefan Reich wrote: > Just curious :D > From sergeicelov at gmail.com Sat Oct 12 02:42:40 2019 From: sergeicelov at gmail.com (Sergey Tselovalnikov) Date: Sat, 12 Oct 2019 13:42:40 +1100 Subject: Back-porting of JDK-8230565 to JDK 13 Message-ID: Hi, Recently at Canva, we've upgraded a set of servers to JDK 13 and switched to ZGC. Enabling ZGC has been a great success, it significantly reduced and required absolutely no tuning. However, we started seeing SIGSEGVs after running the app for a few hours, which were very similar to the ones described in JDK-8230565, so we had to switch back to CMS. The issue seems to be fixed in JDK-8230565, as far as I can see, it's only going to arrive in JDK 14. Do you know if there are any plans on backporting JDK-8230565 to JDK 13? It seems to have resolved multiple similar issues. -- Cheers, Sergey Tselovalnikov From per.liden at oracle.com Tue Oct 15 06:28:31 2019 From: per.liden at oracle.com (Per Liden) Date: Tue, 15 Oct 2019 08:28:31 +0200 Subject: Back-porting of JDK-8230565 to JDK 13 In-Reply-To: References: Message-ID: Hi, On 10/12/19 4:42 AM, Sergey Tselovalnikov wrote: > Hi, > > Recently at Canva, we've upgraded a set of servers to JDK 13 and switched > to ZGC. 
Enabling ZGC has been a great success, it significantly reduced and > required absolutely no tuning. > > However, we started seeing SIGSEGVs after running the app for a few hours, > which were very similar to the ones described in JDK-8230565, so we had to > switch back to CMS. The issue seems to be fixed in JDK-8230565, as far as I > can see, it's only going to arrive in JDK 14. > > Do you know if there are any plans on backporting JDK-8230565 to JDK 13? It > seems to have resolved multiple similar issues. > Yes, that's the plan. It's currently going though the approval process for being backported to 13.0.2. /Per From per.liden at oracle.com Tue Oct 15 17:00:34 2019 From: per.liden at oracle.com (Per Liden) Date: Tue, 15 Oct 2019 19:00:34 +0200 Subject: Back-porting of JDK-8230565 to JDK 13 In-Reply-To: References: Message-ID: On 10/15/19 8:28 AM, Per Liden wrote: > Hi, > > On 10/12/19 4:42 AM, Sergey Tselovalnikov wrote: >> Hi, >> >> Recently at Canva, we've upgraded a set of servers to JDK 13 and switched >> to ZGC. Enabling ZGC has been a great success, it significantly >> reduced and >> required absolutely no tuning. >> >> However, we started seeing SIGSEGVs after running the app for a few >> hours, >> which were very similar to the ones described in JDK-8230565, so we >> had to >> switch back to CMS. The issue seems to be fixed in JDK-8230565, as far >> as I >> can see, it's only going to arrive in JDK 14. >> >> Do you know if there are any plans on backporting JDK-8230565 to JDK >> 13? It >> seems to have resolved multiple similar issues. >> > > Yes, that's the plan. It's currently going though the approval process > for being backported to 13.0.2. Just a follow up. JDK-8230565 was approved for backporting and has now been pushed to the 13.0.2 branch. /Per From sergeicelov at gmail.com Wed Oct 16 00:03:38 2019 From: sergeicelov at gmail.com (Sergey Tselovalnikov) Date: Wed, 16 Oct 2019 11:03:38 +1100 Subject: Back-porting of JDK-8230565 to JDK 13 In-Reply-To: References: Message-ID: Thank you, Per! On Wed, Oct 16, 2019 at 4:02 AM Per Liden wrote: > On 10/15/19 8:28 AM, Per Liden wrote: > > Hi, > > > > On 10/12/19 4:42 AM, Sergey Tselovalnikov wrote: > >> Hi, > >> > >> Recently at Canva, we've upgraded a set of servers to JDK 13 and > switched > >> to ZGC. Enabling ZGC has been a great success, it significantly > >> reduced and > >> required absolutely no tuning. > >> > >> However, we started seeing SIGSEGVs after running the app for a few > >> hours, > >> which were very similar to the ones described in JDK-8230565, so we > >> had to > >> switch back to CMS. The issue seems to be fixed in JDK-8230565, as far > >> as I > >> can see, it's only going to arrive in JDK 14. > >> > >> Do you know if there are any plans on backporting JDK-8230565 to JDK > >> 13? It > >> seems to have resolved multiple similar issues. > >> > > > > Yes, that's the plan. It's currently going though the approval process > > for being backported to 13.0.2. > > Just a follow up. JDK-8230565 was approved for backporting and has now > been pushed to the 13.0.2 branch. > > /Per > -- Cheers, Sergey Tselovalnikov From pme at activeviam.com Wed Oct 16 13:49:54 2019 From: pme at activeviam.com (Pierre Mevel) Date: Wed, 16 Oct 2019 15:49:54 +0200 Subject: Workers count boosting after Mark Start Message-ID: Good morning, Apologies to bother you once again. 
As i explained in an earlier email ( https://mail.openjdk.java.net/pipermail/zgc-dev/2019-September/000736.html), I'm seeing a lot of Allocation Stalls using ZGC, the issue being that my application's allocation rate is quite high. I have GC cycles running back to back, whatever the value of ConcGcThreads. (The amount of application threads being superior to the amount of vCPUs, I even tried things like 40 ConcGcThreads on a 20 vCPUs machine, but at some point I don't want context switching to be too big of a factor). However I never see, in debug mode, the workers count boosting log lines. Looking at the code (correct me if I'm wrong), I'm seeing that only at the beginning of a cycle can the workers count be boosted to Max(ParallelGcThreads, ConcGcThreads), as it happens in the do_operation method of the VM_ZMarkStart class. For the allocation threads to be stalled at Mark Start, it should mean that Heap Memory is full at the start of the cycle. Since the cycles start preemptively with timeUntilGC, etc..., it only happens if the cycles run back to back, and heap memory is full at the start of the cycle, meaning that allocation rate was higher than the "freeing rate" during concurrent relocation. In my case, allocation stalls appear during concurrent mark. Would it be possible, in theory, for the `should_boost_worker_threads` method from zDriver to be called within `concurrent_mark_continue`, change the amount of workers and continue the ZMarkTask with more workers? Best regards and thanks in advance for your answer, Pierre M?vel pierre.mevel at activeviam.com ActiveViam 46 rue de l'arbre sec, 75001 Paris From m.sundar85 at gmail.com Thu Oct 17 05:05:43 2019 From: m.sundar85 at gmail.com (Sundara Mohan M) Date: Wed, 16 Oct 2019 22:05:43 -0700 Subject: Allocation stalls and Page Cache Flushed messages Message-ID: Hi, We started using ZGC and seeing this pattern, Whenever there is "Allocation Stall" messages before that i see messages like this ... 
[2019-10-15T20:02:42.403+0000][71060.202s][info][gc,heap ] Page Cache Flushed: 10M requested, 18M(18530M->18512M) flushed [2019-10-15T20:02:42.582+0000][71060.380s][info][gc,heap ] Page Cache Flushed: 12M requested, 18M(17432M->17414M) flushed [2019-10-15T20:02:42.717+0000][71060.515s][info][gc,heap ] Page Cache Flushed: 14M requested, 36M(16632M->16596M) flushed [2019-10-15T20:02:46.128+0000][71063.927s][info][gc,heap ] Page Cache Flushed: 4M requested, 18M(7742M->7724M) flushed [2019-10-15T20:02:49.716+0000][71067.514s][info][gc,heap ] Page Cache Flushed: 2M requested, 28M(2374M->2346M) flushed [2019-10-15T20:02:49.785+0000][71067.583s][info][gc,heap ] Page Cache Flushed: 2M requested, 76M(2346M->2270M) flushed [2019-10-15T20:02:49.966+0000][71067.765s][info][gc,heap ] Page Cache Flushed: 2M requested, 10M(2270M->2260M) flushed [2019-10-15T20:02:50.006+0000][71067.805s][info][gc,heap ] Page Cache Flushed: 2M requested, 10M(2260M->2250M) flushed [2019-10-15T20:02:50.018+0000][71067.816s][info][gc,heap ] Page Cache Flushed: 2M requested, 10M(2250M->2240M) flushed [2019-10-15T20:02:50.098+0000][71067.896s][info][gc,heap ] Page Cache Flushed: 2M requested, 10M(2240M->2230M) flushed [2019-10-15T20:02:50.149+0000][71067.947s][info][gc,heap ] Page Cache Flushed: 2M requested, 10M(2230M->2220M) flushed [2019-10-15T20:02:50.198+0000][71067.996s][info][gc,heap ] Page Cache Flushed: 2M requested, 10M(2220M->2210M) flushed [2019-10-15T20:02:50.313+0000][71068.111s][info][gc,heap ] Page Cache Flushed: 2M requested, 10M(2210M->2200M) flushed [2019-10-15T20:02:50.327+0000][71068.125s][info][gc,heap ] Page Cache Flushed: 2M requested, 26M(2200M->2174M) flushed [2019-10-15T20:02:50.346+0000][71068.145s][info][gc,heap ] Page Cache Flushed: 2M requested, 14M(2174M->2160M) flushed [2019-10-15T20:02:50.365+0000][71068.163s][info][gc,heap ] Page Cache Flushed: 2M requested, 14M(2160M->2146M) flushed [2019-10-15T20:02:50.371+0000][71068.170s][info][gc,heap ] Page Cache Flushed: 2M requested, 8M(2146M->2138M) flushed [2019-10-15T20:02:50.388+0000][71068.187s][info][gc,heap ] Page Cache Flushed: 2M requested, 8M(2138M->2130M) flushed [2019-10-15T20:02:50.402+0000][71068.201s][info][gc,heap ] Page Cache Flushed: 2M requested, 8M(2130M->2122M) flushed [2019-10-15T20:02:50.529+0000][71068.327s][info][gc,heap ] Page Cache Flushed: 2M requested, 8M(2122M->2114M) flushed [2019-10-15T20:02:50.620+0000][71068.418s][info][gc,heap ] Page Cache Flushed: 2M requested, 8M(2114M->2106M) flushed [2019-10-15T20:02:50.657+0000][71068.455s][info][gc,heap ] Page Cache Flushed: 2M requested, 14M(2106M->2092M) flushed ... 1. Any idea what this info message conveys about the system? 2. Our application is allocating object 5X sometimes and Allocation Stall happens immediately after that. Can ZGC heuristic learn about this increase and later will it adjust accordingly? Other than increasing memory is there any other tuning option to consider that will help in this scenario? I am using JDK12 with 80G heap. Thanks Sundar From simone.bordet at gmail.com Thu Oct 17 15:55:53 2019 From: simone.bordet at gmail.com (Simone Bordet) Date: Thu, 17 Oct 2019 17:55:53 +0200 Subject: Remapping phase Message-ID: Hi, I would like a clarification about when the remapping phase happen. >From what I could lookup in the ZGC team presentations at conferences and from a look at the code, it seems to be done during the (next) marking phase. 
I understand that there is no hurry to remap, as the load barrier takes care of that, but this also means that the mutators will pay the cost of remapping. Worst case, after a GC cycle the application navigates the whole object graph and remaps the whole heap before the next marking. If remapping is done during the next marking, would not be worth to have the GC doing it just after the relocation so that the mutators won't pay the remap cost? The remapping done by the GC could use the same roots from the STW relocation start, and if something changed in the roots the load barrier will still take care of the (few) remaps that were not done. Is this a tradeoff to leave more CPU time to mutators, at the cost of the mutators trapping the load barrier to remap? Am I missing something? Thanks! -- Simone Bordet --- Finally, no matter how good the architecture and design are, to deliver bug-free software with optimal performance and reliability, the implementation technique must be flawless. Victoria Livschitz From m.sundar85 at gmail.com Mon Oct 21 22:14:11 2019 From: m.sundar85 at gmail.com (Sundara Mohan M) Date: Mon, 21 Oct 2019 15:14:11 -0700 Subject: Is ZGC still in experimental? Message-ID: Hi, Any idea when ZGC will be moved out of experimental flags? Understand it is too early to move it out of experimental but do we have any plan to run it without +UnlockExperimentalVMOptions? Thanks Sundar From per.liden at oracle.com Tue Oct 22 08:12:45 2019 From: per.liden at oracle.com (Per Liden) Date: Tue, 22 Oct 2019 10:12:45 +0200 Subject: Is ZGC still in experimental? In-Reply-To: References: Message-ID: Hi, No decision has been made, but we're continuously evaluating where we stand. The new C2 load barriers (JDK-8230565) was a major milestone towards making ZGC rock solid. We can hopefully make it non-experimental sooner rather than later. /Per On 10/22/19 12:14 AM, Sundara Mohan M wrote: > Hi, > Any idea when ZGC will be moved out of experimental flags? > Understand it is too early to move it out of experimental but do we have > any plan to run it without +UnlockExperimentalVMOptions? > > Thanks > Sundar > From per.liden at oracle.com Tue Oct 22 08:46:37 2019 From: per.liden at oracle.com (Per Liden) Date: Tue, 22 Oct 2019 10:46:37 +0200 Subject: Remapping phase In-Reply-To: References: Message-ID: Hi, On 10/17/19 5:55 PM, Simone Bordet wrote: > Hi, > > I would like a clarification about when the remapping phase happen. > > From what I could lookup in the ZGC team presentations at conferences > and from a look at the code, it seems to be done during the (next) > marking phase. Correct. > > I understand that there is no hurry to remap, as the load barrier > takes care of that, but this also means that the mutators will pay the > cost of remapping. > Worst case, after a GC cycle the application navigates the whole > object graph and remaps the whole heap before the next marking. > > If remapping is done during the next marking, would not be worth to > have the GC doing it just after the relocation so that the mutators > won't pay the remap cost? > > The remapping done by the GC could use the same roots from the STW > relocation start, and if something changed in the roots the load > barrier will still take care of the (few) remaps that were not done. > > Is this a tradeoff to leave more CPU time to mutators, at the cost of > the mutators trapping the load barrier to remap? 
Walking the entire object graph is a very expensive operation, so if we can do that only once per GC cycle, we've saved a lot of CPU cycles. In most (close to all) cases, the application will not touch the entire object graph between two GCs, only a smaller subset. Having the Java thread remap a reference is not terribly expensive and will only happen once per reference, as it self-heals. We then let the next marking phase (when we have to walk the entire object graph anyway) remap the references that the Java threads didn't touch. I'm curious, since you're asking, do you have a real world workload where you suspect letting the GC immediately remap the entire graph would actually pay off? cheers, Per > > Am I missing something? > > Thanks! > From simone.bordet at gmail.com Tue Oct 22 09:00:13 2019 From: simone.bordet at gmail.com (Simone Bordet) Date: Tue, 22 Oct 2019 11:00:13 +0200 Subject: Remapping phase In-Reply-To: References: Message-ID: Hi, On Tue, Oct 22, 2019 at 10:46 AM Per Liden wrote: > I'm curious, since you're asking, do you have a real world workload > where you suspect letting the GC immediately remap the entire graph > would actually pay off? No, I was curious as just after a relocation possibly all the mutators will trigger the load barrier, multiple times to access the objects they need to work with, so they will incur in overhead but they have all CPUs available. Since the remapping would start from the roots, i.e. local variables, that are likely to be used first by the mutator, then the mutators would have less overhead because they won't trigger the load barrier as much, but have less CPUs available. I have also other sources confirming that doing an early remapping is not worth it. Thanks! -- Simone Bordet --- Finally, no matter how good the architecture and design are, to deliver bug-free software with optimal performance and reliability, the implementation technique must be flawless. Victoria Livschitz From per.liden at oracle.com Tue Oct 22 09:26:52 2019 From: per.liden at oracle.com (Per Liden) Date: Tue, 22 Oct 2019 11:26:52 +0200 Subject: Workers count boosting after Mark Start In-Reply-To: References: Message-ID: <2a0633f9-7692-7a88-c56b-3584ede9a896@oracle.com> Hi, On 10/16/19 3:49 PM, Pierre Mevel wrote: > Good morning, > > Apologies to bother you once again. As i explained in an earlier email ( > https://mail.openjdk.java.net/pipermail/zgc-dev/2019-September/000736.html), > I'm seeing a lot of Allocation Stalls using ZGC, the issue being that my > application's allocation rate is quite high. > > I have GC cycles running back to back, whatever the value of ConcGcThreads. > (The amount of application threads being superior to the amount of vCPUs, I > even tried things like 40 ConcGcThreads on a 20 vCPUs machine, but at some > point I don't want context switching to be too big of a factor). > > However I never see, in debug mode, the workers count boosting log lines. > Looking at the code (correct me if I'm wrong), I'm seeing that only at the > beginning of a cycle can the workers count be boosted to > Max(ParallelGcThreads, ConcGcThreads), as it happens in the do_operation > method of the VM_ZMarkStart class. > > For the allocation threads to be stalled at Mark Start, it should mean that > Heap Memory is full at the start of the cycle. 
Since the cycles start
> preemptively with timeUntilGC, etc..., it only happens if the cycles run
> back to back, and heap memory is full at the start of the cycle, meaning
> that allocation rate was higher than the "freeing rate" during concurrent
> relocation.

That's correct. We only boost the number of worker threads if we're in
deep trouble. If Java threads are stalled at mark start, we can just as
well give more CPU to the GC to resolve the situation faster. But we also
want to be careful to avoid prematurely boosting workers, thereby stealing
CPU time from the application.

>
> In my case, allocation stalls appear during concurrent mark.
>
> Would it be possible, in theory, for the `should_boost_worker_threads`
> method from zDriver to be called within `concurrent_mark_continue`, change
> the amount of workers and continue the ZMarkTask with more workers?

Changing the number of worker threads in places other than mark start is
technically possible, but I'm not sure it would help a lot.
concurrent_mark_continue is kind of special, and isn't called for most
workloads. It's basically only there to deal with the case where the
application is continuously resurrecting soft/weak references and the 1 ms
marking done in mark end wasn't able to reach the end of the graph.

If you have back-to-back GCs and increasing ConcGCThreads doesn't help,
then your only real option is to increase the max heap size.

cheers,
Per

>
> Best regards and thanks in advance for your answer,
>
> Pierre Mével
> pierre.mevel at activeviam.com
> ActiveViam
> 46 rue de l'arbre sec, 75001 Paris
>

From per.liden at oracle.com  Tue Oct 22 09:44:29 2019
From: per.liden at oracle.com (Per Liden)
Date: Tue, 22 Oct 2019 11:44:29 +0200
Subject: Allocation stalls and Page Cache Flushed messages
In-Reply-To: 
References: 
Message-ID: 

Hi,

On 10/17/19 7:05 AM, Sundara Mohan M wrote:
> Hi,
>     We started using ZGC and seeing this pattern, Whenever there is
> "Allocation Stall" messages before that i see messages like this
> ...
> [2019-10-15T20:02:42.403+0000][71060.202s][info][gc,heap ] Page Cache > Flushed: 10M requested, 18M(18530M->18512M) flushed > [2019-10-15T20:02:42.582+0000][71060.380s][info][gc,heap ] Page Cache > Flushed: 12M requested, 18M(17432M->17414M) flushed > [2019-10-15T20:02:42.717+0000][71060.515s][info][gc,heap ] Page Cache > Flushed: 14M requested, 36M(16632M->16596M) flushed > [2019-10-15T20:02:46.128+0000][71063.927s][info][gc,heap ] Page Cache > Flushed: 4M requested, 18M(7742M->7724M) flushed > [2019-10-15T20:02:49.716+0000][71067.514s][info][gc,heap ] Page Cache > Flushed: 2M requested, 28M(2374M->2346M) flushed > [2019-10-15T20:02:49.785+0000][71067.583s][info][gc,heap ] Page Cache > Flushed: 2M requested, 76M(2346M->2270M) flushed > [2019-10-15T20:02:49.966+0000][71067.765s][info][gc,heap ] Page Cache > Flushed: 2M requested, 10M(2270M->2260M) flushed > [2019-10-15T20:02:50.006+0000][71067.805s][info][gc,heap ] Page Cache > Flushed: 2M requested, 10M(2260M->2250M) flushed > [2019-10-15T20:02:50.018+0000][71067.816s][info][gc,heap ] Page Cache > Flushed: 2M requested, 10M(2250M->2240M) flushed > [2019-10-15T20:02:50.098+0000][71067.896s][info][gc,heap ] Page Cache > Flushed: 2M requested, 10M(2240M->2230M) flushed > [2019-10-15T20:02:50.149+0000][71067.947s][info][gc,heap ] Page Cache > Flushed: 2M requested, 10M(2230M->2220M) flushed > [2019-10-15T20:02:50.198+0000][71067.996s][info][gc,heap ] Page Cache > Flushed: 2M requested, 10M(2220M->2210M) flushed > [2019-10-15T20:02:50.313+0000][71068.111s][info][gc,heap ] Page Cache > Flushed: 2M requested, 10M(2210M->2200M) flushed > [2019-10-15T20:02:50.327+0000][71068.125s][info][gc,heap ] Page Cache > Flushed: 2M requested, 26M(2200M->2174M) flushed > [2019-10-15T20:02:50.346+0000][71068.145s][info][gc,heap ] Page Cache > Flushed: 2M requested, 14M(2174M->2160M) flushed > [2019-10-15T20:02:50.365+0000][71068.163s][info][gc,heap ] Page Cache > Flushed: 2M requested, 14M(2160M->2146M) flushed > [2019-10-15T20:02:50.371+0000][71068.170s][info][gc,heap ] Page Cache > Flushed: 2M requested, 8M(2146M->2138M) flushed > [2019-10-15T20:02:50.388+0000][71068.187s][info][gc,heap ] Page Cache > Flushed: 2M requested, 8M(2138M->2130M) flushed > [2019-10-15T20:02:50.402+0000][71068.201s][info][gc,heap ] Page Cache > Flushed: 2M requested, 8M(2130M->2122M) flushed > [2019-10-15T20:02:50.529+0000][71068.327s][info][gc,heap ] Page Cache > Flushed: 2M requested, 8M(2122M->2114M) flushed > [2019-10-15T20:02:50.620+0000][71068.418s][info][gc,heap ] Page Cache > Flushed: 2M requested, 8M(2114M->2106M) flushed > [2019-10-15T20:02:50.657+0000][71068.455s][info][gc,heap ] Page Cache > Flushed: 2M requested, 14M(2106M->2092M) flushed > ... > > 1. Any idea what this info message conveys about the system? This means that ZGC had cached heap regions (ZPages) that were ready to be used for new allocations, but they had the wrong size, so some memory (one or more ZPage) was flushed out from the cache so that the physical memory they occupied could be reused to build a new ZPage of the correct size. This is not in itself catastrophic, but it is more expensive than a regular allocation as it involves remapping memory. The only real way to avoid this is to increase the max heap size. > > 2. Our application is allocating object 5X sometimes and Allocation Stall > happens immediately after that. Can ZGC heuristic learn about this increase > and later will it adjust accordingly? 
Other than increasing memory is there > any other tuning option to consider that will help in this scenario? In general, increasing the heap size and/or adjusting the number of concurrent work threads (-XX:ConcGCThreads=X) are your main tuning options. Using -XX:+UseLargePages is of course also good if you want max performance. There are a few other ZGC-specific options, but those matter a whole lot less and will likely not help in this situation. The ZGC wiki has some more information on tuning: https://wiki.openjdk.java.net/display/zgc/Main cheers, Per > > I am using JDK12 with 80G heap. > > Thanks > Sundar > From m.sundar85 at gmail.com Tue Oct 22 18:04:22 2019 From: m.sundar85 at gmail.com (Sundara Mohan M) Date: Tue, 22 Oct 2019 11:04:22 -0700 Subject: Is ZGC still in experimental? In-Reply-To: References: Message-ID: Ok, thanks for the update. On Tue, Oct 22, 2019 at 1:12 AM Per Liden wrote: > Hi, > > No decision has been made, but we're continuously evaluating where we > stand. The new C2 load barriers (JDK-8230565) was a major milestone > towards making ZGC rock solid. We can hopefully make it non-experimental > sooner rather than later. > > /Per > > On 10/22/19 12:14 AM, Sundara Mohan M wrote: > > Hi, > > Any idea when ZGC will be moved out of experimental flags? > > Understand it is too early to move it out of experimental but do we have > > any plan to run it without +UnlockExperimentalVMOptions? > > > > Thanks > > Sundar > > > From m.sundar85 at gmail.com Tue Oct 22 18:18:46 2019 From: m.sundar85 at gmail.com (Sundara Mohan M) Date: Tue, 22 Oct 2019 11:18:46 -0700 Subject: ZGC Allocation stall metrics via MXBean Message-ID: Hi, I was trying to get GC metrics via GarbageCollectorMXBean but only see CollectionCount and CollectionTime. Even though i can get the Allocation Stall event from gc log i have to do some special setup to get that collected and reported properly. Since ZGC allocation stall is important event to identify if the application is having issue, can we expose it via any other MXBean? Thanks Sundar From m.sundar85 at gmail.com Thu Oct 24 00:49:45 2019 From: m.sundar85 at gmail.com (Sundara Mohan M) Date: Wed, 23 Oct 2019 17:49:45 -0700 Subject: ZGC cpu metrics in logs Message-ID: Hi, Wondering why zgc doesn't print cpu metrics like below [2019-10-23T17:42:48.539+0800][0.959s][info][gc,cpu ] GC(118) User=0.00s Sys=0.00s Real=0.00s For any other garbage collectors it is printed. Is this avoid because ZGC stop the world pause is always in milliseconds? Just curious why this was not printed for ZGC alone. Thanks Sundar From conniall at amazon.com Thu Oct 24 04:08:40 2019 From: conniall at amazon.com (Connaughton, Niall) Date: Thu, 24 Oct 2019 04:08:40 +0000 Subject: ZGC Allocation stall metrics via MXBean In-Reply-To: References: Message-ID: <80A6B181-069D-4964-85AD-334CD9A1722E@amazon.com> I was going to ask the same question. In addition - is there any documentation on how the allocation stalls work? I'm looking to understand things like whether the stall happens to any thread that attempts to allocate a new object, or only threads that need a new TLAB, or some other mechanism. Put another way - if we do something like jHiccup and have a thread constantly sleeping and allocating a small amount, would it detect allocation stalls? Or would it not be stalled until it exhausts its TLAB? 
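For concreteness, the kind of probe I have in mind is roughly this (a hypothetical sketch with invented class and field names, not jHiccup's actual code):

// Hypothetical allocation-latency probe: a background thread that sleeps,
// allocates a small array, and records how long the allocation took.
// Long outliers would suggest the thread was stalled (or otherwise delayed).
public final class AllocationProbe implements Runnable {
    private volatile long worstNanos;
    public volatile byte[] sink;   // keep the allocation from being optimized away

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                return;
            }
            long start = System.nanoTime();
            sink = new byte[1024];   // small allocation, normally the TLAB fast path
            long elapsed = System.nanoTime() - start;
            if (elapsed > worstNanos) {
                worstNanos = elapsed;
                System.out.printf("new worst-case allocation: %.3f ms%n",
                        elapsed / 1_000_000.0);
            }
        }
    }

    public static void main(String[] args) {
        new Thread(new AllocationProbe(), "allocation-probe").start();
    }
}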
Thanks, Niall ?On 10/22/19, 11:19, "zgc-dev on behalf of Sundara Mohan M" wrote: Hi, I was trying to get GC metrics via GarbageCollectorMXBean but only see CollectionCount and CollectionTime. Even though i can get the Allocation Stall event from gc log i have to do some special setup to get that collected and reported properly. Since ZGC allocation stall is important event to identify if the application is having issue, can we expose it via any other MXBean? Thanks Sundar From per.liden at oracle.com Thu Oct 24 07:46:19 2019 From: per.liden at oracle.com (Per Liden) Date: Thu, 24 Oct 2019 09:46:19 +0200 Subject: ZGC Allocation stall metrics via MXBean In-Reply-To: <80A6B181-069D-4964-85AD-334CD9A1722E@amazon.com> References: <80A6B181-069D-4964-85AD-334CD9A1722E@amazon.com> Message-ID: <0c4171d8-59c0-c897-d873-d2e2f1357ff0@oracle.com> Hi, When allocating a small object (object size <= 256K), if the thread already has a TLAB it will continue to allocate from it without being stalled. If the TLAB is exhausted, the thread will try to allocate a new TLAB from a CPU-local ZPage (this can be seen as a "CPU-LAB" for allocating TLABS), again without being stalled. Only if that CPU-local ZPage is also exhausted will the thread try to allocate a new ZPage, in which case it will be stalled if we're currently out of memory. The allocation path is slightly different when allocating medium objects (object size <= 4M). In this cases, the first attempt is to allocate the object into a global/shared medium ZPage. If that page is exhausted, it will try to allocate a new medium ZPage, and is subject to allocation stall if we're out of memory. For large objects (object size > 4M), we always allocate a new large ZPage, so we'll have an allocation stall if we're out of memory. In summary, if we're out of memory, a thread might still be able to allocate obejcts without being stalled. If circumstances are right. Exposing allocation stall information via an MXBean might be useful. We certainly have the information, so it's mostly a question about if and how we want to expose it. Just thinking out loud, one could imagine adding something to c.s.m.GarbageCollectorMXBean or c.s.m.ThreadMXBean, or maybe even introduce a c.s.m.ZGarbageCollectorMXBean. cheers, Per On 10/24/19 6:08 AM, Connaughton, Niall wrote: > I was going to ask the same question. > > In addition - is there any documentation on how the allocation stalls work? I'm looking to understand things like whether the stall happens to any thread that attempts to allocate a new object, or only threads that need a new TLAB, or some other mechanism. Put another way - if we do something like jHiccup and have a thread constantly sleeping and allocating a small amount, would it detect allocation stalls? Or would it not be stalled until it exhausts its TLAB? > > Thanks, > Niall > > ?On 10/22/19, 11:19, "zgc-dev on behalf of Sundara Mohan M" wrote: > > Hi, > I was trying to get GC metrics via GarbageCollectorMXBean but only see > CollectionCount and CollectionTime. > Even though i can get the Allocation Stall event from gc log i have to do > some special setup to get that collected and reported properly. > Since ZGC allocation stall is important event to identify if the > application is having issue, can we expose it via any other MXBean? 
> > Thanks > Sundar > > From per.liden at oracle.com Thu Oct 24 07:57:10 2019 From: per.liden at oracle.com (Per Liden) Date: Thu, 24 Oct 2019 09:57:10 +0200 Subject: ZGC cpu metrics in logs In-Reply-To: References: Message-ID: <712e16d3-7e8e-2767-2765-5680f3cc0f25@oracle.com> Hi, On 10/24/19 2:49 AM, Sundara Mohan M wrote: > Hi, > Wondering why zgc doesn't print cpu metrics like below > > [2019-10-23T17:42:48.539+0800][0.959s][info][gc,cpu ] GC(118) > User=0.00s Sys=0.00s Real=0.00s > > For any other garbage collectors it is printed. > Is this avoid because ZGC stop the world pause is always in milliseconds? > > Just curious why this was not printed for ZGC alone. Right, printing that exact line wouldn't be that useful, as ZGC pauses wouldn't normally register on that scale. However, printing more detailed timing information could certainly be useful. We've especially been interested in real-time vs. cpu-time, as that can often explain strange outliers, expose over provisioning, etc. cheers, Per > > Thanks > Sundar > From stefan.karlsson at oracle.com Thu Oct 24 12:40:22 2019 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Thu, 24 Oct 2019 14:40:22 +0200 Subject: Allocation stalls and Page Cache Flushed messages In-Reply-To: References: Message-ID: Hi, On 2019-10-22 11:44, Per Liden wrote: > Hi, > > On 10/17/19 7:05 AM, Sundara Mohan M wrote: >> Hi, >> ??? We started using ZGC and seeing this pattern, Whenever there is >> "Allocation Stall" messages before that i see messages like this >> ... >> [2019-10-15T20:02:42.403+0000][71060.202s][info][gc,heap???? ] Page Cache >> Flushed: 10M requested, 18M(18530M->18512M) flushed >> [2019-10-15T20:02:42.582+0000][71060.380s][info][gc,heap???? ] Page Cache >> Flushed: 12M requested, 18M(17432M->17414M) flushed >> [2019-10-15T20:02:42.717+0000][71060.515s][info][gc,heap???? ] Page Cache >> Flushed: 14M requested, 36M(16632M->16596M) flushed >> [2019-10-15T20:02:46.128+0000][71063.927s][info][gc,heap???? ] Page Cache >> Flushed: 4M requested, 18M(7742M->7724M) flushed >> [2019-10-15T20:02:49.716+0000][71067.514s][info][gc,heap???? ] Page Cache >> Flushed: 2M requested, 28M(2374M->2346M) flushed >> [2019-10-15T20:02:49.785+0000][71067.583s][info][gc,heap???? ] Page Cache >> Flushed: 2M requested, 76M(2346M->2270M) flushed >> [2019-10-15T20:02:49.966+0000][71067.765s][info][gc,heap???? ] Page Cache >> Flushed: 2M requested, 10M(2270M->2260M) flushed >> [2019-10-15T20:02:50.006+0000][71067.805s][info][gc,heap???? ] Page Cache >> Flushed: 2M requested, 10M(2260M->2250M) flushed >> [2019-10-15T20:02:50.018+0000][71067.816s][info][gc,heap???? ] Page Cache >> Flushed: 2M requested, 10M(2250M->2240M) flushed >> [2019-10-15T20:02:50.098+0000][71067.896s][info][gc,heap???? ] Page Cache >> Flushed: 2M requested, 10M(2240M->2230M) flushed >> [2019-10-15T20:02:50.149+0000][71067.947s][info][gc,heap???? ] Page Cache >> Flushed: 2M requested, 10M(2230M->2220M) flushed >> [2019-10-15T20:02:50.198+0000][71067.996s][info][gc,heap???? ] Page Cache >> Flushed: 2M requested, 10M(2220M->2210M) flushed >> [2019-10-15T20:02:50.313+0000][71068.111s][info][gc,heap???? ] Page Cache >> Flushed: 2M requested, 10M(2210M->2200M) flushed >> [2019-10-15T20:02:50.327+0000][71068.125s][info][gc,heap???? ] Page Cache >> Flushed: 2M requested, 26M(2200M->2174M) flushed >> [2019-10-15T20:02:50.346+0000][71068.145s][info][gc,heap???? ] Page Cache >> Flushed: 2M requested, 14M(2174M->2160M) flushed >> [2019-10-15T20:02:50.365+0000][71068.163s][info][gc,heap???? 
] Page Cache >> Flushed: 2M requested, 14M(2160M->2146M) flushed >> [2019-10-15T20:02:50.371+0000][71068.170s][info][gc,heap???? ] Page Cache >> Flushed: 2M requested, 8M(2146M->2138M) flushed >> [2019-10-15T20:02:50.388+0000][71068.187s][info][gc,heap???? ] Page Cache >> Flushed: 2M requested, 8M(2138M->2130M) flushed >> [2019-10-15T20:02:50.402+0000][71068.201s][info][gc,heap???? ] Page Cache >> Flushed: 2M requested, 8M(2130M->2122M) flushed >> [2019-10-15T20:02:50.529+0000][71068.327s][info][gc,heap???? ] Page Cache >> Flushed: 2M requested, 8M(2122M->2114M) flushed >> [2019-10-15T20:02:50.620+0000][71068.418s][info][gc,heap???? ] Page Cache >> Flushed: 2M requested, 8M(2114M->2106M) flushed >> [2019-10-15T20:02:50.657+0000][71068.455s][info][gc,heap???? ] Page Cache >> Flushed: 2M requested, 14M(2106M->2092M) flushed >> ... >> >> 1. Any idea what this info message conveys about the system? > > This means that ZGC had cached heap regions (ZPages) that were ready to > be used for new allocations, but they had the wrong size, so some memory > (one or more ZPage) was flushed out from the cache so that the physical > memory they occupied could be reused to build a new ZPage of the correct > size. This is not in itself catastrophic, but it is more expensive than > a regular allocation as it involves remapping memory. The only real way > to avoid this is to increase the max heap size. One more comment about this. In JDK 13 we changed how we flush pages. Now when we flush a larger page, to build a smaller page, we don't perform the expensive unmap and remap of the memory. We simply shrink the old page, create a new ZPage, and keep the virtual to physical mapping. So, the lines stating that we flushed a small pages, 'Flushed: 2M', indicates a more expensive operation in JDK 12 compared to JDK 13. Cheers, StefanK From conniall at amazon.com Thu Oct 24 17:34:26 2019 From: conniall at amazon.com (Connaughton, Niall) Date: Thu, 24 Oct 2019 17:34:26 +0000 Subject: ZGC Allocation stall metrics via MXBean In-Reply-To: <0c4171d8-59c0-c897-d873-d2e2f1357ff0@oracle.com> References: <80A6B181-069D-4964-85AD-334CD9A1722E@amazon.com> <0c4171d8-59c0-c897-d873-d2e2f1357ff0@oracle.com> Message-ID: <48426E7F-3A43-4786-A70A-6E93E2AF75F2@amazon.com> Thanks Per, that's helpful to understand. On exposing allocation stall information via an MXBean, it would be super nice if it was exposed via a bean that implements NotificationEmitter. We're currently using notifications from the GarbageCollectorMXBean to subscribe to GC events and record data on pause duration, cause, etc, as well as what in-flight operations may have been impacted by the pause. If we could use a similar approach to watch for allocation stalls, instead of polling for stalls via the ThreadMXBean, that would be awesome. ?On 10/24/19, 00:47, "Per Liden" wrote: Hi, When allocating a small object (object size <= 256K), if the thread already has a TLAB it will continue to allocate from it without being stalled. If the TLAB is exhausted, the thread will try to allocate a new TLAB from a CPU-local ZPage (this can be seen as a "CPU-LAB" for allocating TLABS), again without being stalled. Only if that CPU-local ZPage is also exhausted will the thread try to allocate a new ZPage, in which case it will be stalled if we're currently out of memory. The allocation path is slightly different when allocating medium objects (object size <= 4M). In this cases, the first attempt is to allocate the object into a global/shared medium ZPage. 
If that page is exhausted, it will try to allocate a new medium ZPage, and is subject to allocation stall if we're out of memory. For large objects (object size > 4M), we always allocate a new large ZPage, so we'll have an allocation stall if we're out of memory. In summary, if we're out of memory, a thread might still be able to allocate obejcts without being stalled. If circumstances are right. Exposing allocation stall information via an MXBean might be useful. We certainly have the information, so it's mostly a question about if and how we want to expose it. Just thinking out loud, one could imagine adding something to c.s.m.GarbageCollectorMXBean or c.s.m.ThreadMXBean, or maybe even introduce a c.s.m.ZGarbageCollectorMXBean. cheers, Per On 10/24/19 6:08 AM, Connaughton, Niall wrote: > I was going to ask the same question. > > In addition - is there any documentation on how the allocation stalls work? I'm looking to understand things like whether the stall happens to any thread that attempts to allocate a new object, or only threads that need a new TLAB, or some other mechanism. Put another way - if we do something like jHiccup and have a thread constantly sleeping and allocating a small amount, would it detect allocation stalls? Or would it not be stalled until it exhausts its TLAB? > > Thanks, > Niall > > On 10/22/19, 11:19, "zgc-dev on behalf of Sundara Mohan M" wrote: > > Hi, > I was trying to get GC metrics via GarbageCollectorMXBean but only see > CollectionCount and CollectionTime. > Even though i can get the Allocation Stall event from gc log i have to do > some special setup to get that collected and reported properly. > Since ZGC allocation stall is important event to identify if the > application is having issue, can we expose it via any other MXBean? > > Thanks > Sundar > > From m.sundar85 at gmail.com Thu Oct 24 17:36:05 2019 From: m.sundar85 at gmail.com (Sundara Mohan M) Date: Thu, 24 Oct 2019 10:36:05 -0700 Subject: Allocation stalls and Page Cache Flushed messages In-Reply-To: References: Message-ID: HI Per, > This means that ZGC had cached heap regions (ZPages) that were ready to > be used for new allocations, but they had the wrong size, so some memory > (one or more ZPage) was flushed out from the cache so that the physical > memory they occupied could be reused to build a new ZPage of the correct > size. "they had wrong size" does that mean we didn't have a cached page with our new required page size? For ex. We cached few 2M page and now we need 4M page so it will flush multiple pages in cache to get the new page with 4M? Thanks Sundar On Tue, Oct 22, 2019 at 2:44 AM Per Liden wrote: > Hi, > > On 10/17/19 7:05 AM, Sundara Mohan M wrote: > > Hi, > > We started using ZGC and seeing this pattern, Whenever there is > > "Allocation Stall" messages before that i see messages like this > > ... 
> > [2019-10-15T20:02:42.403+0000][71060.202s][info][gc,heap ] Page Cache > > Flushed: 10M requested, 18M(18530M->18512M) flushed > > [2019-10-15T20:02:42.582+0000][71060.380s][info][gc,heap ] Page Cache > > Flushed: 12M requested, 18M(17432M->17414M) flushed > > [2019-10-15T20:02:42.717+0000][71060.515s][info][gc,heap ] Page Cache > > Flushed: 14M requested, 36M(16632M->16596M) flushed > > [2019-10-15T20:02:46.128+0000][71063.927s][info][gc,heap ] Page Cache > > Flushed: 4M requested, 18M(7742M->7724M) flushed > > [2019-10-15T20:02:49.716+0000][71067.514s][info][gc,heap ] Page Cache > > Flushed: 2M requested, 28M(2374M->2346M) flushed > > [2019-10-15T20:02:49.785+0000][71067.583s][info][gc,heap ] Page Cache > > Flushed: 2M requested, 76M(2346M->2270M) flushed > > [2019-10-15T20:02:49.966+0000][71067.765s][info][gc,heap ] Page Cache > > Flushed: 2M requested, 10M(2270M->2260M) flushed > > [2019-10-15T20:02:50.006+0000][71067.805s][info][gc,heap ] Page Cache > > Flushed: 2M requested, 10M(2260M->2250M) flushed > > [2019-10-15T20:02:50.018+0000][71067.816s][info][gc,heap ] Page Cache > > Flushed: 2M requested, 10M(2250M->2240M) flushed > > [2019-10-15T20:02:50.098+0000][71067.896s][info][gc,heap ] Page Cache > > Flushed: 2M requested, 10M(2240M->2230M) flushed > > [2019-10-15T20:02:50.149+0000][71067.947s][info][gc,heap ] Page Cache > > Flushed: 2M requested, 10M(2230M->2220M) flushed > > [2019-10-15T20:02:50.198+0000][71067.996s][info][gc,heap ] Page Cache > > Flushed: 2M requested, 10M(2220M->2210M) flushed > > [2019-10-15T20:02:50.313+0000][71068.111s][info][gc,heap ] Page Cache > > Flushed: 2M requested, 10M(2210M->2200M) flushed > > [2019-10-15T20:02:50.327+0000][71068.125s][info][gc,heap ] Page Cache > > Flushed: 2M requested, 26M(2200M->2174M) flushed > > [2019-10-15T20:02:50.346+0000][71068.145s][info][gc,heap ] Page Cache > > Flushed: 2M requested, 14M(2174M->2160M) flushed > > [2019-10-15T20:02:50.365+0000][71068.163s][info][gc,heap ] Page Cache > > Flushed: 2M requested, 14M(2160M->2146M) flushed > > [2019-10-15T20:02:50.371+0000][71068.170s][info][gc,heap ] Page Cache > > Flushed: 2M requested, 8M(2146M->2138M) flushed > > [2019-10-15T20:02:50.388+0000][71068.187s][info][gc,heap ] Page Cache > > Flushed: 2M requested, 8M(2138M->2130M) flushed > > [2019-10-15T20:02:50.402+0000][71068.201s][info][gc,heap ] Page Cache > > Flushed: 2M requested, 8M(2130M->2122M) flushed > > [2019-10-15T20:02:50.529+0000][71068.327s][info][gc,heap ] Page Cache > > Flushed: 2M requested, 8M(2122M->2114M) flushed > > [2019-10-15T20:02:50.620+0000][71068.418s][info][gc,heap ] Page Cache > > Flushed: 2M requested, 8M(2114M->2106M) flushed > > [2019-10-15T20:02:50.657+0000][71068.455s][info][gc,heap ] Page Cache > > Flushed: 2M requested, 14M(2106M->2092M) flushed > > ... > > > > 1. Any idea what this info message conveys about the system? > > This means that ZGC had cached heap regions (ZPages) that were ready to > be used for new allocations, but they had the wrong size, so some memory > (one or more ZPage) was flushed out from the cache so that the physical > memory they occupied could be reused to build a new ZPage of the correct > size. This is not in itself catastrophic, but it is more expensive than > a regular allocation as it involves remapping memory. The only real way > to avoid this is to increase the max heap size. > > > > > 2. Our application is allocating object 5X sometimes and Allocation Stall > > happens immediately after that. 
Can ZGC heuristic learn about this > increase > > and later will it adjust accordingly? Other than increasing memory is > there > > any other tuning option to consider that will help in this scenario? > > In general, increasing the heap size and/or adjusting the number of > concurrent work threads (-XX:ConcGCThreads=X) are your main tuning > options. Using -XX:+UseLargePages is of course also good if you want max > performance. There are a few other ZGC-specific options, but those > matter a whole lot less and will likely not help in this situation. > > The ZGC wiki has some more information on tuning: > https://wiki.openjdk.java.net/display/zgc/Main > > cheers, > Per > > > > > I am using JDK12 with 80G heap. > > > > Thanks > > Sundar > > > From m.sundar85 at gmail.com Thu Oct 24 18:00:05 2019 From: m.sundar85 at gmail.com (Sundara Mohan M) Date: Thu, 24 Oct 2019 11:00:05 -0700 Subject: ZGC Allocation stall metrics via MXBean In-Reply-To: <0c4171d8-59c0-c897-d873-d2e2f1357ff0@oracle.com> References: <80A6B181-069D-4964-85AD-334CD9A1722E@amazon.com> <0c4171d8-59c0-c897-d873-d2e2f1357ff0@oracle.com> Message-ID: Thanks Per, for the nice explanation. I would prefer to have just the AllocationStall count in c.s.m.GarbageCollectorMXBean or c.s.m.ZGarbageCollectorMXBean and stall time related metrics in c.s.m.ThreadMXBean (assuming this will have metrics for each thread). Also i would choose c.s.m.ZGarbageCollectorMXBean so that we don't have to deal with backward compatibility issues(like people using other GC and seeing ZGC metrics or the other way) later. Again assuming java development is moving toward that with multiple feature releases in a year model. That's just my two cents. Thanks Sundar On Thu, Oct 24, 2019 at 12:46 AM Per Liden wrote: > Hi, > > When allocating a small object (object size <= 256K), if the thread > already has a TLAB it will continue to allocate from it without being > stalled. If the TLAB is exhausted, the thread will try to allocate a new > TLAB from a CPU-local ZPage (this can be seen as a "CPU-LAB" for > allocating TLABS), again without being stalled. Only if that CPU-local > ZPage is also exhausted will the thread try to allocate a new ZPage, in > which case it will be stalled if we're currently out of memory. > > The allocation path is slightly different when allocating medium objects > (object size <= 4M). In this cases, the first attempt is to allocate the > object into a global/shared medium ZPage. If that page is exhausted, it > will try to allocate a new medium ZPage, and is subject to allocation > stall if we're out of memory. > > For large objects (object size > 4M), we always allocate a new large > ZPage, so we'll have an allocation stall if we're out of memory. > > In summary, if we're out of memory, a thread might still be able to > allocate obejcts without being stalled. If circumstances are right. > > Exposing allocation stall information via an MXBean might be useful. We > certainly have the information, so it's mostly a question about if and > how we want to expose it. Just thinking out loud, one could imagine > adding something to c.s.m.GarbageCollectorMXBean or c.s.m.ThreadMXBean, > or maybe even introduce a c.s.m.ZGarbageCollectorMXBean. > > cheers, > Per > > On 10/24/19 6:08 AM, Connaughton, Niall wrote: > > I was going to ask the same question. > > > > In addition - is there any documentation on how the allocation stalls > work? 
I'm looking to understand things like whether the stall happens to > any thread that attempts to allocate a new object, or only threads that > need a new TLAB, or some other mechanism. Put another way - if we do > something like jHiccup and have a thread constantly sleeping and allocating > a small amount, would it detect allocation stalls? Or would it not be > stalled until it exhausts its TLAB? > > > > Thanks, > > Niall > > > > ?On 10/22/19, 11:19, "zgc-dev on behalf of Sundara Mohan M" < > zgc-dev-bounces at openjdk.java.net on behalf of m.sundar85 at gmail.com> wrote: > > > > Hi, > > I was trying to get GC metrics via GarbageCollectorMXBean but > only see > > CollectionCount and CollectionTime. > > Even though i can get the Allocation Stall event from gc log i have > to do > > some special setup to get that collected and reported properly. > > Since ZGC allocation stall is important event to identify if the > > application is having issue, can we expose it via any other MXBean? > > > > Thanks > > Sundar > > > > > From per.liden at oracle.com Fri Oct 25 12:11:09 2019 From: per.liden at oracle.com (Per Liden) Date: Fri, 25 Oct 2019 14:11:09 +0200 Subject: Allocation stalls and Page Cache Flushed messages In-Reply-To: References: Message-ID: On 10/24/19 7:36 PM, Sundara Mohan M wrote: > HI Per, > > > This means that ZGC had cached heap regions (ZPages) that were ready to > > be used for new allocations, but they had the wrong size, so some memory > > (one or more ZPage) was flushed out from the cache so that the physical > > memory they occupied could be reused to build a new ZPage of the correct > > size. > > "they had wrong size" does that mean we didn't have a cached page with > our new required page size? > For ex. We cached few 2M page and now we need 4M page so it will flush > multiple pages in cache to get the new page with 4M? Right, it in this case it would flush two 2M pages, and re-map them as a single contiguous 4M page. /Per > > > Thanks > Sundar > > On Tue, Oct 22, 2019 at 2:44 AM Per Liden > wrote: > > Hi, > > On 10/17/19 7:05 AM, Sundara Mohan M wrote: > > Hi, > >? ? ?We started using ZGC and seeing this pattern, Whenever there is > > "Allocation Stall" messages before that i see messages like this > > ... > > [2019-10-15T20:02:42.403+0000][71060.202s][info][gc,heap? ? ?] > Page Cache > > Flushed: 10M requested, 18M(18530M->18512M) flushed > > [2019-10-15T20:02:42.582+0000][71060.380s][info][gc,heap? ? ?] > Page Cache > > Flushed: 12M requested, 18M(17432M->17414M) flushed > > [2019-10-15T20:02:42.717+0000][71060.515s][info][gc,heap? ? ?] > Page Cache > > Flushed: 14M requested, 36M(16632M->16596M) flushed > > [2019-10-15T20:02:46.128+0000][71063.927s][info][gc,heap? ? ?] > Page Cache > > Flushed: 4M requested, 18M(7742M->7724M) flushed > > [2019-10-15T20:02:49.716+0000][71067.514s][info][gc,heap? ? ?] > Page Cache > > Flushed: 2M requested, 28M(2374M->2346M) flushed > > [2019-10-15T20:02:49.785+0000][71067.583s][info][gc,heap? ? ?] > Page Cache > > Flushed: 2M requested, 76M(2346M->2270M) flushed > > [2019-10-15T20:02:49.966+0000][71067.765s][info][gc,heap? ? ?] > Page Cache > > Flushed: 2M requested, 10M(2270M->2260M) flushed > > [2019-10-15T20:02:50.006+0000][71067.805s][info][gc,heap? ? ?] > Page Cache > > Flushed: 2M requested, 10M(2260M->2250M) flushed > > [2019-10-15T20:02:50.018+0000][71067.816s][info][gc,heap? ? ?] > Page Cache > > Flushed: 2M requested, 10M(2250M->2240M) flushed > > [2019-10-15T20:02:50.098+0000][71067.896s][info][gc,heap? ? ?] 
> Page Cache > > Flushed: 2M requested, 10M(2240M->2230M) flushed > > [2019-10-15T20:02:50.149+0000][71067.947s][info][gc,heap? ? ?] > Page Cache > > Flushed: 2M requested, 10M(2230M->2220M) flushed > > [2019-10-15T20:02:50.198+0000][71067.996s][info][gc,heap? ? ?] > Page Cache > > Flushed: 2M requested, 10M(2220M->2210M) flushed > > [2019-10-15T20:02:50.313+0000][71068.111s][info][gc,heap? ? ?] > Page Cache > > Flushed: 2M requested, 10M(2210M->2200M) flushed > > [2019-10-15T20:02:50.327+0000][71068.125s][info][gc,heap? ? ?] > Page Cache > > Flushed: 2M requested, 26M(2200M->2174M) flushed > > [2019-10-15T20:02:50.346+0000][71068.145s][info][gc,heap? ? ?] > Page Cache > > Flushed: 2M requested, 14M(2174M->2160M) flushed > > [2019-10-15T20:02:50.365+0000][71068.163s][info][gc,heap? ? ?] > Page Cache > > Flushed: 2M requested, 14M(2160M->2146M) flushed > > [2019-10-15T20:02:50.371+0000][71068.170s][info][gc,heap? ? ?] > Page Cache > > Flushed: 2M requested, 8M(2146M->2138M) flushed > > [2019-10-15T20:02:50.388+0000][71068.187s][info][gc,heap? ? ?] > Page Cache > > Flushed: 2M requested, 8M(2138M->2130M) flushed > > [2019-10-15T20:02:50.402+0000][71068.201s][info][gc,heap? ? ?] > Page Cache > > Flushed: 2M requested, 8M(2130M->2122M) flushed > > [2019-10-15T20:02:50.529+0000][71068.327s][info][gc,heap? ? ?] > Page Cache > > Flushed: 2M requested, 8M(2122M->2114M) flushed > > [2019-10-15T20:02:50.620+0000][71068.418s][info][gc,heap? ? ?] > Page Cache > > Flushed: 2M requested, 8M(2114M->2106M) flushed > > [2019-10-15T20:02:50.657+0000][71068.455s][info][gc,heap? ? ?] > Page Cache > > Flushed: 2M requested, 14M(2106M->2092M) flushed > > ... > > > > 1. Any idea what this info message conveys about the system? > > This means that ZGC had cached heap regions (ZPages) that were ready to > be used for new allocations, but they had the wrong size, so some > memory > (one or more ZPage) was flushed out from the cache so that the physical > memory they occupied could be reused to build a new ZPage of the > correct > size. This is not in itself catastrophic, but it is more expensive than > a regular allocation as it involves remapping memory. The only real way > to avoid this is to increase the max heap size. > > > > > 2. Our application is allocating object 5X sometimes and > Allocation Stall > > happens immediately after that. Can ZGC heuristic learn about > this increase > > and later will it adjust accordingly? Other than increasing > memory is there > > any other tuning option to consider that will help in this scenario? > > In general, increasing the heap size and/or adjusting the number of > concurrent work threads (-XX:ConcGCThreads=X) are your main tuning > options. Using -XX:+UseLargePages is of course also good if you want > max > performance. There are a few other ZGC-specific options, but those > matter a whole lot less and will likely not help in this situation. > > The ZGC wiki has some more information on tuning: > https://wiki.openjdk.java.net/display/zgc/Main > > cheers, > Per > > > > > I am using JDK12 with 80G heap. 
> > > > Thanks > > Sundar > > > From per.liden at oracle.com Fri Oct 25 12:14:20 2019 From: per.liden at oracle.com (Per Liden) Date: Fri, 25 Oct 2019 14:14:20 +0200 Subject: ZGC Allocation stall metrics via MXBean In-Reply-To: <48426E7F-3A43-4786-A70A-6E93E2AF75F2@amazon.com> References: <80A6B181-069D-4964-85AD-334CD9A1722E@amazon.com> <0c4171d8-59c0-c897-d873-d2e2f1357ff0@oracle.com> <48426E7F-3A43-4786-A70A-6E93E2AF75F2@amazon.com> Message-ID: <87380b60-487c-b247-2278-f18775953824@oracle.com> On 10/24/19 7:34 PM, Connaughton, Niall wrote: > Thanks Per, that's helpful to understand. > > On exposing allocation stall information via an MXBean, it would be super nice if it was exposed via a bean that implements NotificationEmitter. We're currently using notifications from the GarbageCollectorMXBean to subscribe to GC events and record data on pause duration, cause, etc, as well as what in-flight operations may have been impacted by the pause. If we could use a similar approach to watch for allocation stalls, instead of polling for stalls via the ThreadMXBean, that would be awesome. Good to know, thanks. /Per > > ?On 10/24/19, 00:47, "Per Liden" wrote: > > Hi, > > When allocating a small object (object size <= 256K), if the thread > already has a TLAB it will continue to allocate from it without being > stalled. If the TLAB is exhausted, the thread will try to allocate a new > TLAB from a CPU-local ZPage (this can be seen as a "CPU-LAB" for > allocating TLABS), again without being stalled. Only if that CPU-local > ZPage is also exhausted will the thread try to allocate a new ZPage, in > which case it will be stalled if we're currently out of memory. > > The allocation path is slightly different when allocating medium objects > (object size <= 4M). In this cases, the first attempt is to allocate the > object into a global/shared medium ZPage. If that page is exhausted, it > will try to allocate a new medium ZPage, and is subject to allocation > stall if we're out of memory. > > For large objects (object size > 4M), we always allocate a new large > ZPage, so we'll have an allocation stall if we're out of memory. > > In summary, if we're out of memory, a thread might still be able to > allocate obejcts without being stalled. If circumstances are right. > > Exposing allocation stall information via an MXBean might be useful. We > certainly have the information, so it's mostly a question about if and > how we want to expose it. Just thinking out loud, one could imagine > adding something to c.s.m.GarbageCollectorMXBean or c.s.m.ThreadMXBean, > or maybe even introduce a c.s.m.ZGarbageCollectorMXBean. > > cheers, > Per > > On 10/24/19 6:08 AM, Connaughton, Niall wrote: > > I was going to ask the same question. > > > > In addition - is there any documentation on how the allocation stalls work? I'm looking to understand things like whether the stall happens to any thread that attempts to allocate a new object, or only threads that need a new TLAB, or some other mechanism. Put another way - if we do something like jHiccup and have a thread constantly sleeping and allocating a small amount, would it detect allocation stalls? Or would it not be stalled until it exhausts its TLAB? > > > > Thanks, > > Niall > > > > On 10/22/19, 11:19, "zgc-dev on behalf of Sundara Mohan M" wrote: > > > > Hi, > > I was trying to get GC metrics via GarbageCollectorMXBean but only see > > CollectionCount and CollectionTime. 
> > Even though i can get the Allocation Stall event from gc log i have to do > > some special setup to get that collected and reported properly. > > Since ZGC allocation stall is important event to identify if the > > application is having issue, can we expose it via any other MXBean? > > > > Thanks > > Sundar > > > > > > From m.sundar85 at gmail.com Fri Oct 25 22:20:22 2019 From: m.sundar85 at gmail.com (Sundara Mohan M) Date: Fri, 25 Oct 2019 15:20:22 -0700 Subject: Allocation stalls and Page Cache Flushed messages In-Reply-To: References: Message-ID: Got it. Thanks Sundar On Fri, Oct 25, 2019 at 5:11 AM Per Liden wrote: > On 10/24/19 7:36 PM, Sundara Mohan M wrote: > > HI Per, > > > > > This means that ZGC had cached heap regions (ZPages) that were ready > to > > > be used for new allocations, but they had the wrong size, so some > memory > > > (one or more ZPage) was flushed out from the cache so that the > physical > > > memory they occupied could be reused to build a new ZPage of the > correct > > > size. > > > > "they had wrong size" does that mean we didn't have a cached page with > > our new required page size? > > For ex. We cached few 2M page and now we need 4M page so it will flush > > multiple pages in cache to get the new page with 4M? > > Right, it in this case it would flush two 2M pages, and re-map them as a > single contiguous 4M page. > > /Per > > > > > > > Thanks > > Sundar > > > > On Tue, Oct 22, 2019 at 2:44 AM Per Liden > > wrote: > > > > Hi, > > > > On 10/17/19 7:05 AM, Sundara Mohan M wrote: > > > Hi, > > > We started using ZGC and seeing this pattern, Whenever there > is > > > "Allocation Stall" messages before that i see messages like this > > > ... > > > [2019-10-15T20:02:42.403+0000][71060.202s][info][gc,heap ] > > Page Cache > > > Flushed: 10M requested, 18M(18530M->18512M) flushed > > > [2019-10-15T20:02:42.582+0000][71060.380s][info][gc,heap ] > > Page Cache > > > Flushed: 12M requested, 18M(17432M->17414M) flushed > > > [2019-10-15T20:02:42.717+0000][71060.515s][info][gc,heap ] > > Page Cache > > > Flushed: 14M requested, 36M(16632M->16596M) flushed > > > [2019-10-15T20:02:46.128+0000][71063.927s][info][gc,heap ] > > Page Cache > > > Flushed: 4M requested, 18M(7742M->7724M) flushed > > > [2019-10-15T20:02:49.716+0000][71067.514s][info][gc,heap ] > > Page Cache > > > Flushed: 2M requested, 28M(2374M->2346M) flushed > > > [2019-10-15T20:02:49.785+0000][71067.583s][info][gc,heap ] > > Page Cache > > > Flushed: 2M requested, 76M(2346M->2270M) flushed > > > [2019-10-15T20:02:49.966+0000][71067.765s][info][gc,heap ] > > Page Cache > > > Flushed: 2M requested, 10M(2270M->2260M) flushed > > > [2019-10-15T20:02:50.006+0000][71067.805s][info][gc,heap ] > > Page Cache > > > Flushed: 2M requested, 10M(2260M->2250M) flushed > > > [2019-10-15T20:02:50.018+0000][71067.816s][info][gc,heap ] > > Page Cache > > > Flushed: 2M requested, 10M(2250M->2240M) flushed > > > [2019-10-15T20:02:50.098+0000][71067.896s][info][gc,heap ] > > Page Cache > > > Flushed: 2M requested, 10M(2240M->2230M) flushed > > > [2019-10-15T20:02:50.149+0000][71067.947s][info][gc,heap ] > > Page Cache > > > Flushed: 2M requested, 10M(2230M->2220M) flushed > > > [2019-10-15T20:02:50.198+0000][71067.996s][info][gc,heap ] > > Page Cache > > > Flushed: 2M requested, 10M(2220M->2210M) flushed > > > [2019-10-15T20:02:50.313+0000][71068.111s][info][gc,heap ] > > Page Cache > > > Flushed: 2M requested, 10M(2210M->2200M) flushed > > > [2019-10-15T20:02:50.327+0000][71068.125s][info][gc,heap ] > > Page Cache > > 
> Flushed: 2M requested, 26M(2200M->2174M) flushed > > > [2019-10-15T20:02:50.346+0000][71068.145s][info][gc,heap ] > > Page Cache > > > Flushed: 2M requested, 14M(2174M->2160M) flushed > > > [2019-10-15T20:02:50.365+0000][71068.163s][info][gc,heap ] > > Page Cache > > > Flushed: 2M requested, 14M(2160M->2146M) flushed > > > [2019-10-15T20:02:50.371+0000][71068.170s][info][gc,heap ] > > Page Cache > > > Flushed: 2M requested, 8M(2146M->2138M) flushed > > > [2019-10-15T20:02:50.388+0000][71068.187s][info][gc,heap ] > > Page Cache > > > Flushed: 2M requested, 8M(2138M->2130M) flushed > > > [2019-10-15T20:02:50.402+0000][71068.201s][info][gc,heap ] > > Page Cache > > > Flushed: 2M requested, 8M(2130M->2122M) flushed > > > [2019-10-15T20:02:50.529+0000][71068.327s][info][gc,heap ] > > Page Cache > > > Flushed: 2M requested, 8M(2122M->2114M) flushed > > > [2019-10-15T20:02:50.620+0000][71068.418s][info][gc,heap ] > > Page Cache > > > Flushed: 2M requested, 8M(2114M->2106M) flushed > > > [2019-10-15T20:02:50.657+0000][71068.455s][info][gc,heap ] > > Page Cache > > > Flushed: 2M requested, 14M(2106M->2092M) flushed > > > ... > > > > > > 1. Any idea what this info message conveys about the system? > > > > This means that ZGC had cached heap regions (ZPages) that were ready > to > > be used for new allocations, but they had the wrong size, so some > > memory > > (one or more ZPage) was flushed out from the cache so that the > physical > > memory they occupied could be reused to build a new ZPage of the > > correct > > size. This is not in itself catastrophic, but it is more expensive > than > > a regular allocation as it involves remapping memory. The only real > way > > to avoid this is to increase the max heap size. > > > > > > > > 2. Our application is allocating object 5X sometimes and > > Allocation Stall > > > happens immediately after that. Can ZGC heuristic learn about > > this increase > > > and later will it adjust accordingly? Other than increasing > > memory is there > > > any other tuning option to consider that will help in this > scenario? > > > > In general, increasing the heap size and/or adjusting the number of > > concurrent work threads (-XX:ConcGCThreads=X) are your main tuning > > options. Using -XX:+UseLargePages is of course also good if you want > > max > > performance. There are a few other ZGC-specific options, but those > > matter a whole lot less and will likely not help in this situation. > > > > The ZGC wiki has some more information on tuning: > > https://wiki.openjdk.java.net/display/zgc/Main > > > > cheers, > > Per > > > > > > > > I am using JDK12 with 80G heap. > > > > > > Thanks > > > Sundar > > > > > > From m.sundar85 at gmail.com Fri Oct 25 22:20:53 2019 From: m.sundar85 at gmail.com (Sundara Mohan M) Date: Fri, 25 Oct 2019 15:20:53 -0700 Subject: ZGC cpu metrics in logs In-Reply-To: <712e16d3-7e8e-2767-2765-5680f3cc0f25@oracle.com> References: <712e16d3-7e8e-2767-2765-5680f3cc0f25@oracle.com> Message-ID: Ok, thanks for the clarification. On Thu, Oct 24, 2019 at 12:57 AM Per Liden wrote: > Hi, > > On 10/24/19 2:49 AM, Sundara Mohan M wrote: > > Hi, > > Wondering why zgc doesn't print cpu metrics like below > > > > [2019-10-23T17:42:48.539+0800][0.959s][info][gc,cpu ] GC(118) > > User=0.00s Sys=0.00s Real=0.00s > > > > For any other garbage collectors it is printed. > > Is this avoid because ZGC stop the world pause is always in milliseconds? > > > > Just curious why this was not printed for ZGC alone. 
> > Right, printing that exact line wouldn't be that useful, as ZGC pauses > wouldn't normally register on that scale. > > However, printing more detailed timing information could certainly be > useful. We've especially been interested in real-time vs. cpu-time, as > that can often explain strange outliers, expose over provisioning, etc. > > cheers, > Per > > > > > Thanks > > Sundar > > > From m.sundar85 at gmail.com Wed Oct 30 19:28:16 2019 From: m.sundar85 at gmail.com (Sundara Mohan M) Date: Wed, 30 Oct 2019 12:28:16 -0700 Subject: Is ZGC still in experimental? In-Reply-To: References: Message-ID: Hi Per Will these changes be merged back to JDK11 at any point? For ex. uncommit memory feature or C2 related changes will be merged back to 11? Thanks Sundar On Tue, Oct 22, 2019 at 11:04 AM Sundara Mohan M wrote: > Ok, thanks for the update. > > On Tue, Oct 22, 2019 at 1:12 AM Per Liden wrote: > >> Hi, >> >> No decision has been made, but we're continuously evaluating where we >> stand. The new C2 load barriers (JDK-8230565) was a major milestone >> towards making ZGC rock solid. We can hopefully make it non-experimental >> sooner rather than later. >> >> /Per >> >> On 10/22/19 12:14 AM, Sundara Mohan M wrote: >> > Hi, >> > Any idea when ZGC will be moved out of experimental flags? >> > Understand it is too early to move it out of experimental but do we have >> > any plan to run it without +UnlockExperimentalVMOptions? >> > >> > Thanks >> > Sundar >> > >> > From per.liden at oracle.com Thu Oct 31 08:58:56 2019 From: per.liden at oracle.com (Per Liden) Date: Thu, 31 Oct 2019 09:58:56 +0100 Subject: Is ZGC still in experimental? In-Reply-To: References: Message-ID: <7edbce9a-d89c-a16d-20a9-a20c48d51e5b@oracle.com> I would way that's unlikely at this time, given ZGC's experimental status in 11. /Per On 10/30/19 8:28 PM, Sundara Mohan M wrote: > Hi?Per > ? Will these changes be merged back to JDK11 at any point? > For ex. uncommit memory feature or C2 related changes will be merged > back to 11? > > Thanks > Sundar > > > On Tue, Oct 22, 2019 at 11:04 AM Sundara Mohan M > wrote: > > Ok, thanks for the update. > > On Tue, Oct 22, 2019 at 1:12 AM Per Liden > wrote: > > Hi, > > No decision has been made, but we're continuously evaluating > where we > stand. The new C2 load barriers (JDK-8230565) was a major milestone > towards making ZGC rock solid. We can hopefully make it > non-experimental > sooner rather than later. > > /Per > > On 10/22/19 12:14 AM, Sundara Mohan M wrote: > > Hi, > >? ? ?Any idea when ZGC will be moved out of experimental flags? > > Understand it is too early to move it out of experimental but > do we have > > any plan to run it without +UnlockExperimentalVMOptions? > > > > Thanks > > Sundar > > > From m.sundar85 at gmail.com Thu Oct 31 22:39:55 2019 From: m.sundar85 at gmail.com (Sundara Mohan M) Date: Thu, 31 Oct 2019 15:39:55 -0700 Subject: Is ZGC still in experimental? In-Reply-To: <7edbce9a-d89c-a16d-20a9-a20c48d51e5b@oracle.com> References: <7edbce9a-d89c-a16d-20a9-a20c48d51e5b@oracle.com> Message-ID: Ok, thanks for the update. On Thu, Oct 31, 2019 at 1:59 AM Per Liden wrote: > I would way that's unlikely at this time, given ZGC's experimental > status in 11. > > /Per > > On 10/30/19 8:28 PM, Sundara Mohan M wrote: > > Hi Per > > Will these changes be merged back to JDK11 at any point? > > For ex. uncommit memory feature or C2 related changes will be merged > > back to 11? 
> > > > Thanks > > Sundar > > > > > > On Tue, Oct 22, 2019 at 11:04 AM Sundara Mohan M > > wrote: > > > > Ok, thanks for the update. > > > > On Tue, Oct 22, 2019 at 1:12 AM Per Liden > > wrote: > > > > Hi, > > > > No decision has been made, but we're continuously evaluating > > where we > > stand. The new C2 load barriers (JDK-8230565) was a major > milestone > > towards making ZGC rock solid. We can hopefully make it > > non-experimental > > sooner rather than later. > > > > /Per > > > > On 10/22/19 12:14 AM, Sundara Mohan M wrote: > > > Hi, > > > Any idea when ZGC will be moved out of experimental flags? > > > Understand it is too early to move it out of experimental but > > do we have > > > any plan to run it without +UnlockExperimentalVMOptions? > > > > > > Thanks > > > Sundar > > > > > >