From simone.bordet at gmail.com Tue Dec 9 13:40:47 2014
From: simone.bordet at gmail.com (Simone Bordet)
Date: Tue, 9 Dec 2014 14:40:47 +0100
Subject: G1 eden resizing behaviour ?
Message-ID:

Hi,

I observed the following behaviour in G1 and I would like some
feedback with respect to whether it is right/expected/known or not.

I have an application that allocates about 500-700 MiB/s, with a
promotion rate of about 5-7 MiB/s on a 32 GiB heap.
From time to time, G1 performs a marking and then mixed GCs.

During the young GC (not mixed) that follows the end of the marking
phase, the eden is reduced to a very small size, for example:

2014-12-05T04:47:39.528-0800: 5054.135: [GC concurrent-cleanup-end, 0.0009276 secs]
2014-12-05T04:47:53.949-0800: 5068.556: [GC pause (G1 Evacuation Pause) (young)
[Eden: 12.6G(12.6G)->0.0B(880.0M) Survivors: 768.0M->752.0M Heap: 26.8G(32.0G)->14.2G(32.0G)]
[Times: user=3.79 sys=0.05, real=0.22 secs]

In the example above the Eden is shrunk from 12.6G to 880M.
G1 then keeps the eden small for the mixed GCs that follow.
After the mixed GCs have finished, young GCs are performed again, which
eventually re-grow the eden to a size similar to what it was before being shrunk.

In my case, in the normal young GC regime, the young GCs happen more or
less every 15-20s, while during the mixed GCs and for the few young GCs
that follow they happen about every 2s.

Now, this behaviour can be explained by the fact that, in order to make
room for the old regions to be evacuated, and yet still stay within the
pause goal, fewer eden regions need to be taken into account for the
evacuation.
Since the allocation rate does not change, fewer eden regions cause more
frequent GCs.

I am wondering whether this behaviour is right/expected/known, and what
effects it has on the pause time prediction logic (e.g. mixed GC times
are not taken into account) as well as on the early promotion of objects
from eden.

I can provide GC logs and command line options if required.

Thanks !

--
Simone Bordet
http://bordet.blogspot.com
---
Finally, no matter how good the architecture and design are, to deliver
bug-free software with optimal performance and reliability, the
implementation technique must be flawless. Victoria Livschitz

From thomas.schatzl at oracle.com Tue Dec 9 13:43:43 2014
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Tue, 09 Dec 2014 14:43:43 +0100
Subject: G1 eden resizing behaviour ?
In-Reply-To:
References:
Message-ID: <1418132623.3361.23.camel@oracle.com>

Hi,

On Tue, 2014-12-09 at 14:40 +0100, Simone Bordet wrote:
> Hi,
>
> I observed the following behaviour in G1 and I would like some
> feedback with respect to whether it is right/expected/known or not.
>
> I have an application that allocates about 500-700 MiB/s, with a
> promotion rate of about 5-7 MiB/s on a 32 GiB heap.
> From time to time, G1 performs a marking and then mixed GCs.
>
> During the young GC (not mixed) that follows the end of the marking
> phase, the eden is reduced to a very small size, for example:

sounds like JDK-8035557.

Thanks,
Thomas

From simone.bordet at gmail.com Tue Dec 9 13:58:19 2014
From: simone.bordet at gmail.com (Simone Bordet)
Date: Tue, 9 Dec 2014 14:58:19 +0100
Subject: G1 eden resizing behaviour ?
In-Reply-To: <1418132623.3361.23.camel@oracle.com>
References: <1418132623.3361.23.camel@oracle.com>
Message-ID:

Hi,

On Tue, Dec 9, 2014 at 2:43 PM, Thomas Schatzl wrote:
> sounds like JDK-8035557.

Thanks for the fast response !
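As a side note on the numbers in the original message: the change in young GC
frequency is roughly what simple arithmetic predicts once the eden is resized.
A minimal sketch, assuming a steady allocation rate of about 600 MiB/s as the
midpoint of the 500-700 MiB/s figure above; the class is made up purely for
illustration:

public class EdenIntervalEstimate {
    public static void main(String[] args) {
        double allocRateMiBPerSec = 600.0;      // midpoint of the reported 500-700 MiB/s
        double normalEdenMiB = 12.6 * 1024;     // Eden before the shrink: 12.6G
        double shrunkEdenMiB = 880.0;           // Eden after the shrink: 880.0M

        // A young GC is triggered roughly when eden fills up, so
        // interval ~= eden size / allocation rate.
        System.out.printf("normal eden: ~%.0f s between young GCs%n",
                normalEdenMiB / allocRateMiBPerSec);    // ~22 s, close to the observed 15-20 s
        System.out.printf("shrunk eden: ~%.1f s between young GCs%n",
                shrunkEdenMiB / allocRateMiBPerSec);    // ~1.5 s, close to the observed ~2 s
    }
}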
With respect to https://bugs.openjdk.java.net/browse/JDK-8035557 I observe that the Eden shrinks in the young GC *before* the mixed GCs. Is that just an alternate behaviour of the same bug ? I remember a while back G1 performing mixed GC after the end of marking. More recently, it always perform a young GC after the end of marking, before starting the mixed GCs. Was this young GC added exactly to cope with mispredictions due to evacuations of old regions ? Thanks ! -- Simone Bordet http://bordet.blogspot.com --- Finally, no matter how good the architecture and design are, to deliver bug-free software with optimal performance and reliability, the implementation technique must be flawless. Victoria Livschitz From thomas.schatzl at oracle.com Tue Dec 9 14:06:50 2014 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 09 Dec 2014 15:06:50 +0100 Subject: G1 eden resizing behaviour ? In-Reply-To: References: <1418132623.3361.23.camel@oracle.com> Message-ID: <1418134010.3361.26.camel@oracle.com> Hi, On Tue, 2014-12-09 at 14:58 +0100, Simone Bordet wrote: > Hi, > > On Tue, Dec 9, 2014 at 2:43 PM, Thomas Schatzl > wrote: > > sounds like JDK-8035557. > > Thanks for the fast response ! > > With respect to https://bugs.openjdk.java.net/browse/JDK-8035557 I > observe that the Eden shrinks in the young GC *before* the mixed GCs. > Is that just an alternate behaviour of the same bug ? This may be because the heap is already very full (around 100-G1ReservePercent) so it cuts down on the young gen size. G1 tries to keep G1ReservePercent of heap empty to avoid evacuation failure. Another reason could be that G1 thinks that the pauses caused by marking cut too much into the available time budget (depending on your settings), so it decreases the heap. > I remember a while back G1 performing mixed GC after the end of marking. > More recently, it always perform a young GC after the end of marking, > before starting the mixed GCs. > Was this young GC added exactly to cope with mispredictions due to > evacuations of old regions ? that is JDK-8057781. I do not think it has ever been different though, but I may be wrong. Thomas From simone.bordet at gmail.com Tue Dec 9 14:26:03 2014 From: simone.bordet at gmail.com (Simone Bordet) Date: Tue, 9 Dec 2014 15:26:03 +0100 Subject: G1 eden resizing behaviour ? In-Reply-To: <1418134010.3361.26.camel@oracle.com> References: <1418132623.3361.23.camel@oracle.com> <1418134010.3361.26.camel@oracle.com> Message-ID: Hi, On Tue, Dec 9, 2014 at 3:06 PM, Thomas Schatzl wrote: > This may be because the heap is already very full (around > 100-G1ReservePercent) so it cuts down on the young gen size. > > G1 tries to keep G1ReservePercent of heap empty to avoid evacuation > failure. It's not my case: I have a 32 GiB heap with a permanent live set of about 5-6 GiB, so I should have plenty of space even with a 15 GiB or so eden, which is what I typically see as max eden size. > Another reason could be that G1 thinks that the pauses caused by marking > cut too much into the available time budget (depending on your > settings), so it decreases the heap. You mean by the remark, which is STW ? Indeed I have very long remark pauses (up to 2.5s), apparently caused by weak reference processing. I need to investigate this issue, as the application itself does not use them (perhaps some library ?). There is a big difference between the references processed in the remark phase (3 millions) and those processed during a young GC (less than a thousand). 
Am I correct assuming that those processed during remark only belong to tenured ? Thanks ! -- Simone Bordet http://bordet.blogspot.com --- Finally, no matter how good the architecture and design are, to deliver bug-free software with optimal performance and reliability, the implementation technique must be flawless. Victoria Livschitz From yu.zhang at oracle.com Wed Dec 10 06:18:56 2014 From: yu.zhang at oracle.com (Yu Zhang) Date: Tue, 09 Dec 2014 22:18:56 -0800 Subject: G1 eden resizing behaviour ? In-Reply-To: References: <1418132623.3361.23.camel@oracle.com> <1418134010.3361.26.camel@oracle.com> Message-ID: <5487E5D0.9050207@oracle.com> Simone, Please see my comments in line. Thanks, Jenny On 12/9/2014 6:26 AM, Simone Bordet wrote: > Hi, > > On Tue, Dec 9, 2014 at 3:06 PM, Thomas Schatzl > wrote: >> This may be because the heap is already very full (around >> 100-G1ReservePercent) so it cuts down on the young gen size. >> >> G1 tries to keep G1ReservePercent of heap empty to avoid evacuation >> failure. > It's not my case: I have a 32 GiB heap with a permanent live set of > about 5-6 GiB, so I should have plenty of space even with a 15 GiB or > so eden, which is what I typically see as max eden size. Can you add -XX:+PrintAdaptiveSizePolicy and share your logs? I was guessing the same, that the heap is full. The mixed gc might not clean all the garbage. Even though the live data set is 5-6 GB, the heap can still be close to full. > >> Another reason could be that G1 thinks that the pauses caused by marking >> cut too much into the available time budget (depending on your >> settings), so it decreases the heap. > You mean by the remark, which is STW ? > Indeed I have very long remark pauses (up to 2.5s), apparently caused > by weak reference processing. > > I need to investigate this issue, as the application itself does not > use them (perhaps some library ?). > There is a big difference between the references processed in the > remark phase (3 millions) and those processed during a young GC (less > than a thousand). > > Am I correct assuming that those processed during remark only belong > to tenured ? I agree. Do you have -XX:+ParallelRefProcEnabled? This will help reducing the refproc time, but 3 millions is a lot. > > Thanks ! > From charlie.hunt at oracle.com Wed Dec 10 16:04:41 2014 From: charlie.hunt at oracle.com (charlie hunt) Date: Wed, 10 Dec 2014 08:04:41 -0800 Subject: G1 eden resizing behaviour ? In-Reply-To: <5487E5D0.9050207@oracle.com> References: <1418132623.3361.23.camel@oracle.com> <1418134010.3361.26.camel@oracle.com> <5487E5D0.9050207@oracle.com> Message-ID: <00F2254A-E48C-4969-B7B7-FCA1983F2E64@oracle.com> Adding -XX:+PrintReferenceGC may also help identify which type of Reference objects are the culprit. If it is SoftReferences (my favorite :-] ), there is some additional tweaking you can do. Adding -XX:+ParallelRefProcEnabled, as Jenny suggested, should help shorten the length of time. hths, charlie > On Dec 9, 2014, at 10:18 PM, Yu Zhang wrote: > > Simone, > > Please see my comments in line. > > Thanks, > Jenny > > On 12/9/2014 6:26 AM, Simone Bordet wrote: >> Hi, >> >> On Tue, Dec 9, 2014 at 3:06 PM, Thomas Schatzl >> wrote: >>> This may be because the heap is already very full (around >>> 100-G1ReservePercent) so it cuts down on the young gen size. >>> >>> G1 tries to keep G1ReservePercent of heap empty to avoid evacuation >>> failure. 
>> It's not my case: I have a 32 GiB heap with a permanent live set of >> about 5-6 GiB, so I should have plenty of space even with a 15 GiB or >> so eden, which is what I typically see as max eden size. > Can you add -XX:+PrintAdaptiveSizePolicy and share your logs? I was guessing the same, that the heap is full. The mixed gc might not clean all the garbage. Even though the live data set is 5-6 GB, the heap can still be close to full. >> >>> Another reason could be that G1 thinks that the pauses caused by marking >>> cut too much into the available time budget (depending on your >>> settings), so it decreases the heap. >> You mean by the remark, which is STW ? >> Indeed I have very long remark pauses (up to 2.5s), apparently caused >> by weak reference processing. >> >> I need to investigate this issue, as the application itself does not >> use them (perhaps some library ?). >> There is a big difference between the references processed in the >> remark phase (3 millions) and those processed during a young GC (less >> than a thousand). >> >> Am I correct assuming that those processed during remark only belong >> to tenured ? > I agree. Do you have -XX:+ParallelRefProcEnabled? This will help reducing the refproc time, but 3 millions is a lot. >> >> Thanks ! >> > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From simone.bordet at gmail.com Wed Dec 10 17:56:03 2014 From: simone.bordet at gmail.com (Simone Bordet) Date: Wed, 10 Dec 2014 18:56:03 +0100 Subject: G1 eden resizing behaviour ? In-Reply-To: <00F2254A-E48C-4969-B7B7-FCA1983F2E64@oracle.com> References: <1418132623.3361.23.camel@oracle.com> <1418134010.3361.26.camel@oracle.com> <5487E5D0.9050207@oracle.com> <00F2254A-E48C-4969-B7B7-FCA1983F2E64@oracle.com> Message-ID: Hi, On Wed, Dec 10, 2014 at 5:04 PM, charlie hunt wrote: > Adding -XX:+PrintReferenceGC may also help identify which type of Reference objects are the culprit. If it is SoftReferences (my favorite :-] ), there is some additional tweaking you can do. > > Adding -XX:+ParallelRefProcEnabled, as Jenny suggested, should help shorten the length of time. I have both enabled in the logs, that are unfortunately too big to be attached to a message of this mailing list. For those interested, I can send them privately. There are no SoftReferences, only a large number of WeakReferences. -- Simone Bordet http://bordet.blogspot.com --- Finally, no matter how good the architecture and design are, to deliver bug-free software with optimal performance and reliability, the implementation technique must be flawless. Victoria Livschitz From yu.zhang at oracle.com Wed Dec 10 20:41:04 2014 From: yu.zhang at oracle.com (Yu Zhang) Date: Wed, 10 Dec 2014 12:41:04 -0800 Subject: G1 eden resizing behaviour ? In-Reply-To: References: <1418132623.3361.23.camel@oracle.com> <1418134010.3361.26.camel@oracle.com> <5487E5D0.9050207@oracle.com> <00F2254A-E48C-4969-B7B7-FCA1983F2E64@oracle.com> Message-ID: <5488AFE0.2060004@oracle.com> Simone, thanks for the log. checking the Eden size for young gc, I think the Eden size decrease is more related to mixed gc. The Eden size for several young gcs after mixed gc are small. This is a known issue. Thanks, Jenny On 12/10/2014 9:56 AM, Simone Bordet wrote: > Hi, > > On Wed, Dec 10, 2014 at 5:04 PM, charlie hunt wrote: >> Adding -XX:+PrintReferenceGC may also help identify which type of Reference objects are the culprit. 
If it is SoftReferences (my favorite :-] ), there is some additional tweaking you can do. >> >> Adding -XX:+ParallelRefProcEnabled, as Jenny suggested, should help shorten the length of time. > I have both enabled in the logs, that are unfortunately too big to be > attached to a message of this mailing list. > For those interested, I can send them privately. > > There are no SoftReferences, only a large number of WeakReferences. > From simone.bordet at gmail.com Wed Dec 10 21:14:16 2014 From: simone.bordet at gmail.com (Simone Bordet) Date: Wed, 10 Dec 2014 22:14:16 +0100 Subject: G1 eden resizing behaviour ? In-Reply-To: <5488AFE0.2060004@oracle.com> References: <1418132623.3361.23.camel@oracle.com> <1418134010.3361.26.camel@oracle.com> <5487E5D0.9050207@oracle.com> <00F2254A-E48C-4969-B7B7-FCA1983F2E64@oracle.com> <5488AFE0.2060004@oracle.com> Message-ID: Hi, On Wed, Dec 10, 2014 at 9:41 PM, Yu Zhang wrote: > Simone, > > thanks for the log. checking the Eden size for young gc, I think the Eden > size decrease is more related to mixed gc. The Eden size for several young > gcs after mixed gc are small. This is a known issue. Thanks for confirming this. -- Simone Bordet http://bordet.blogspot.com --- Finally, no matter how good the architecture and design are, to deliver bug-free software with optimal performance and reliability, the implementation technique must be flawless. Victoria Livschitz From simone.bordet at gmail.com Thu Dec 11 15:38:46 2014 From: simone.bordet at gmail.com (Simone Bordet) Date: Thu, 11 Dec 2014 16:38:46 +0100 Subject: G1: "Other" time too long ? Message-ID: Hi, G1 with a 32 GiB heap (16 MiB region size), I was seeing high "Update RS" and "Scan RS" times during mixed GCs. I am aware of -XX:G1RSetRegionEntries, but I wanted to try another path: whether reducing manually the region size caused less inter region references and therefore reduced the probable coarsening that was the cause of the long RS times. So I set the region size to 2 MiB and re-run. Now I get very high "Other" times, for example: [Other: 464.1 ms] [Choose CSet: 0.1 ms] [Ref Proc: 52.4 ms] [Ref Enq: 1.8 ms] [Redirty Cards: 19.4 ms] [Free CSet: 22.7 ms] The sum of the subtask times is not close to the "Other" time so I was wondering what else it's done in the "Other" processing, or whether perhaps it is not reporting what I think (e.g. a sequential time vs a parallel time). I'd probably revert to a 16 MiB region size and setting G1RSetRegionEntries, but I was wondering if someone can shed some light on this. Logs are too big for this mailing list, but I can provide them to interested people. Thanks ! -- Simone Bordet http://bordet.blogspot.com --- Finally, no matter how good the architecture and design are, to deliver bug-free software with optimal performance and reliability, the implementation technique must be flawless. Victoria Livschitz From yu.zhang at oracle.com Thu Dec 11 16:58:09 2014 From: yu.zhang at oracle.com (Yu Zhang) Date: Thu, 11 Dec 2014 08:58:09 -0800 Subject: G1: "Other" time too long ? In-Reply-To: References: Message-ID: <5489CD21.1050207@oracle.com> Simone, Do you have -XX:+G1SummarizeRSetStats? I had seen Other does not add up with this statistics, especially when RSet is big. Currently the default G1RSetRegionEntries is set according to region size. So with bigger region size, you have more G1RSetRegionEntries. You might get more references between regions with smaller region size. 
You might already tried this, you can use -XX:+G1SummarizeRSetStats -XX:G1SummarizeRSetStatsPeriod= to see if you have coarsening. If coarsening is high, the RS operations are more expensive. Thanks, Jenny On 12/11/2014 7:38 AM, Simone Bordet wrote: > Hi, > > G1 with a 32 GiB heap (16 MiB region size), I was seeing high "Update > RS" and "Scan RS" times during mixed GCs. > > I am aware of -XX:G1RSetRegionEntries, but I wanted to try another > path: whether reducing manually the region size caused less inter > region references and therefore reduced the probable coarsening that > was the cause of the long RS times. > > So I set the region size to 2 MiB and re-run. > > Now I get very high "Other" times, for example: > > [Other: 464.1 ms] > [Choose CSet: 0.1 ms] > [Ref Proc: 52.4 ms] > [Ref Enq: 1.8 ms] > [Redirty Cards: 19.4 ms] > [Free CSet: 22.7 ms] > > The sum of the subtask times is not close to the "Other" time so I was > wondering what else it's done in the "Other" processing, or whether > perhaps it is not reporting what I think (e.g. a sequential time vs a > parallel time). > > I'd probably revert to a 16 MiB region size and setting > G1RSetRegionEntries, but I was wondering if someone can shed some > light on this. > > Logs are too big for this mailing list, but I can provide them to > interested people. > > Thanks ! > From simone.bordet at gmail.com Thu Dec 11 18:16:39 2014 From: simone.bordet at gmail.com (Simone Bordet) Date: Thu, 11 Dec 2014 19:16:39 +0100 Subject: G1: "Other" time too long ? In-Reply-To: <5489CD21.1050207@oracle.com> References: <5489CD21.1050207@oracle.com> Message-ID: Hi, thanks for the quick reply. On Thu, Dec 11, 2014 at 5:58 PM, Yu Zhang wrote: > Simone, > > Do you have -XX:+G1SummarizeRSetStats? Yes. > I had seen Other does not add up > with this statistics, especially when RSet is big. Ok, thanks for confirming this. Have you seen differences as big as mine ? > Currently the default G1RSetRegionEntries is set according to region size. > So with bigger region size, you have more G1RSetRegionEntries. You might > get more references between regions with smaller region size. You might > already tried this, you can use -XX:+G1SummarizeRSetStats > -XX:G1SummarizeRSetStatsPeriod= to see if you have coarsening. If > coarsening is high, the RS operations are more expensive. I did not have G1SummarizeRSetStats for the case with region_size = 16 MiB, but I do have it for the case region_size = 2 MiB. I can see coarsenings in the latter case, so I guess it will be more so for the former case, but I'll verify it. The coarsenings I see are in the order 400-500 with peaks in the thousands and one big at 289509. I see that the formula to calculate the region entries is: table_size = base * (log(region_size / 1M) + 1) so for a 16 MiB region size the region entries are 564. I'll retry with a higher value to see if I get any benefit. Thanks ! -- Simone Bordet http://bordet.blogspot.com --- Finally, no matter how good the architecture and design are, to deliver bug-free software with optimal performance and reliability, the implementation technique must be flawless. Victoria Livschitz From thomas.schatzl at oracle.com Thu Dec 11 18:22:17 2014 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 11 Dec 2014 19:22:17 +0100 Subject: G1: "Other" time too long ? In-Reply-To: References: <5489CD21.1050207@oracle.com> Message-ID: <1418322137.3214.3.camel@oracle.com> Hi, On Thu, 2014-12-11 at 19:16 +0100, Simone Bordet wrote: > Hi, > > thanks for the quick reply. 
> > On Thu, Dec 11, 2014 at 5:58 PM, Yu Zhang wrote: > > Simone, > > > > Do you have -XX:+G1SummarizeRSetStats? > > Yes. > > > I had seen Other does not add up > > with this statistics, especially when RSet is big. > > Ok, thanks for confirming this. > Have you seen differences as big as mine ? Yes, we can also see pauses caused by G1SummarizeRSetStats in that range. G1SummarizeRSetStats is not meant to be always on, just to diagnose potential remembered set problems. Thanks, Thomas From yu.zhang at oracle.com Thu Dec 11 23:14:36 2014 From: yu.zhang at oracle.com (Yu Zhang) Date: Thu, 11 Dec 2014 15:14:36 -0800 Subject: G1: "Other" time too long ? In-Reply-To: References: <5489CD21.1050207@oracle.com> Message-ID: <548A255C.30704@oracle.com> Simone, The formula is correct. But with 16M region size, you get 256(base)*5 entries. Also with bigger region size, you might get less Remember set references. So bigger chance not seeing coarsening. Thanks, Jenny On 12/11/2014 10:16 AM, Simone Bordet wrote: > I did not have G1SummarizeRSetStats for the case with region_size = 16 > MiB, but I do have it for the case region_size = 2 MiB. > I can see coarsenings in the latter case, so I guess it will be more > so for the former case, but I'll verify it. > The coarsenings I see are in the order 400-500 with peaks in the > thousands and one big at 289509. > > I see that the formula to calculate the region entries is: > > table_size = base * (log(region_size / 1M) + 1) > > so for a 16 MiB region size the region entries are 564. > > I'll retry with a higher value to see if I get any benefit. From java at elyograg.org Tue Dec 16 17:30:28 2014 From: java at elyograg.org (Shawn Heisey) Date: Tue, 16 Dec 2014 10:30:28 -0700 Subject: G1 with Solr - thread from dev@lucene.apache.org Message-ID: <54906C34.8080408@elyograg.org> Here's a message I sent to dev at lucene.apache.org, with enough quoted history to know what's happening. My testing is with Solr 4.7.2. http://lucene.apache.org/solr/ Rory O'Donnell at Oracle suggested that I start a thread here. Thanks, Shawn On 12/6/2014 3:00 PM, Shawn Heisey wrote: > On 12/5/2014 2:42 PM, Erick Erickson wrote: >> Saw this on the Cloudera website: >> >> http://blog.cloudera.com/blog/2014/12/tuning-java-garbage-collection-for-hbase/ >> >> Original post here: >> https://software.intel.com/en-us/blogs/2014/06/18/part-1-tuning-java-garbage-collection-for-hbase >> >> Although it's for hbase, I thought the presentation went into enough >> detail about what improvements they'd seen that I can see it being >> useful for Solr folks. And we have some people on this list who are >> interested in this sort of thing.... > Very interesting. My own experiences with G1 and Solr (which I haven't > repeated since early Java 7 releases, something like 7u10 or 7u13) would > show even worse spikes compared to the blue lines on those graphs ... > and my heap isn't anywhere even CLOSE to 100GB. Solr probably has > different garbage creation characteristics than hbase. Followup with graphs. I've cc'd Rory at Oracle too, with hopes that this info will ultimately reach those who work on G1. I can provide the actual GC logs as well. Here's a graph of a GC log lasting over two weeks with a tuned CMS collector and Oracle Java 7u25 and a 6GB heap. https://www.dropbox.com/s/mygjeviyybqqnqd/cms-7u25.png?dl=0 CMS was tuned using these settings: http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning This graph shows that virtually all collection pauses were a little under half a second. 
There were exactly three full garbage collections, and each one took around six seconds. While that is a significant pause, having only three such collections over a period of 16 days sounds pretty good to me. Here's about half as much runtime (8 days) on the same server running G1 with Oracle 7u72 and the same 6GB heap. G1 is untuned, because I do not know how: https://www.dropbox.com/s/2kgx60gj988rflj/g1-7u72.png?dl=0 Most of these collections were around a tenth of a second ... which is certainly better than nearly half a second ... but there are a LOT of collections that take longer than a second, and a fair number of them that took between 3 and 5 seconds. It's difficult to say which of these graphs is actually better. The CMS graph is certainly more consistent, and does a LOT fewer full GCs ... but is the 4 to 1 improvement in a typical GC enough to reveal significantly better performance? My instinct says that it would NOT be enough for that, especially with so many collections taking 1-3 seconds. If the server was really busy (mine isn't), I wonder whether the GC graph would look similar, or whether it would be really different. A busy server would need to collect a lot more garbage, so I fear that the yellow and black parts of the G1 graph would dominate more than they do in my graph, which would be overall a bad thing. Only real testing on busy servers can tell us that. I can tell you for sure that the G1 graph looks a lot better than it did in early Java 7 releases, but additional work by Oracle (and perhaps some G1 tuning options) might significantly improve it. Thanks, Shawn -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashwin.jayaprakash at gmail.com Wed Dec 17 04:47:36 2014 From: ashwin.jayaprakash at gmail.com (Ashwin Jayaprakash) Date: Tue, 16 Dec 2014 20:47:36 -0800 Subject: Multi-second ParNew collections but stable CMS Message-ID: Hi, we have a cluster of ElasticSearch servers running with 31G heap and OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode). While our old gen seems to be very stable with about 40% usage and no Full GCs so far, our young gen keeps growing from ~50MB to 850MB every few seconds. These ParNew collections are taking anywhere between 1-7 seconds and is causing some of our requests to time out. The eden space keeps filling up and then cleared every 30-60 seconds. There is definitely work being done by our JVM in terms of caching/buffering objects for a few seconds, writing to disk and then clearing the objects (typical Lucene/ElasticSearch indexing and querying workload) These long pauses are not good for our server throughput and I was doing some reading. I got some conflicting reports on how Cassandra is configured compared to Hadoop. There are also references to this old ParNew+CMS bug which I thought would've been addressed in the JRE version we are using. Cassandra recommends a larger NewSize with just 1 for max tenuring, whereas Hadoop recommends a small NewSize. Since most of our allocations seem to be quite short lived, is there a way to avoid these "long" young gen pauses? Thanks in advance. Here are some details. 
*Heap settings:* java -Xmx31000m -Xms31000m -Xss512k -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=64m -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC -XX:+PrintPromotionFailure -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintTenuringDistribution -XX:GCLogFileSize=512m -XX:NumberOfGCLogFiles=2 -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:+DisableExplicitGC -XX:PrintFLSStatistics=1 -XX:+PrintGCDetails *Last few lines of "kill -3 pid" output:* Heap par new generation total 996800K, used 865818K [0x00007fa18e800000, 0x00007fa1d2190000, 0x00007fa1d2190000) eden space 886080K, 94% used [0x00007fa18e800000, 0x00007fa1c1a659e0, 0x00007fa1c4950000) from space 110720K, 25% used [0x00007fa1cb570000, 0x00007fa1cd091078, 0x00007fa1d2190000) to space 110720K, 0% used [0x00007fa1c4950000, 0x00007fa1c4950000, 0x00007fa1cb570000) concurrent mark-sweep generation total 30636480K, used 12036523K [0x00007fa1d2190000, 0x00007fa920000000, 0x00007fa920000000) concurrent-mark-sweep perm gen total 128856K, used 77779K [0x00007fa920000000, 0x00007fa927dd6000, 0x00007fa940000000) *Sample gc log:* 2014-12-11T23:32:16.121+0000: 710.618: [ParNew Desired survivor size 56688640 bytes, new threshold 6 (max 6) - age 1: 2956312 bytes, 2956312 total - age 2: 591800 bytes, 3548112 total - age 3: 66216 bytes, 3614328 total - age 4: 270752 bytes, 3885080 total - age 5: 615472 bytes, 4500552 total - age 6: 358440 bytes, 4858992 total : 900635K->8173K(996800K), 0.0317340 secs] 1352217K->463460K(31633280K)After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: -433641480 Max Chunk Size: -433641480 Number of Blocks: 1 Av. Block Size: -433641480 Tree Height: 1 After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 1227 Max Chunk Size: 631 Number of Blocks: 3 Av. Block Size: 409 Tree Height: 3 , 0.0318920 secs] [Times: user=0.38 sys=0.01, real=0.03 secs] Ashwin Jayaprakash. -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.schatzl at oracle.com Wed Dec 17 15:50:56 2014 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 17 Dec 2014 16:50:56 +0100 Subject: G1 with Solr - thread from dev@lucene.apache.org In-Reply-To: <54906C34.8080408@elyograg.org> References: <54906C34.8080408@elyograg.org> Message-ID: <1418831456.3255.22.camel@oracle.com> Hi Shawn, > On 12/6/2014 3:00 PM, Shawn Heisey wrote: > > On 12/5/2014 2:42 PM, Erick Erickson wrote: > > > Saw this on the Cloudera website: > > > > > > http://blog.cloudera.com/blog/2014/12/tuning-java-garbage-collection-for-hbase/ > > > > > > O[...] > Here's a graph of a GC log lasting over two weeks with a tuned CMS > collector and Oracle Java 7u25 and a 6GB heap. > > https://www.dropbox.com/s/mygjeviyybqqnqd/cms-7u25.png?dl=0 > > CMS was tuned using these settings: > > http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning > > This graph shows that virtually all collection pauses were a little > under half a second. There were exactly three full garbage collections, > and each one took around six seconds. While that is a significant > pause, having only three such collections over a period of 16 days > sounds pretty good to me. > > Here's about half as much runtime (8 days) on the same server running G1 > with Oracle 7u72 and the same 6GB heap. 
G1 is untuned, because I do not > know how: > > https://www.dropbox.com/s/2kgx60gj988rflj/g1-7u72.png?dl=0 > > Most of these collections were around a tenth of a second ... which is > certainly better than nearly half a second ... but there are a LOT of > collections that take longer than a second, and a fair number of them > that took between 3 and 5 seconds. > > It's difficult to say which of these graphs is actually better. The CMS > graph is certainly more consistent, and does a LOT fewer full GCs ... > but is the 4 to 1 improvement in a typical GC enough to reveal > significantly better performance? My instinct says that it would NOT be > enough for that, especially with so many collections taking 1-3 seconds. > > If the server was really busy (mine isn't), I wonder whether the GC > graph would look similar, or whether it would be really different. A > busy server would need to collect a lot more garbage, so I fear that the > yellow and black parts of the G1 graph would dominate more than they do > in my graph, which would be overall a bad thing. Only real testing on > busy servers can tell us that. > > I can tell you for sure that the G1 graph looks a lot better than it did > in early Java 7 releases, but additional work by Oracle (and perhaps > some G1 tuning options) might significantly improve it. could you provide some logs to look at? It is impossible to give good recommendations without having at least some more detail about what's going on. Preferably logs with at least the mentioned options they used to tune the workload, i.e. -XX:+PrintGCDetails -XX:+PrintGCTimeStamps and -XX: +PrintAdaptiveSizePolicy It might also be a good idea to start with the options given in the cloudera blog entry: -XX:MaxGCPauseMillis=100 // the max pause time you want -XX:+ParallelRefProcEnabled // not sure, only if Solr uses lots of soft or weak references. -XX:-ResizePLAB // that's minor -XX:G1NewSizePercent=1 // that may help in achieving the pause time goal -XmsM -XmxM I do not think there is need to set the ParallelGCThreads according to that formula. This has been the default formula for calculating the number of threads for all collectors for a long time (but then again it might have changed sometime in jdk7). You may also want to use a JDK 8 build, preferably (for me :) some 8u40 EA build (e.g. from https://jdk8.java.net/download.html); there have been a lot of improvements to G1 in JDK8, and in particular 8u40. Thanks, Thomas From jon.masamitsu at oracle.com Wed Dec 17 16:31:57 2014 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Wed, 17 Dec 2014 08:31:57 -0800 Subject: Multi-second ParNew collections but stable CMS In-Reply-To: References: Message-ID: <5491AFFD.2080907@oracle.com> Ashwin, You sent a sample GC log with a fast ParNew (about 31ms) right? Can you send examples of the slow ParNew's? I'd like to see what I can see in the logs that is changing from the fast GC's to the slower GC's. If I can download a complete log, that would be useful (there is a size limit on what you can mail to the list so mailing might not work). Jon On 12/16/2014 8:47 PM, Ashwin Jayaprakash wrote: > Hi, we have a cluster of ElasticSearch servers running with 31G heap > and OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode). > > While our old gen seems to be very stable with about 40% usage and no > Full GCs so far, our young gen keeps growing from ~50MB to 850MB every > few seconds. 
These ParNew collections are taking anywhere between 1-7 > seconds and is causing some of our requests to time out. The eden > space keeps filling up and then cleared every 30-60 seconds. There is > definitely work being done by our JVM in terms of caching/buffering > objects for a few seconds, writing to disk and then clearing the > objects (typical Lucene/ElasticSearch indexing and querying workload) > > These long pauses are not good for our server throughput and I was > doing some reading. I got some conflicting reports on how Cassandra is > configured compared to Hadoop. There are also references > > to this old ParNew+CMS bug > which I thought > would've been addressed in the JRE version we are using. Cassandra > recommends > a larger > NewSize with just 1 for max tenuring, whereas Hadoop recommends > a small NewSize. > > Since most of our allocations seem to be quite short lived, is there a > way to avoid these "long" young gen pauses? > > Thanks in advance. Here are some details. > * > Heap settings:* > java -Xmx31000m -Xms31000m > -Xss512k -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=64m > -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly > -XX:CMSInitiatingOccupancyFraction=70 > -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC -XX:+PrintPromotionFailure > -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintTenuringDistribution > -XX:GCLogFileSize=512m -XX:NumberOfGCLogFiles=2 -XX:+PrintGCDateStamps > -XX:+UseGCLogFileRotation -XX:+DisableExplicitGC > -XX:PrintFLSStatistics=1 -XX:+PrintGCDetails > > * > Last few lines of "kill -3 pid" output:* > Heap > par new generation total 996800K, used 865818K [0x00007fa18e800000, > 0x00007fa1d2190000, 0x00007fa1d2190000) > eden space 886080K, 94% used [0x00007fa18e800000, > 0x00007fa1c1a659e0, 0x00007fa1c4950000) > from space 110720K, 25% used [0x00007fa1cb570000, > 0x00007fa1cd091078, 0x00007fa1d2190000) > to space 110720K, 0% used [0x00007fa1c4950000, > 0x00007fa1c4950000, 0x00007fa1cb570000) > concurrent mark-sweep generation total 30636480K, used 12036523K > [0x00007fa1d2190000, 0x00007fa920000000, 0x00007fa920000000) > concurrent-mark-sweep perm gen total 128856K, used 77779K > [0x00007fa920000000, 0x00007fa927dd6000, 0x00007fa940000000) > * > * > *Sample gc log:* > 2014-12-11T23:32:16.121+0000: 710.618: [ParNew > Desired survivor size 56688640 bytes, new threshold 6 (max 6) > - age 1: 2956312 bytes, 2956312 total > - age 2: 591800 bytes, 3548112 total > - age 3: 66216 bytes, 3614328 total > - age 4: 270752 bytes, 3885080 total > - age 5: 615472 bytes, 4500552 total > - age 6: 358440 bytes, 4858992 total > : 900635K->8173K(996800K), 0.0317340 secs] > 1352217K->463460K(31633280K)After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: -433641480 > Max Chunk Size: -433641480 > Number of Blocks: 1 > Av. Block Size: -433641480 > Tree Height: 1 > After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 1227 > Max Chunk Size: 631 > Number of Blocks: 3 > Av. Block Size: 409 > Tree Height: 3 > , 0.0318920 secs] [Times: user=0.38 sys=0.01, real=0.03 secs] > > > Ashwin Jayaprakash. > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From java at elyograg.org Wed Dec 17 19:15:37 2014 From: java at elyograg.org (Shawn Heisey) Date: Wed, 17 Dec 2014 12:15:37 -0700 Subject: G1 with Solr - thread from dev@lucene.apache.org In-Reply-To: <1418831456.3255.22.camel@oracle.com> References: <54906C34.8080408@elyograg.org> <1418831456.3255.22.camel@oracle.com> Message-ID: <5491D659.1090703@elyograg.org> On 12/17/2014 8:50 AM, Thomas Schatzl wrote: > could you provide some logs to look at? It is impossible to give good > recommendations without having at least some more detail about what's > going on. > > Preferably logs with at least the mentioned options they used to tune > the workload, i.e. -XX:+PrintGCDetails -XX:+PrintGCTimeStamps and -XX: > +PrintAdaptiveSizePolicy > > It might also be a good idea to start with the options given in the > cloudera blog entry: > > -XX:MaxGCPauseMillis=100 // the max pause time you want > -XX:+ParallelRefProcEnabled // not sure, only if Solr uses lots of > soft or weak references. > -XX:-ResizePLAB // that's minor > -XX:G1NewSizePercent=1 // that may help in achieving the > pause time goal > -XmsM > -XmxM > > I do not think there is need to set the ParallelGCThreads according to > that formula. This has been the default formula for calculating the > number of threads for all collectors for a long time (but then again it > might have changed sometime in jdk7). > > You may also want to use a JDK 8 build, preferably (for me :) some 8u40 > EA build (e.g. from https://jdk8.java.net/download.html); there have > been a lot of improvements to G1 in JDK8, and in particular 8u40. Strange, I seem to have only received the copy of this message sent directly to me, I never got the list copy. Here's the options I'm using for G1 on 7u72: JVM_OPTS=" \ -XX:+UseG1GC \ -XX:+UseLargePages \ -XX:+AggressiveOpts \ " Here's the options I used for CMS on 7u25: JVM_OPTS=" \ -XX:NewRatio=3 \ -XX:SurvivorRatio=4 \ -XX:TargetSurvivorRatio=90 \ -XX:MaxTenuringThreshold=8 \ -XX:+UseConcMarkSweepGC \ -XX:+CMSScavengeBeforeRemark \ -XX:PretenureSizeThreshold=64m \ -XX:CMSFullGCsBeforeCompaction=1 \ -XX:+UseCMSInitiatingOccupancyOnly \ -XX:CMSInitiatingOccupancyFraction=70 \ -XX:CMSTriggerPermRatio=80 \ -XX:CMSMaxAbortablePrecleanTime=6000 \ -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -XX:+UseLargePages \ -XX:+AggressiveOpts \ " In both cases, I used -Xms4096M and -Xmx6144M. These are the GC logging options: GCLOG_OPTS="-verbose:gc -Xloggc:logs/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCDetails" Here's the GC logs that I already have: https://www.dropbox.com/s/4uy95g9zmc28xkn/gc-idxa1-cms-7u25.log?dl=0 https://www.dropbox.com/s/loyo6u0tqcba6sh/gc-idxa1-g1-7u72.log?dl=0 I believe that Lucene does use a lot of references. I am more familiar with Solr code than Lucene, but even on Solr, I am not well-versed in the lower-level details. I will get PrintAdaptiveSizePolicy added to my GC logging options. Unless the performance improvement in Java 8 is significant, I don't think I can make a compelling case to switch from Java 7 yet. Although I have UseLargePages, I do not have any huge pages allocated in the CentOS 6 operating system, so this is not actually doing anything. 
Thanks, Shawn From thomas.schatzl at oracle.com Wed Dec 17 20:51:53 2014 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 17 Dec 2014 21:51:53 +0100 Subject: G1 with Solr - thread from dev@lucene.apache.org In-Reply-To: <5491D659.1090703@elyograg.org> References: <54906C34.8080408@elyograg.org> <1418831456.3255.22.camel@oracle.com> <5491D659.1090703@elyograg.org> Message-ID: <1418849513.3293.3.camel@oracle.com> Hi Shawn, Shawn Heisey wrote: > On 12/17/2014 8:50 AM, Thomas Schatzl wrote: > > could you provide some logs to look at? It is impossible to give good > > recommendations without having at least some more detail about what's > > going on. > > > > Preferably logs with at least the mentioned options they used to tune > > the workload, i.e. -XX:+PrintGCDetails -XX:+PrintGCTimeStamps and -XX: > > +PrintAdaptiveSizePolicy > > > > It might also be a good idea to start with the options given in the > > cloudera blog entry: > > > > -XX:MaxGCPauseMillis=100 // the max pause time you want > > -XX:+ParallelRefProcEnabled // not sure, only if Solr uses lots of > > soft or weak references. > > -XX:-ResizePLAB // that's minor > > -XX:G1NewSizePercent=1 // that may help in achieving the > > pause time goal > > -XmsM > > -XmxM > > > > I do not think there is need to set the ParallelGCThreads according to > > that formula. This has been the default formula for calculating the > > number of threads for all collectors for a long time (but then again it > > might have changed sometime in jdk7). > > > > You may also want to use a JDK 8 build, preferably (for me :) some 8u40 > > EA build (e.g. from https://jdk8.java.net/download.html); there have > > been a lot of improvements to G1 in JDK8, and in particular 8u40. > > Strange, I seem to have only received the copy of this message sent > directly to me, I never got the list copy. Not sure why. One copy has been archived in the mailing list archives though... > Here's the options I'm using for G1 on 7u72: > > JVM_OPTS=" \ > -XX:+UseG1GC \ > -XX:+UseLargePages \ > -XX:+AggressiveOpts \ > " > > Here's the options I used for CMS on 7u25: > > JVM_OPTS=" \ > -XX:NewRatio=3 \ > -XX:SurvivorRatio=4 \ > -XX:TargetSurvivorRatio=90 \ > -XX:MaxTenuringThreshold=8 \ > -XX:+UseConcMarkSweepGC \ > -XX:+CMSScavengeBeforeRemark \ > -XX:PretenureSizeThreshold=64m \ > -XX:CMSFullGCsBeforeCompaction=1 \ > -XX:+UseCMSInitiatingOccupancyOnly \ > -XX:CMSInitiatingOccupancyFraction=70 \ > -XX:CMSTriggerPermRatio=80 \ > -XX:CMSMaxAbortablePrecleanTime=6000 \ > -XX:+CMSParallelRemarkEnabled > -XX:+ParallelRefProcEnabled > -XX:+UseLargePages \ > -XX:+AggressiveOpts \ > " > > In both cases, I used -Xms4096M and -Xmx6144M. These are the GC logging > options: > > GCLOG_OPTS="-verbose:gc -Xloggc:logs/gc.log -XX:+PrintGCDateStamps > -XX:+PrintGCDetails" > > Here's the GC logs that I already have: > > https://www.dropbox.com/s/4uy95g9zmc28xkn/gc-idxa1-cms-7u25.log?dl=0 > https://www.dropbox.com/s/loyo6u0tqcba6sh/gc-idxa1-g1-7u72.log?dl=0 > please also add -XX:+PrintReferenceGC, and definitely use -XX: +ParallelRefProcEnabled. GC is spending a significant amount of the time in soft/weak reference processing. -XX:+ParallelRefProcEnabled will help, but there will be spikes still. I saw that GC sometimes spends 1000ms just processing those references; using 8 threads this should get better. That alone will likely make it hard reaching a 100ms pause time goal (1000ms/8 = 125ms...). 
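To see that effect in isolation, here is a minimal stand-alone sketch (it has
nothing to do with Solr or Lucene internals; the flood of WeakReferences is just
a stand-in for whatever the library creates). Run it with something like
-Xmx2g -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintReferenceGC, once with and
once without -XX:+ParallelRefProcEnabled, and compare the reference processing
lines in the two logs:

import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

public class WeakRefLoad {
    public static void main(String[] args) {
        // Keep ~3 million WeakReference objects strongly reachable, mirroring the
        // ~3 million references reported at remark earlier in this thread.
        List<WeakReference<byte[]>> refs = new ArrayList<WeakReference<byte[]>>(3_000_000);
        for (int i = 0; i < 3_000_000; i++) {
            // The referents are only weakly reachable, so the collector has to
            // discover and clear these references; that is the work the
            // "[Ref Proc: ...]" times account for.
            refs.add(new WeakReference<byte[]>(new byte[16]));
        }
        System.gc();    // trigger one more collection so the final counts show up in the log
        System.out.println("weak references created: " + refs.size());
    }
}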
CMS has the same problems, and while on average it has ~215ms pauses, there seem to be a lot that are a lot longer too. Reference processing also takes very long, even with -XX:+ParallelRefProcEnabled. I am not sure about the cause for the full gc's: either the pause time prediction in G1 in that version is too bad and it tries to use a way too large young gen, or there are a few very large objects around. Depending on the log output and the impact of the other options we might want to cap the maximum young gen size. > I believe that Lucene does use a lot of references. I saw that. Must be millions. -XX:+PrintReferenceGC should show that (also in CMS). > I am more familiar > with Solr code than Lucene, but even on Solr, I am not well-versed in > the lower-level details. > > I will get PrintAdaptiveSizePolicy added to my GC logging options. > > Unless the performance improvement in Java 8 is significant, I don't > think I can make a compelling case to switch from Java 7 yet. >From the top of my head: - logging is better - parallelized a few more GC phases - class unloading after concurrent mark (not only during full gc) - but that does not seem to be a problem - prediction fixes - much improved handling of large objects - does not seem to be a problem here - slew of bugfixes I am mostly missing the improved logging for analysis, and the improvements in pause times. > Although I have UseLargePages, I do not have any huge pages allocated in > the CentOS 6 operating system, so this is not actually doing anything. Thanks, Thomas From ashwin.jayaprakash at gmail.com Wed Dec 17 23:12:14 2014 From: ashwin.jayaprakash at gmail.com (Ashwin Jayaprakash) Date: Wed, 17 Dec 2014 15:12:14 -0800 Subject: Multi-second ParNew collections but stable CMS In-Reply-To: References: Message-ID: I've uploaded our latest GC log files to - https://drive.google.com/file/d/0Bw3dCdVLk-NvV3ozdkNacU5SU2M/view?usp=sharing I've also summarized the top pause times by running "grep -oE "Times: user=.*" gc.log.0 | sort -nr | head -25" Times: user=7.89 sys=0.55, real=0.65 secs] Times: user=7.71 sys=4.59, real=1.10 secs] Times: user=7.46 sys=0.32, real=0.67 secs] Times: user=6.55 sys=0.96, real=0.68 secs] Times: user=6.40 sys=0.27, real=0.57 secs] Times: user=6.27 sys=0.65, real=0.55 secs] Times: user=6.24 sys=0.29, real=0.52 secs] Times: user=5.25 sys=0.26, real=0.45 secs] Times: user=4.95 sys=0.49, real=0.53 secs] Times: user=4.90 sys=0.54, real=0.45 secs] Times: user=4.55 sys=1.46, real=0.61 secs] Times: user=4.39 sys=0.26, real=0.40 secs] Times: user=3.61 sys=0.39, real=0.50 secs] Times: user=3.59 sys=0.18, real=0.35 secs] Times: user=3.16 sys=0.00, real=3.17 secs] Times: user=3.06 sys=0.14, real=0.25 secs] Times: user=3.05 sys=0.24, real=0.33 secs] Times: user=3.03 sys=0.14, real=0.25 secs] Times: user=2.97 sys=0.38, real=0.33 secs] Times: user=2.77 sys=0.14, real=0.25 secs] Times: user=2.51 sys=0.08, real=0.22 secs] Times: user=2.49 sys=0.13, real=0.21 secs] Times: user=2.25 sys=0.32, real=0.26 secs] Times: user=2.06 sys=0.12, real=0.19 secs] Times: user=2.06 sys=0.11, real=0.17 secs] I wonder if we should enable "UseCondCardMark"? Thanks. On Tue, Dec 16, 2014 at 8:47 PM, Ashwin Jayaprakash < ashwin.jayaprakash at gmail.com> wrote: > > Hi, we have a cluster of ElasticSearch servers running with 31G heap and > OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode). 
> > While our old gen seems to be very stable with about 40% usage and no Full > GCs so far, our young gen keeps growing from ~50MB to 850MB every few > seconds. These ParNew collections are taking anywhere between 1-7 seconds > and is causing some of our requests to time out. The eden space keeps > filling up and then cleared every 30-60 seconds. There is definitely work > being done by our JVM in terms of caching/buffering objects for a few > seconds, writing to disk and then clearing the objects (typical > Lucene/ElasticSearch indexing and querying workload) > > These long pauses are not good for our server throughput and I was doing > some reading. I got some conflicting reports on how Cassandra is configured > compared to Hadoop. There are also references > > to this old ParNew+CMS bug > which I thought > would've been addressed in the JRE version we are using. Cassandra > recommends a > larger NewSize with just 1 for max tenuring, whereas Hadoop recommends > a small NewSize. > > Since most of our allocations seem to be quite short lived, is there a way > to avoid these "long" young gen pauses? > > Thanks in advance. Here are some details. > > *Heap settings:* > java -Xmx31000m -Xms31000m > -Xss512k -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=64m > -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly > -XX:CMSInitiatingOccupancyFraction=70 > -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC -XX:+PrintPromotionFailure > -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintTenuringDistribution > -XX:GCLogFileSize=512m -XX:NumberOfGCLogFiles=2 -XX:+PrintGCDateStamps > -XX:+UseGCLogFileRotation -XX:+DisableExplicitGC -XX:PrintFLSStatistics=1 > -XX:+PrintGCDetails > > > *Last few lines of "kill -3 pid" output:* > Heap > par new generation total 996800K, used 865818K [0x00007fa18e800000, > 0x00007fa1d2190000, 0x00007fa1d2190000) > eden space 886080K, 94% used [0x00007fa18e800000, 0x00007fa1c1a659e0, > 0x00007fa1c4950000) > from space 110720K, 25% used [0x00007fa1cb570000, 0x00007fa1cd091078, > 0x00007fa1d2190000) > to space 110720K, 0% used [0x00007fa1c4950000, 0x00007fa1c4950000, > 0x00007fa1cb570000) > concurrent mark-sweep generation total 30636480K, used 12036523K > [0x00007fa1d2190000, 0x00007fa920000000, 0x00007fa920000000) > concurrent-mark-sweep perm gen total 128856K, used 77779K > [0x00007fa920000000, 0x00007fa927dd6000, 0x00007fa940000000) > > *Sample gc log:* > 2014-12-11T23:32:16.121+0000: 710.618: [ParNew > Desired survivor size 56688640 bytes, new threshold 6 (max 6) > - age 1: 2956312 bytes, 2956312 total > - age 2: 591800 bytes, 3548112 total > - age 3: 66216 bytes, 3614328 total > - age 4: 270752 bytes, 3885080 total > - age 5: 615472 bytes, 4500552 total > - age 6: 358440 bytes, 4858992 total > : 900635K->8173K(996800K), 0.0317340 secs] > 1352217K->463460K(31633280K)After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: -433641480 > Max Chunk Size: -433641480 > Number of Blocks: 1 > Av. Block Size: -433641480 > Tree Height: 1 > After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 1227 > Max Chunk Size: 631 > Number of Blocks: 3 > Av. Block Size: 409 > Tree Height: 3 > , 0.0318920 secs] [Times: user=0.38 sys=0.01, real=0.03 secs] > > > Ashwin Jayaprakash. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gustav.r.akesson at gmail.com Thu Dec 18 08:05:38 2014 From: gustav.r.akesson at gmail.com (=?UTF-8?Q?Gustav_=C3=85kesson?=) Date: Thu, 18 Dec 2014 09:05:38 +0100 Subject: Multi-second ParNew collections but stable CMS In-Reply-To: References: Message-ID: Hi, I see a significant increase in systime which (to my experience) usually is because of either page swaps (some parts of the heap has to be paged in in STW phase), or long latency for GC writing the logs to disc (which is a synchronous operation as part of GC cycle). When these multi-second YGCs occur, have you noticed an increase of page swaps? What is the resident size of this particular Java process? Could you try to write the GC logs to a RAM disc and see if the problem goes away? Best Regards, Gustav ?kesson On Thu, Dec 18, 2014 at 12:12 AM, Ashwin Jayaprakash < ashwin.jayaprakash at gmail.com> wrote: > > I've uploaded our latest GC log files to - > https://drive.google.com/file/d/0Bw3dCdVLk-NvV3ozdkNacU5SU2M/view?usp=sharing > > I've also summarized the top pause times by running "grep -oE "Times: > user=.*" gc.log.0 | sort -nr | head -25" > > Times: user=7.89 sys=0.55, real=0.65 secs] > Times: user=7.71 sys=4.59, real=1.10 secs] > Times: user=7.46 sys=0.32, real=0.67 secs] > Times: user=6.55 sys=0.96, real=0.68 secs] > Times: user=6.40 sys=0.27, real=0.57 secs] > Times: user=6.27 sys=0.65, real=0.55 secs] > Times: user=6.24 sys=0.29, real=0.52 secs] > Times: user=5.25 sys=0.26, real=0.45 secs] > Times: user=4.95 sys=0.49, real=0.53 secs] > Times: user=4.90 sys=0.54, real=0.45 secs] > Times: user=4.55 sys=1.46, real=0.61 secs] > Times: user=4.39 sys=0.26, real=0.40 secs] > Times: user=3.61 sys=0.39, real=0.50 secs] > Times: user=3.59 sys=0.18, real=0.35 secs] > Times: user=3.16 sys=0.00, real=3.17 secs] > Times: user=3.06 sys=0.14, real=0.25 secs] > Times: user=3.05 sys=0.24, real=0.33 secs] > Times: user=3.03 sys=0.14, real=0.25 secs] > Times: user=2.97 sys=0.38, real=0.33 secs] > Times: user=2.77 sys=0.14, real=0.25 secs] > Times: user=2.51 sys=0.08, real=0.22 secs] > Times: user=2.49 sys=0.13, real=0.21 secs] > Times: user=2.25 sys=0.32, real=0.26 secs] > Times: user=2.06 sys=0.12, real=0.19 secs] > Times: user=2.06 sys=0.11, real=0.17 secs] > > I wonder if we should enable "UseCondCardMark"? > > Thanks. > > > > > > > > On Tue, Dec 16, 2014 at 8:47 PM, Ashwin Jayaprakash < > ashwin.jayaprakash at gmail.com> wrote: >> >> Hi, we have a cluster of ElasticSearch servers running with 31G heap and >> OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode). >> >> While our old gen seems to be very stable with about 40% usage and no >> Full GCs so far, our young gen keeps growing from ~50MB to 850MB every few >> seconds. These ParNew collections are taking anywhere between 1-7 seconds >> and is causing some of our requests to time out. The eden space keeps >> filling up and then cleared every 30-60 seconds. There is definitely work >> being done by our JVM in terms of caching/buffering objects for a few >> seconds, writing to disk and then clearing the objects (typical >> Lucene/ElasticSearch indexing and querying workload) >> >> These long pauses are not good for our server throughput and I was doing >> some reading. I got some conflicting reports on how Cassandra is configured >> compared to Hadoop. There are also references >> >> to this old ParNew+CMS bug >> which I thought >> would've been addressed in the JRE version we are using. 
Cassandra >> recommends >> a larger NewSize with just 1 for max tenuring, whereas Hadoop recommends >> a small NewSize. >> >> Since most of our allocations seem to be quite short lived, is there a >> way to avoid these "long" young gen pauses? >> >> Thanks in advance. Here are some details. >> >> *Heap settings:* >> java -Xmx31000m -Xms31000m >> -Xss512k -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=64m >> -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly >> -XX:CMSInitiatingOccupancyFraction=70 >> -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC -XX:+PrintPromotionFailure >> -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintTenuringDistribution >> -XX:GCLogFileSize=512m -XX:NumberOfGCLogFiles=2 -XX:+PrintGCDateStamps >> -XX:+UseGCLogFileRotation -XX:+DisableExplicitGC -XX:PrintFLSStatistics=1 >> -XX:+PrintGCDetails >> >> >> *Last few lines of "kill -3 pid" output:* >> Heap >> par new generation total 996800K, used 865818K [0x00007fa18e800000, >> 0x00007fa1d2190000, 0x00007fa1d2190000) >> eden space 886080K, 94% used [0x00007fa18e800000, 0x00007fa1c1a659e0, >> 0x00007fa1c4950000) >> from space 110720K, 25% used [0x00007fa1cb570000, 0x00007fa1cd091078, >> 0x00007fa1d2190000) >> to space 110720K, 0% used [0x00007fa1c4950000, 0x00007fa1c4950000, >> 0x00007fa1cb570000) >> concurrent mark-sweep generation total 30636480K, used 12036523K >> [0x00007fa1d2190000, 0x00007fa920000000, 0x00007fa920000000) >> concurrent-mark-sweep perm gen total 128856K, used 77779K >> [0x00007fa920000000, 0x00007fa927dd6000, 0x00007fa940000000) >> >> *Sample gc log:* >> 2014-12-11T23:32:16.121+0000: 710.618: [ParNew >> Desired survivor size 56688640 bytes, new threshold 6 (max 6) >> - age 1: 2956312 bytes, 2956312 total >> - age 2: 591800 bytes, 3548112 total >> - age 3: 66216 bytes, 3614328 total >> - age 4: 270752 bytes, 3885080 total >> - age 5: 615472 bytes, 4500552 total >> - age 6: 358440 bytes, 4858992 total >> : 900635K->8173K(996800K), 0.0317340 secs] >> 1352217K->463460K(31633280K)After GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: -433641480 >> Max Chunk Size: -433641480 >> Number of Blocks: 1 >> Av. Block Size: -433641480 >> Tree Height: 1 >> After GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: 1227 >> Max Chunk Size: 631 >> Number of Blocks: 3 >> Av. Block Size: 409 >> Tree Height: 3 >> , 0.0318920 secs] [Times: user=0.38 sys=0.01, real=0.03 secs] >> >> >> Ashwin Jayaprakash. >> > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlie.hunt at oracle.com Thu Dec 18 12:45:52 2014 From: charlie.hunt at oracle.com (charlie hunt) Date: Thu, 18 Dec 2014 06:45:52 -0600 Subject: Multi-second ParNew collections but stable CMS In-Reply-To: References: Message-ID: <5492CC80.5010200@oracle.com> An HTML attachment was scrubbed... URL: From ashwin.jayaprakash at gmail.com Thu Dec 18 20:00:03 2014 From: ashwin.jayaprakash at gmail.com (Ashwin Jayaprakash) Date: Thu, 18 Dec 2014 12:00:03 -0800 Subject: Multi-second ParNew collections but stable CMS In-Reply-To: <5492272A.2040304@oracle.com> References: <5492272A.2040304@oracle.com> Message-ID: *@Jon*, thanks for clearing that up. Yes, that was my source of confusion. I was misinterpreting the user time with the real time. 
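The very first line of the summary posted earlier (Times: user=7.89 sys=0.55,
real=0.65 secs) already shows that relationship. A tiny illustration; the
worker-thread figure below is only inferred from that one log line, since the
actual ParallelGCThreads setting is not stated anywhere in this thread:

public class UserVsRealTime {
    public static void main(String[] args) {
        // Figures copied from one log line: "Times: user=7.89 sys=0.55, real=0.65 secs"
        double user = 7.89, sys = 0.55, real = 0.65;
        // user + sys is CPU time summed across all parallel GC workers;
        // real is the wall-clock pause the application actually observes.
        System.out.printf("average GC parallelism during the pause: ~%.0f threads%n",
                (user + sys) / real);   // prints ~13
    }
}

So a user figure near 8 seconds is consistent with a real pause well under one
second on a machine with that many GC workers.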
*Jon's reply from an offline conversation:* > Are these the 7 second collections you refer to in the paragraph above? > If yes, the "user" time is the sum of the time spent by multiple GC > threads. > The real time is the GC pause time that your application experiences. > In the above case the GC pauses are .65s, 1.10s and .67s. > Something that added to my confusion was the tools we are using in-house. In addition to the GC logs we have 1 tool that uses the GarbageCollectorMXBean's getCollectionTime() method. This does not seem to match the values I see in the GC logs ( http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/management/GarbageCollectorMXBean.html#getCollectionTime%28%29 ). The other tool is the ElasticSearch JVM stats logger which uses GarbageCollectorMXBean's LastGCInfo ( https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmStats.java#L194 and https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmMonitorService.java#L187 ). Do these methods expose the total time spent by all the parallel GC threads for the ParNew pool or the "real" time? They do not seem to match the GC log times. *@Gustav* We do not have any swapping on the machines. It could be the disk IO experienced by the GC log writer itself, as you've suspected. The machine has 128G of RAM *"top" sample from a similar machine:* PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 106856 xxx 20 0 68.9g 25g 9.9g S 72.4 21.1 2408:05 java *"free -g":* total used free shared buffers cached Mem: 120 119 0 0 0 95 -/+ buffers/cache: 23 96 Swap: 0 0 0 *@Charlie* Hugepages has already been disabled *sudo sysctl -a | grep hugepage* vm.nr_hugepages = 0 vm.nr_hugepages_mempolicy = 0 vm.hugepages_treat_as_movable = 0 vm.nr_overcommit_hugepages = 0 *cat /sys/kernel/mm/transparent_hugepage/enabled* [always] madvise never Thanks all! On Wed, Dec 17, 2014 at 5:00 PM, Jon Masamitsu wrote: > > Ashwin, > > On 12/16/2014 8:47 PM, Ashwin Jayaprakash wrote: > > Hi, we have a cluster of ElasticSearch servers running with 31G heap and > OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode). > > While our old gen seems to be very stable with about 40% usage and no Full > GCs so far, our young gen keeps growing from ~50MB to 850MB every few > seconds. These ParNew collections are taking anywhere between 1-7 seconds > and is causing some of our requests to time out. The eden space keeps > filling up and then cleared every 30-60 seconds. There is definitely work > being done by our JVM in terms of caching/buffering objects for a few > seconds, writing to disk and then clearing the objects (typical > Lucene/ElasticSearch indexing and querying workload) > > > From you recent mail > > Times: user=7.89 sys=0.55, real=0.65 secs] > Times: user=7.71 sys=4.59, real=1.10 secs] > Times: user=7.46 sys=0.32, real=0.67 secs] > > Are these the 7 second collections you refer to in the paragraph above? > If yes, the "user" time is the sum of the time spent by multiple GC > threads. > The real time is the GC pause time that your application experiences. > In the above case the GC pauses are .65s, 1.10s and .67s. > > Comment below regarding "eden space keeps filling up". > > > These long pauses are not good for our server throughput and I was doing > some reading. I got some conflicting reports on how Cassandra is configured > compared to Hadoop. 
There are also references > > to this old ParNew+CMS bug > which I thought > would've been addressed in the JRE version we are using. Cassandra > recommends a > larger NewSize with just 1 for max tenuring, whereas Hadoop recommends > a small NewSize. > > Since most of our allocations seem to be quite short lived, is there a > way to avoid these "long" young gen pauses? > > Thanks in advance. Here are some details. > > * Heap settings:* > java -Xmx31000m -Xms31000m > -Xss512k -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=64m > -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly > -XX:CMSInitiatingOccupancyFraction=70 > -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC -XX:+PrintPromotionFailure > -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintTenuringDistribution > -XX:GCLogFileSize=512m -XX:NumberOfGCLogFiles=2 -XX:+PrintGCDateStamps > -XX:+UseGCLogFileRotation -XX:+DisableExplicitGC -XX:PrintFLSStatistics=1 > -XX:+PrintGCDetails > > > * Last few lines of "kill -3 pid" output:* > Heap > par new generation total 996800K, used 865818K [0x00007fa18e800000, > 0x00007fa1d2190000, 0x00007fa1d2190000) > eden space 886080K, 94% used [0x00007fa18e800000, 0x00007fa1c1a659e0, > 0x00007fa1c4950000) > from space 110720K, 25% used [0x00007fa1cb570000, 0x00007fa1cd091078, > 0x00007fa1d2190000) > to space 110720K, 0% used [0x00007fa1c4950000, 0x00007fa1c4950000, > 0x00007fa1cb570000) > concurrent mark-sweep generation total 30636480K, used 12036523K > [0x00007fa1d2190000, 0x00007fa920000000, 0x00007fa920000000) > concurrent-mark-sweep perm gen total 128856K, used 77779K > [0x00007fa920000000, 0x00007fa927dd6000, 0x00007fa940000000) > > *Sample gc log:* > 2014-12-11T23:32:16.121+0000: 710.618: [ParNew > Desired survivor size 56688640 bytes, new threshold 6 (max 6) > - age 1: 2956312 bytes, 2956312 total > - age 2: 591800 bytes, 3548112 total > - age 3: 66216 bytes, 3614328 total > - age 4: 270752 bytes, 3885080 total > - age 5: 615472 bytes, 4500552 total > - age 6: 358440 bytes, 4858992 total > : 900635K->8173K(996800K), 0.0317340 secs] > 1352217K->463460K(31633280K)After GC: > > > In this GC eden is at 900635k before the GC and is a 8173k after. That GC > fills up is > the expected behavior. Is that what you were asking about above? If not > can you > send me an example of the "fills up" behavior. > > Jon > > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: -433641480 > Max Chunk Size: -433641480 > Number of Blocks: 1 > Av. Block Size: -433641480 > Tree Height: 1 > After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 1227 > Max Chunk Size: 631 > Number of Blocks: 3 > Av. Block Size: 409 > Tree Height: 3 > , 0.0318920 secs] [Times: user=0.38 sys=0.01, real=0.03 secs] > > > Ashwin Jayaprakash. > > > _______________________________________________ > hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From holger.hoffstaette at googlemail.com Thu Dec 18 20:17:27 2014 From: holger.hoffstaette at googlemail.com (=?UTF-8?B?SG9sZ2VyIEhvZmZzdMOkdHRl?=) Date: Thu, 18 Dec 2014 21:17:27 +0100 Subject: Multi-second ParNew collections but stable CMS In-Reply-To: References: <5492272A.2040304@oracle.com> Message-ID: <54933657.3010505@googlemail.com> On 12/18/14 21:00, Ashwin Jayaprakash wrote: > *cat /sys/kernel/mm/transparent_hugepage/enabled* > [always] madvise never That means THP aka khugepaged is still *enabled* and will still interfere. Whether it actually does can be seen in e.g. cat /sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed -h From jon.masamitsu at oracle.com Thu Dec 18 20:23:56 2014 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Thu, 18 Dec 2014 12:23:56 -0800 Subject: Multi-second ParNew collections but stable CMS In-Reply-To: References: <5492272A.2040304@oracle.com> Message-ID: <549337DC.2070300@oracle.com> On 12/18/2014 12:00 PM, Ashwin Jayaprakash wrote: > *@Jon*, thanks for clearing that up. Yes, that was my source of > confusion. I was misinterpreting the user time with the real time. > > *Jon's reply from an offline conversation:* > > Are these the 7 second collections you refer to in the paragraph > above? > If yes, the "user" time is the sum of the time spent by multiple > GC threads. > The real time is the GC pause time that your application experiences. > In the above case the GC pauses are .65s, 1.10s and .67s. > > > Something that added to my confusion was the tools we are using > in-house. In addition to the GC logs we have 1 tool that uses the > GarbageCollectorMXBean's getCollectionTime() method. This does not > seem to match the values I see in the GC logs > (http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/management/GarbageCollectorMXBean.html#getCollectionTime%28%29). > > The other tool is the ElasticSearch JVM stats logger which uses > GarbageCollectorMXBean's LastGCInfo > (https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmStats.java#L194 > and > https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmMonitorService.java#L187). > > Do these methods expose the total time spent by all the parallel GC > threads for the ParNew pool or the "real" time? They do not seem to > match the GC log times. I haven't found the JVM code that provides information for getCollectionTime() but I would expect to match the pause times in the GC logs. From your earlier mail > *Sample gc log:* > 2014-12-11T23:32:16.121+0000: 710.618: [ParNew > Desired survivor size 56688640 bytes, new threshold 6 (max 6) > - age 1: 2956312 bytes, 2956312 total > - age 2: 591800 bytes, 3548112 total > - age 3: 66216 bytes, 3614328 total > - age 4: 270752 bytes, 3885080 total > - age 5: 615472 bytes, 4500552 total > - age 6: 358440 bytes, 4858992 total > : 900635K->8173K(996800K), 0.0317340 secs] > 1352217K->463460K(31633280K)After GC: The time 0.0317340 is what the GC measures and is what I would expect to be available through the MXBeans. You're saying that does not match? Jon > *@Gustav* We do not have any swapping on the machines. It could be the > disk IO experienced by the GC log writer itself, as you've suspected. 
> The machine has 128G of RAM > > *"top" sample from a similar machine:* > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > 106856 xxx 20 0 68.9g 25g 9.9g S 72.4 21.1 > 2408:05 java > > *"free -g": > * > total used free shared buffers cached > Mem: 120 119 0 0 0 95 > -/+ buffers/cache: 23 96 > Swap: 0 0 0 > > *@Charlie* Hugepages has already been disabled > > *sudo sysctl -a | grep hugepage* > vm.nr_hugepages = 0 > vm.nr_hugepages_mempolicy = 0 > vm.hugepages_treat_as_movable = 0 > vm.nr_overcommit_hugepages = 0 > > *cat /sys/kernel/mm/transparent_hugepage/enabled* > [always] madvise never > > > Thanks all! > > > > On Wed, Dec 17, 2014 at 5:00 PM, Jon Masamitsu > > wrote: > > Ashwin, > > On 12/16/2014 8:47 PM, Ashwin Jayaprakash wrote: >> Hi, we have a cluster of ElasticSearch servers running with 31G >> heap and OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode). >> >> While our old gen seems to be very stable with about 40% usage >> and no Full GCs so far, our young gen keeps growing from ~50MB to >> 850MB every few seconds. These ParNew collections are taking >> anywhere between 1-7 seconds and is causing some of our requests >> to time out. The eden space keeps filling up and then cleared >> every 30-60 seconds. There is definitely work being done by our >> JVM in terms of caching/buffering objects for a few seconds, >> writing to disk and then clearing the objects (typical >> Lucene/ElasticSearch indexing and querying workload) > > From you recent mail > > Times: user=7.89 sys=0.55, real=0.65 secs] > Times: user=7.71 sys=4.59, real=1.10 secs] > Times: user=7.46 sys=0.32, real=0.67 secs] > > Are these the 7 second collections you refer to in the paragraph > above? > If yes, the "user" time is the sum of the time spent by multiple > GC threads. > The real time is the GC pause time that your application experiences. > In the above case the GC pauses are .65s, 1.10s and .67s. > > Comment below regarding "eden space keeps filling up". > >> >> These long pauses are not good for our server throughput and I >> was doing some reading. I got some conflicting reports on how >> Cassandra is configured compared to Hadoop. There are also >> references >> >> to this old ParNew+CMS bug >> which I thought >> would've been addressed in the JRE version we are using. >> Cassandra recommends >> a >> larger NewSize with just 1 for max tenuring, whereas Hadoop >> recommends a >> small NewSize. >> >> Since most of our allocations seem to be quite short lived, is >> there a way to avoid these "long" young gen pauses? >> >> Thanks in advance. Here are some details. 
>> * >> Heap settings:* >> java -Xmx31000m -Xms31000m >> -Xss512k -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=64m >> -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly >> -XX:CMSInitiatingOccupancyFraction=70 >> -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC >> -XX:+PrintPromotionFailure >> -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintTenuringDistribution >> -XX:GCLogFileSize=512m -XX:NumberOfGCLogFiles=2 >> -XX:+PrintGCDateStamps >> -XX:+UseGCLogFileRotation -XX:+DisableExplicitGC >> -XX:PrintFLSStatistics=1 -XX:+PrintGCDetails >> >> * >> Last few lines of "kill -3 pid" output:* >> Heap >> par new generation total 996800K, used 865818K >> [0x00007fa18e800000, 0x00007fa1d2190000, 0x00007fa1d2190000) >> eden space 886080K, 94% used [0x00007fa18e800000, >> 0x00007fa1c1a659e0, 0x00007fa1c4950000) >> from space 110720K, 25% used [0x00007fa1cb570000, >> 0x00007fa1cd091078, 0x00007fa1d2190000) >> to space 110720K, 0% used [0x00007fa1c4950000, >> 0x00007fa1c4950000, 0x00007fa1cb570000) >> concurrent mark-sweep generation total 30636480K, used 12036523K >> [0x00007fa1d2190000, 0x00007fa920000000, 0x00007fa920000000) >> concurrent-mark-sweep perm gen total 128856K, used 77779K >> [0x00007fa920000000, 0x00007fa927dd6000, 0x00007fa940000000) >> * >> * >> *Sample gc log:* >> 2014-12-11T23:32:16.121+0000: 710.618: [ParNew >> Desired survivor size 56688640 bytes, new threshold 6 (max 6) >> - age 1: 2956312 bytes, 2956312 total >> - age 2: 591800 bytes, 3548112 total >> - age 3: 66216 bytes, 3614328 total >> - age 4: 270752 bytes, 3885080 total >> - age 5: 615472 bytes, 4500552 total >> - age 6: 358440 bytes, 4858992 total >> : 900635K->8173K(996800K), 0.0317340 secs] >> 1352217K->463460K(31633280K)After GC: > > In this GC eden is at 900635k before the GC and is a 8173k after. > That GC fills up is > the expected behavior. Is that what you were asking about above? > If not can you > send me an example of the "fills up" behavior. > > Jon > >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: -433641480 >> Max Chunk Size: -433641480 >> Number of Blocks: 1 >> Av. Block Size: -433641480 >> Tree Height: 1 >> After GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: 1227 >> Max Chunk Size: 631 >> Number of Blocks: 3 >> Av. Block Size: 409 >> Tree Height: 3 >> , 0.0318920 secs] [Times: user=0.38 sys=0.01, real=0.03 secs] >> >> >> Ashwin Jayaprakash. >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlie.hunt at oracle.com Thu Dec 18 21:10:41 2014 From: charlie.hunt at oracle.com (charlie hunt) Date: Thu, 18 Dec 2014 15:10:41 -0600 Subject: Multi-second ParNew collections but stable CMS In-Reply-To: References: <5492272A.2040304@oracle.com> Message-ID: <613BB518-BB80-42A5-8E24-FC5683E33577@oracle.com> The output: > cat /sys/kernel/mm/transparent_hugepage/enabled > [always] madvise never Tells me that transparent huge pages are enabled ?always?. I think I would change this to ?never?, even though sysctl -a may be reporting no huge pages are currently in use. 
The system may trying to coalesce pages occasionally in attempt to make huge pages available, even though you are not currently using any. charlie > On Dec 18, 2014, at 2:00 PM, Ashwin Jayaprakash wrote: > > @Jon, thanks for clearing that up. Yes, that was my source of confusion. I was misinterpreting the user time with the real time. > > Jon's reply from an offline conversation: > Are these the 7 second collections you refer to in the paragraph above? > If yes, the "user" time is the sum of the time spent by multiple GC threads. > The real time is the GC pause time that your application experiences. > In the above case the GC pauses are .65s, 1.10s and .67s. > > Something that added to my confusion was the tools we are using in-house. In addition to the GC logs we have 1 tool that uses the GarbageCollectorMXBean's getCollectionTime() method. This does not seem to match the values I see in the GC logs (http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/management/GarbageCollectorMXBean.html#getCollectionTime%28%29 ). > > The other tool is the ElasticSearch JVM stats logger which uses GarbageCollectorMXBean's LastGCInfo (https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmStats.java#L194 and https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmMonitorService.java#L187 ). > > Do these methods expose the total time spent by all the parallel GC threads for the ParNew pool or the "real" time? They do not seem to match the GC log times. > > @Gustav We do not have any swapping on the machines. It could be the disk IO experienced by the GC log writer itself, as you've suspected. The machine has 128G of RAM > > "top" sample from a similar machine: > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > 106856 xxx 20 0 68.9g 25g 9.9g S 72.4 21.1 2408:05 java > > "free -g": > total used free shared buffers cached > Mem: 120 119 0 0 0 95 > -/+ buffers/cache: 23 96 > Swap: 0 0 0 > > @Charlie Hugepages has already been disabled > > sudo sysctl -a | grep hugepage > vm.nr_hugepages = 0 > vm.nr_hugepages_mempolicy = 0 > vm.hugepages_treat_as_movable = 0 > vm.nr_overcommit_hugepages = 0 > > cat /sys/kernel/mm/transparent_hugepage/enabled > [always] madvise never > > > Thanks all! > > > > On Wed, Dec 17, 2014 at 5:00 PM, Jon Masamitsu > wrote: > Ashwin, > > On 12/16/2014 8:47 PM, Ashwin Jayaprakash wrote: >> Hi, we have a cluster of ElasticSearch servers running with 31G heap and OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode). >> >> While our old gen seems to be very stable with about 40% usage and no Full GCs so far, our young gen keeps growing from ~50MB to 850MB every few seconds. These ParNew collections are taking anywhere between 1-7 seconds and is causing some of our requests to time out. The eden space keeps filling up and then cleared every 30-60 seconds. There is definitely work being done by our JVM in terms of caching/buffering objects for a few seconds, writing to disk and then clearing the objects (typical Lucene/ElasticSearch indexing and querying workload) > > From you recent mail > > Times: user=7.89 sys=0.55, real=0.65 secs] > Times: user=7.71 sys=4.59, real=1.10 secs] > Times: user=7.46 sys=0.32, real=0.67 secs] > > Are these the 7 second collections you refer to in the paragraph above? > If yes, the "user" time is the sum of the time spent by multiple GC threads. > The real time is the GC pause time that your application experiences. 
> In the above case the GC pauses are .65s, 1.10s and .67s. > > Comment below regarding "eden space keeps filling up". > >> >> These long pauses are not good for our server throughput and I was doing some reading. I got some conflicting reports on how Cassandra is configured compared to Hadoop. There are also references to this old ParNew+CMS bug which I thought would've been addressed in the JRE version we are using. Cassandra recommends a larger NewSize with just 1 for max tenuring, whereas Hadoop recommends a small NewSize. >> >> Since most of our allocations seem to be quite short lived, is there a way to avoid these "long" young gen pauses? >> >> Thanks in advance. Here are some details. >> >> Heap settings: >> java -Xmx31000m -Xms31000m >> -Xss512k -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=64m >> -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 >> -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC -XX:+PrintPromotionFailure >> -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintTenuringDistribution >> -XX:GCLogFileSize=512m -XX:NumberOfGCLogFiles=2 -XX:+PrintGCDateStamps >> -XX:+UseGCLogFileRotation -XX:+DisableExplicitGC -XX:PrintFLSStatistics=1 -XX:+PrintGCDetails >> >> >> Last few lines of "kill -3 pid" output: >> Heap >> par new generation total 996800K, used 865818K [0x00007fa18e800000, 0x00007fa1d2190000, 0x00007fa1d2190000) >> eden space 886080K, 94% used [0x00007fa18e800000, 0x00007fa1c1a659e0, 0x00007fa1c4950000) >> from space 110720K, 25% used [0x00007fa1cb570000, 0x00007fa1cd091078, 0x00007fa1d2190000) >> to space 110720K, 0% used [0x00007fa1c4950000, 0x00007fa1c4950000, 0x00007fa1cb570000) >> concurrent mark-sweep generation total 30636480K, used 12036523K [0x00007fa1d2190000, 0x00007fa920000000, 0x00007fa920000000) >> concurrent-mark-sweep perm gen total 128856K, used 77779K [0x00007fa920000000, 0x00007fa927dd6000, 0x00007fa940000000) >> >> Sample gc log: >> 2014-12-11T23:32:16.121+0000: 710.618: [ParNew >> Desired survivor size 56688640 bytes, new threshold 6 (max 6) >> - age 1: 2956312 bytes, 2956312 total >> - age 2: 591800 bytes, 3548112 total >> - age 3: 66216 bytes, 3614328 total >> - age 4: 270752 bytes, 3885080 total >> - age 5: 615472 bytes, 4500552 total >> - age 6: 358440 bytes, 4858992 total >> : 900635K->8173K(996800K), 0.0317340 secs] 1352217K->463460K(31633280K)After GC: > > In this GC eden is at 900635k before the GC and is a 8173k after. That GC fills up is > the expected behavior. Is that what you were asking about above? If not can you > send me an example of the "fills up" behavior. > > Jon > >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: -433641480 >> Max Chunk Size: -433641480 >> Number of Blocks: 1 >> Av. Block Size: -433641480 >> Tree Height: 1 >> After GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: 1227 >> Max Chunk Size: 631 >> Number of Blocks: 3 >> Av. Block Size: 409 >> Tree Height: 3 >> , 0.0318920 secs] [Times: user=0.38 sys=0.01, real=0.03 secs] >> >> >> Ashwin Jayaprakash. 
>> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashwin.jayaprakash at gmail.com Thu Dec 18 22:41:02 2014 From: ashwin.jayaprakash at gmail.com (Ashwin Jayaprakash) Date: Thu, 18 Dec 2014 14:41:02 -0800 Subject: Multi-second ParNew collections but stable CMS In-Reply-To: <613BB518-BB80-42A5-8E24-FC5683E33577@oracle.com> References: <5492272A.2040304@oracle.com> <613BB518-BB80-42A5-8E24-FC5683E33577@oracle.com> Message-ID: *@Charlie/@Holger* My apologies, THP is indeed enabled. I misread the "never" and thought it was already done. In fact "cat /sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed" showed 904 and after an hour now, it shows 6845 on one of our machines. *@Jon* I dug through some of our ElasticSearch/application logs again and tried to correlate them with the GC logs. The collection time does seem to match the GC log's "real" time. However the collected sizes don't seem to match, which is what threw me off. *Item 1:* 2014-12-18T21:34:55.837+0000: 163793.979: [ParNew Desired survivor size 56688640 bytes, new threshold 6 (max 6) - age 1: 31568024 bytes, 31568024 total - age 2: 1188576 bytes, 32756600 total - age 3: 1830920 bytes, 34587520 total - age 4: 282536 bytes, 34870056 total - age 5: 316640 bytes, 35186696 total - age 6: 249856 bytes, 35436552 total : 931773K->49827K(996800K), 1.3622770 secs] 22132844K->21256042K(31633280K)After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 1206815932 Max Chunk Size: 1206815932 Number of Blocks: 1 Av. Block Size: 1206815932 Tree Height: 1 After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 6189459 Max Chunk Size: 6188544 Number of Blocks: 3 Av. Block Size: 2063153 Tree Height: 2 , 1.3625110 secs] [Times: user=15.92 sys=1.36, real=1.36 secs] 2014-12-18T21:34:57.200+0000: 163795.342: Total time for which application threads were stopped: 1.3638970 seconds [2014-12-18T21:34:57,203Z] [WARN ] [elasticsearch[server00001][scheduler][T#1]] [org.elasticsearch.monitor.jvm] [server00001] [gc][young][163563][20423] duration [1.3s], collections [1]/[2.1s], total [1.3s]/[17.9m], memory [20.7gb]->[20.2gb]/[30.1gb], all_pools {[young] [543.2mb]->[1mb]/[865.3mb]}{[survivor] [44.6mb]->[48.6mb]/[108.1mb]}{[old] [20.2gb]->[20.2gb]/[29.2gb]} *Item 2:* 2014-12-18T20:53:35.011+0000: 161313.153: [ParNew Desired survivor size 56688640 bytes, new threshold 6 (max 6) - age 1: 32445776 bytes, 32445776 total - age 3: 6068000 bytes, 38513776 total - age 4: 1031528 bytes, 39545304 total - age 5: 255896 bytes, 39801200 total : 939702K->53536K(996800K), 2.9352940 secs] 21501296K->20625925K(31633280K)After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 1287922158 Max Chunk Size: 1287922158 Number of Blocks: 1 Av. Block Size: 1287922158 Tree Height: 1 After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 6205476 Max Chunk Size: 6204928 Number of Blocks: 2 Av. 
Block Size: 3102738 Tree Height: 2 , 2.9355580 secs] [Times: user=33.82 sys=3.11, real=2.94 secs] 2014-12-18T20:53:37.947+0000: 161316.089: Total time for which application threads were stopped: 2.9367640 seconds [2014-12-18T20:53:37,950Z] [WARN ] [elasticsearch[server00001][scheduler][T#1]] [org.elasticsearch.monitor.jvm] [server00001] [gc][young][161091][19838] duration [2.9s], collections [1]/[3s], total [2.9s]/[17.3m], memory [20.4gb]->[19.6gb]/[30.1gb], all_pools {[young] [801.7mb]->[2.4mb]/[865.3mb]}{[survivor] [52.3mb]->[52.2mb]/[108.1mb]}{[old] [19.6gb]->[19.6gb]/[29.2gb]} *Item 3:* 2014-12-17T14:42:10.590+0000: 52628.731: [GCBefore GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: -966776244 Max Chunk Size: -966776244 Number of Blocks: 1 Av. Block Size: -966776244 Tree Height: 1 Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 530 Max Chunk Size: 268 Number of Blocks: 2 Av. Block Size: 265 Tree Height: 2 2014-12-17T14:42:10.590+0000: 52628.731: [ParNew Desired survivor size 56688640 bytes, new threshold 1 (max 6) - age 1: 113315624 bytes, 113315624 total : 996800K->110720K(996800K), 7.3511710 secs] 5609422K->5065102K(31633280K)After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: -1009955715 Max Chunk Size: -1009955715 Number of Blocks: 1 Av. Block Size: -1009955715 Tree Height: 1 After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 530 Max Chunk Size: 268 Number of Blocks: 2 Av. Block Size: 265 Tree Height: 2 , 7.3514180 secs] [Times: user=36.50 sys=17.99, real=7.35 secs] 2014-12-17T14:42:17.941+0000: 52636.083: Total time for which application threads were stopped: 7.3525250 seconds [2014-12-17T14:42:17,944Z] [WARN ] [elasticsearch[prdaes04data03][scheduler][T#1]] [org.elasticsearch.monitor.jvm] [prdaes04data03] [gc][young][52582][5110] duration [7.3s], collections [1]/[7.6s], total [7.3s]/[4.2m], memory [5.1gb]->[4.8gb]/[30.1gb], all_pools {[young] [695.5mb]->[14.4mb]/[865.3mb]}{[survivor] [108.1mb]->[108.1mb]/[108.1mb]}{[old] [4.3gb]->[4.7gb]/[29.2gb]} On Thu, Dec 18, 2014 at 1:10 PM, charlie hunt wrote: > > The output: > > *cat /sys/kernel/mm/transparent_hugepage/enabled* > [always] madvise never > > > Tells me that transparent huge pages are enabled ?always?. > > I think I would change this to ?never?, even though sysctl -a may be > reporting no huge pages are currently in use. The system may trying to > coalesce pages occasionally in attempt to make huge pages available, even > though you are not currently using any. > > charlie > > > On Dec 18, 2014, at 2:00 PM, Ashwin Jayaprakash < > ashwin.jayaprakash at gmail.com> wrote: > > *@Jon*, thanks for clearing that up. Yes, that was my source of > confusion. I was misinterpreting the user time with the real time. > > *Jon's reply from an offline conversation:* > >> Are these the 7 second collections you refer to in the paragraph above? >> If yes, the "user" time is the sum of the time spent by multiple GC >> threads. >> The real time is the GC pause time that your application experiences. >> In the above case the GC pauses are .65s, 1.10s and .67s. >> > > Something that added to my confusion was the tools we are using in-house. > In addition to the GC logs we have 1 tool that uses the > GarbageCollectorMXBean's getCollectionTime() method. 
This does not seem to > match the values I see in the GC logs ( > http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/management/GarbageCollectorMXBean.html#getCollectionTime%28%29 > ). > > The other tool is the ElasticSearch JVM stats logger which uses GarbageCollectorMXBean's > LastGCInfo ( > https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmStats.java#L194 > and > https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmMonitorService.java#L187 > ). > > Do these methods expose the total time spent by all the parallel GC > threads for the ParNew pool or the "real" time? They do not seem to match > the GC log times. > > *@Gustav* We do not have any swapping on the machines. It could be the > disk IO experienced by the GC log writer itself, as you've suspected. The > machine has 128G of RAM > > *"top" sample from a similar machine:* > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > 106856 xxx 20 0 68.9g 25g 9.9g S 72.4 21.1 > 2408:05 java > > > *"free -g":* > total used free shared buffers cached > Mem: 120 119 0 0 0 95 > -/+ buffers/cache: 23 96 > Swap: 0 0 0 > > *@Charlie* Hugepages has already been disabled > > *sudo sysctl -a | grep hugepage* > vm.nr_hugepages = 0 > vm.nr_hugepages_mempolicy = 0 > vm.hugepages_treat_as_movable = 0 > vm.nr_overcommit_hugepages = 0 > > *cat /sys/kernel/mm/transparent_hugepage/enabled* > [always] madvise never > > > Thanks all! > > > > On Wed, Dec 17, 2014 at 5:00 PM, Jon Masamitsu > wrote: >> >> Ashwin, >> >> On 12/16/2014 8:47 PM, Ashwin Jayaprakash wrote: >> >> Hi, we have a cluster of ElasticSearch servers running with 31G heap >> and OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode). >> >> While our old gen seems to be very stable with about 40% usage and no >> Full GCs so far, our young gen keeps growing from ~50MB to 850MB every few >> seconds. These ParNew collections are taking anywhere between 1-7 seconds >> and is causing some of our requests to time out. The eden space keeps >> filling up and then cleared every 30-60 seconds. There is definitely work >> being done by our JVM in terms of caching/buffering objects for a few >> seconds, writing to disk and then clearing the objects (typical >> Lucene/ElasticSearch indexing and querying workload) >> >> >> From you recent mail >> >> Times: user=7.89 sys=0.55, real=0.65 secs] >> Times: user=7.71 sys=4.59, real=1.10 secs] >> Times: user=7.46 sys=0.32, real=0.67 secs] >> >> Are these the 7 second collections you refer to in the paragraph above? >> If yes, the "user" time is the sum of the time spent by multiple GC >> threads. >> The real time is the GC pause time that your application experiences. >> In the above case the GC pauses are .65s, 1.10s and .67s. >> >> Comment below regarding "eden space keeps filling up". >> >> >> These long pauses are not good for our server throughput and I was doing >> some reading. I got some conflicting reports on how Cassandra is configured >> compared to Hadoop. There are also references >> >> to this old ParNew+CMS bug >> which I thought >> would've been addressed in the JRE version we are using. Cassandra >> recommends >> a larger NewSize with just 1 for max tenuring, whereas Hadoop recommends >> a small NewSize. >> >> Since most of our allocations seem to be quite short lived, is there a >> way to avoid these "long" young gen pauses? >> >> Thanks in advance. Here are some details. 
>> >> * Heap settings:* >> java -Xmx31000m -Xms31000m >> -Xss512k -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=64m >> -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly >> -XX:CMSInitiatingOccupancyFraction=70 >> -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC -XX:+PrintPromotionFailure >> -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintTenuringDistribution >> -XX:GCLogFileSize=512m -XX:NumberOfGCLogFiles=2 -XX:+PrintGCDateStamps >> -XX:+UseGCLogFileRotation -XX:+DisableExplicitGC -XX:PrintFLSStatistics=1 >> -XX:+PrintGCDetails >> >> >> * Last few lines of "kill -3 pid" output:* >> Heap >> par new generation total 996800K, used 865818K [0x00007fa18e800000, >> 0x00007fa1d2190000, 0x00007fa1d2190000) >> eden space 886080K, 94% used [0x00007fa18e800000, 0x00007fa1c1a659e0, >> 0x00007fa1c4950000) >> from space 110720K, 25% used [0x00007fa1cb570000, 0x00007fa1cd091078, >> 0x00007fa1d2190000) >> to space 110720K, 0% used [0x00007fa1c4950000, 0x00007fa1c4950000, >> 0x00007fa1cb570000) >> concurrent mark-sweep generation total 30636480K, used 12036523K >> [0x00007fa1d2190000, 0x00007fa920000000, 0x00007fa920000000) >> concurrent-mark-sweep perm gen total 128856K, used 77779K >> [0x00007fa920000000, 0x00007fa927dd6000, 0x00007fa940000000) >> >> *Sample gc log:* >> 2014-12-11T23:32:16.121+0000: 710.618: [ParNew >> Desired survivor size 56688640 bytes, new threshold 6 (max 6) >> - age 1: 2956312 bytes, 2956312 total >> - age 2: 591800 bytes, 3548112 total >> - age 3: 66216 bytes, 3614328 total >> - age 4: 270752 bytes, 3885080 total >> - age 5: 615472 bytes, 4500552 total >> - age 6: 358440 bytes, 4858992 total >> : 900635K->8173K(996800K), 0.0317340 secs] >> 1352217K->463460K(31633280K)After GC: >> >> >> In this GC eden is at 900635k before the GC and is a 8173k after. That >> GC fills up is >> the expected behavior. Is that what you were asking about above? If not >> can you >> send me an example of the "fills up" behavior. >> >> Jon >> >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: -433641480 >> Max Chunk Size: -433641480 >> Number of Blocks: 1 >> Av. Block Size: -433641480 >> Tree Height: 1 >> After GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: 1227 >> Max Chunk Size: 631 >> Number of Blocks: 3 >> Av. Block Size: 409 >> Tree Height: 3 >> , 0.0318920 secs] [Times: user=0.38 sys=0.01, real=0.03 secs] >> >> >> Ashwin Jayaprakash. >> >> >> _______________________________________________ >> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlie.hunt at oracle.com Fri Dec 19 22:10:24 2014 From: charlie.hunt at oracle.com (charlie hunt) Date: Fri, 19 Dec 2014 16:10:24 -0600 Subject: Multi-second ParNew collections but stable CMS In-Reply-To: References: <5492272A.2040304@oracle.com> <613BB518-BB80-42A5-8E24-FC5683E33577@oracle.com> Message-ID: <28DAC751-9A6F-4C18-B7ED-4169C8CDC4F7@oracle.com> Disabling transparent huge pages should help those GC pauses where you are seeing high sys time reported, which should also shorten their pause times. Thanks for also sharing your observation of khugepaged/pages_collapsed. 
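If you want to keep an eye on it over time, a tiny stand-alone sketch that just prints the two files mentioned in this thread is below (Linux only; the class name is made up and the paths are exactly the ones quoted earlier):

import java.nio.file.Files;
import java.nio.file.Paths;

// Dumps the current transparent huge page mode and the khugepaged collapse
// counter so they can be logged next to the GC logs (Linux only).
public class ThpCheck {
    public static void main(String[] args) {
        dump("/sys/kernel/mm/transparent_hugepage/enabled");
        dump("/sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed");
    }

    static void dump(String file) {
        try {
            System.out.println(file + ": "
                    + new String(Files.readAllBytes(Paths.get(file))).trim());
        } catch (Exception e) {
            System.out.println(file + ": not readable (" + e + ")");
        }
    }
}

If pages_collapsed keeps climbing while the GC log keeps showing high sys times, khugepaged is most likely still interfering.
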
charlie > On Dec 18, 2014, at 4:41 PM, Ashwin Jayaprakash wrote: > > @Charlie/@Holger My apologies, THP is indeed enabled. I misread the "never" and thought it was already done. In fact "cat /sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed" showed 904 and after an hour now, it shows 6845 on one of our machines. > > @Jon I dug through some of our ElasticSearch/application logs again and tried to correlate them with the GC logs. The collection time does seem to match the GC log's "real" time. However the collected sizes don't seem to match, which is what threw me off. > > Item 1: > > 2014-12-18T21:34:55.837+0000: 163793.979: [ParNew > Desired survivor size 56688640 bytes, new threshold 6 (max 6) > - age 1: 31568024 bytes, 31568024 total > - age 2: 1188576 bytes, 32756600 total > - age 3: 1830920 bytes, 34587520 total > - age 4: 282536 bytes, 34870056 total > - age 5: 316640 bytes, 35186696 total > - age 6: 249856 bytes, 35436552 total > : 931773K->49827K(996800K), 1.3622770 secs] 22132844K->21256042K(31633280K)After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 1206815932 > Max Chunk Size: 1206815932 > Number of Blocks: 1 > Av. Block Size: 1206815932 > Tree Height: 1 > After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 6189459 > Max Chunk Size: 6188544 > Number of Blocks: 3 > Av. Block Size: 2063153 > Tree Height: 2 > , 1.3625110 secs] [Times: user=15.92 sys=1.36, real=1.36 secs] > 2014-12-18T21:34:57.200+0000: 163795.342: Total time for which application threads were stopped: 1.3638970 seconds > > > [2014-12-18T21:34:57,203Z] [WARN ] [elasticsearch[server00001][scheduler][T#1]] [org.elasticsearch.monitor.jvm] [server00001] [gc][young][163563][20423] duration [1.3s], collections [1]/[2.1s], total [1.3s]/[17.9m], memory [20.7gb]->[20.2gb]/[30.1gb], all_pools {[young] [543.2mb]->[1mb]/[865.3mb]}{[survivor] [44.6mb]->[48.6mb]/[108.1mb]}{[old] [20.2gb]->[20.2gb]/[29.2gb]} > > > Item 2: > > 2014-12-18T20:53:35.011+0000: 161313.153: [ParNew > Desired survivor size 56688640 bytes, new threshold 6 (max 6) > - age 1: 32445776 bytes, 32445776 total > - age 3: 6068000 bytes, 38513776 total > - age 4: 1031528 bytes, 39545304 total > - age 5: 255896 bytes, 39801200 total > : 939702K->53536K(996800K), 2.9352940 secs] 21501296K->20625925K(31633280K)After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 1287922158 > Max Chunk Size: 1287922158 > Number of Blocks: 1 > Av. Block Size: 1287922158 > Tree Height: 1 > After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 6205476 > Max Chunk Size: 6204928 > Number of Blocks: 2 > Av. 
Block Size: 3102738 > Tree Height: 2 > , 2.9355580 secs] [Times: user=33.82 sys=3.11, real=2.94 secs] > 2014-12-18T20:53:37.947+0000: 161316.089: Total time for which application threads were stopped: 2.9367640 seconds > > > [2014-12-18T20:53:37,950Z] [WARN ] [elasticsearch[server00001][scheduler][T#1]] [org.elasticsearch.monitor.jvm] [server00001] [gc][young][161091][19838] duration [2.9s], collections [1]/[3s], total [2.9s]/[17.3m], memory [20.4gb]->[19.6gb]/[30.1gb], all_pools {[young] [801.7mb]->[2.4mb]/[865.3mb]}{[survivor] [52.3mb]->[52.2mb]/[108.1mb]}{[old] [19.6gb]->[19.6gb]/[29.2gb]} > > > Item 3: > > 2014-12-17T14:42:10.590+0000: 52628.731: [GCBefore GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: -966776244 > Max Chunk Size: -966776244 > Number of Blocks: 1 > Av. Block Size: -966776244 > Tree Height: 1 > Before GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 530 > Max Chunk Size: 268 > Number of Blocks: 2 > Av. Block Size: 265 > Tree Height: 2 > 2014-12-17T14:42:10.590+0000: 52628.731: [ParNew > Desired survivor size 56688640 bytes, new threshold 1 (max 6) > - age 1: 113315624 bytes, 113315624 total > : 996800K->110720K(996800K), 7.3511710 secs] 5609422K->5065102K(31633280K)After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: -1009955715 > Max Chunk Size: -1009955715 > Number of Blocks: 1 > Av. Block Size: -1009955715 > Tree Height: 1 > After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 530 > Max Chunk Size: 268 > Number of Blocks: 2 > Av. Block Size: 265 > Tree Height: 2 > , 7.3514180 secs] [Times: user=36.50 sys=17.99, real=7.35 secs] > 2014-12-17T14:42:17.941+0000: 52636.083: Total time for which application threads were stopped: 7.3525250 seconds > > > [2014-12-17T14:42:17,944Z] [WARN ] [elasticsearch[prdaes04data03][scheduler][T#1]] [org.elasticsearch.monitor.jvm] [prdaes04data03] [gc][young][52582][5110] duration [7.3s], collections [1]/[7.6s], total [7.3s]/[4.2m], memory [5.1gb]->[4.8gb]/[30.1gb], all_pools {[young] [695.5mb]->[14.4mb]/[865.3mb]}{[survivor] [108.1mb]->[108.1mb]/[108.1mb]}{[old] [4.3gb]->[4.7gb]/[29.2gb]} > > > > > On Thu, Dec 18, 2014 at 1:10 PM, charlie hunt > wrote: > The output: >> cat /sys/kernel/mm/transparent_hugepage/enabled >> [always] madvise never > > Tells me that transparent huge pages are enabled ?always?. > > I think I would change this to ?never?, even though sysctl -a may be reporting no huge pages are currently in use. The system may trying to coalesce pages occasionally in attempt to make huge pages available, even though you are not currently using any. > > charlie > > >> On Dec 18, 2014, at 2:00 PM, Ashwin Jayaprakash > wrote: >> >> @Jon, thanks for clearing that up. Yes, that was my source of confusion. I was misinterpreting the user time with the real time. >> >> Jon's reply from an offline conversation: >> Are these the 7 second collections you refer to in the paragraph above? >> If yes, the "user" time is the sum of the time spent by multiple GC threads. >> The real time is the GC pause time that your application experiences. >> In the above case the GC pauses are .65s, 1.10s and .67s. >> >> Something that added to my confusion was the tools we are using in-house. In addition to the GC logs we have 1 tool that uses the GarbageCollectorMXBean's getCollectionTime() method. 
This does not seem to match the values I see in the GC logs (http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/management/GarbageCollectorMXBean.html#getCollectionTime%28%29 ). >> >> The other tool is the ElasticSearch JVM stats logger which uses GarbageCollectorMXBean's LastGCInfo (https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmStats.java#L194 and https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmMonitorService.java#L187 ). >> >> Do these methods expose the total time spent by all the parallel GC threads for the ParNew pool or the "real" time? They do not seem to match the GC log times. >> >> @Gustav We do not have any swapping on the machines. It could be the disk IO experienced by the GC log writer itself, as you've suspected. The machine has 128G of RAM >> >> "top" sample from a similar machine: >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND >> 106856 xxx 20 0 68.9g 25g 9.9g S 72.4 21.1 2408:05 java >> >> "free -g": >> total used free shared buffers cached >> Mem: 120 119 0 0 0 95 >> -/+ buffers/cache: 23 96 >> Swap: 0 0 0 >> >> @Charlie Hugepages has already been disabled >> >> sudo sysctl -a | grep hugepage >> vm.nr_hugepages = 0 >> vm.nr_hugepages_mempolicy = 0 >> vm.hugepages_treat_as_movable = 0 >> vm.nr_overcommit_hugepages = 0 >> >> cat /sys/kernel/mm/transparent_hugepage/enabled >> [always] madvise never >> >> >> Thanks all! >> >> >> >> On Wed, Dec 17, 2014 at 5:00 PM, Jon Masamitsu > wrote: >> Ashwin, >> >> On 12/16/2014 8:47 PM, Ashwin Jayaprakash wrote: >>> Hi, we have a cluster of ElasticSearch servers running with 31G heap and OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode). >>> >>> While our old gen seems to be very stable with about 40% usage and no Full GCs so far, our young gen keeps growing from ~50MB to 850MB every few seconds. These ParNew collections are taking anywhere between 1-7 seconds and is causing some of our requests to time out. The eden space keeps filling up and then cleared every 30-60 seconds. There is definitely work being done by our JVM in terms of caching/buffering objects for a few seconds, writing to disk and then clearing the objects (typical Lucene/ElasticSearch indexing and querying workload) >> >> From you recent mail >> >> Times: user=7.89 sys=0.55, real=0.65 secs] >> Times: user=7.71 sys=4.59, real=1.10 secs] >> Times: user=7.46 sys=0.32, real=0.67 secs] >> >> Are these the 7 second collections you refer to in the paragraph above? >> If yes, the "user" time is the sum of the time spent by multiple GC threads. >> The real time is the GC pause time that your application experiences. >> In the above case the GC pauses are .65s, 1.10s and .67s. >> >> Comment below regarding "eden space keeps filling up". >> >>> >>> These long pauses are not good for our server throughput and I was doing some reading. I got some conflicting reports on how Cassandra is configured compared to Hadoop. There are also references to this old ParNew+CMS bug which I thought would've been addressed in the JRE version we are using. Cassandra recommends a larger NewSize with just 1 for max tenuring, whereas Hadoop recommends a small NewSize. >>> >>> Since most of our allocations seem to be quite short lived, is there a way to avoid these "long" young gen pauses? >>> >>> Thanks in advance. Here are some details. 
>>> >>> Heap settings: >>> java -Xmx31000m -Xms31000m >>> -Xss512k -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=64m >>> -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 >>> -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC -XX:+PrintPromotionFailure >>> -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintTenuringDistribution >>> -XX:GCLogFileSize=512m -XX:NumberOfGCLogFiles=2 -XX:+PrintGCDateStamps >>> -XX:+UseGCLogFileRotation -XX:+DisableExplicitGC -XX:PrintFLSStatistics=1 -XX:+PrintGCDetails >>> >>> >>> Last few lines of "kill -3 pid" output: >>> Heap >>> par new generation total 996800K, used 865818K [0x00007fa18e800000, 0x00007fa1d2190000, 0x00007fa1d2190000) >>> eden space 886080K, 94% used [0x00007fa18e800000, 0x00007fa1c1a659e0, 0x00007fa1c4950000) >>> from space 110720K, 25% used [0x00007fa1cb570000, 0x00007fa1cd091078, 0x00007fa1d2190000) >>> to space 110720K, 0% used [0x00007fa1c4950000, 0x00007fa1c4950000, 0x00007fa1cb570000) >>> concurrent mark-sweep generation total 30636480K, used 12036523K [0x00007fa1d2190000, 0x00007fa920000000, 0x00007fa920000000) >>> concurrent-mark-sweep perm gen total 128856K, used 77779K [0x00007fa920000000, 0x00007fa927dd6000, 0x00007fa940000000) >>> >>> Sample gc log: >>> 2014-12-11T23:32:16.121+0000: 710.618: [ParNew >>> Desired survivor size 56688640 bytes, new threshold 6 (max 6) >>> - age 1: 2956312 bytes, 2956312 total >>> - age 2: 591800 bytes, 3548112 total >>> - age 3: 66216 bytes, 3614328 total >>> - age 4: 270752 bytes, 3885080 total >>> - age 5: 615472 bytes, 4500552 total >>> - age 6: 358440 bytes, 4858992 total >>> : 900635K->8173K(996800K), 0.0317340 secs] 1352217K->463460K(31633280K)After GC: >> >> In this GC eden is at 900635k before the GC and is a 8173k after. That GC fills up is >> the expected behavior. Is that what you were asking about above? If not can you >> send me an example of the "fills up" behavior. >> >> Jon >> >>> Statistics for BinaryTreeDictionary: >>> ------------------------------------ >>> Total Free Space: -433641480 >>> Max Chunk Size: -433641480 >>> Number of Blocks: 1 >>> Av. Block Size: -433641480 >>> Tree Height: 1 >>> After GC: >>> Statistics for BinaryTreeDictionary: >>> ------------------------------------ >>> Total Free Space: 1227 >>> Max Chunk Size: 631 >>> Number of Blocks: 3 >>> Av. Block Size: 409 >>> Tree Height: 3 >>> , 0.0318920 secs] [Times: user=0.38 sys=0.01, real=0.03 secs] >>> >>> >>> Ashwin Jayaprakash. >>> >>> >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From java at elyograg.org Sat Dec 20 01:28:55 2014 From: java at elyograg.org (Shawn Heisey) Date: Fri, 19 Dec 2014 18:28:55 -0700 Subject: G1 with Solr - thread from dev@lucene.apache.org In-Reply-To: <1418849513.3293.3.camel@oracle.com> References: <54906C34.8080408@elyograg.org> <1418831456.3255.22.camel@oracle.com> <5491D659.1090703@elyograg.org> <1418849513.3293.3.camel@oracle.com> Message-ID: <5494D0D7.2010606@elyograg.org> On 12/17/2014 1:51 PM, Thomas Schatzl wrote: >> In both cases, I used -Xms4096M and -Xmx6144M. These are the GC logging >> options: >> >> GCLOG_OPTS="-verbose:gc -Xloggc:logs/gc.log -XX:+PrintGCDateStamps >> -XX:+PrintGCDetails" >> >> Here's the GC logs that I already have: >> >> https://www.dropbox.com/s/4uy95g9zmc28xkn/gc-idxa1-cms-7u25.log?dl=0 >> https://www.dropbox.com/s/loyo6u0tqcba6sh/gc-idxa1-g1-7u72.log?dl=0 >> > > please also add -XX:+PrintReferenceGC, and definitely use -XX: > +ParallelRefProcEnabled. > > GC is spending a significant amount of the time in soft/weak reference > processing. -XX:+ParallelRefProcEnabled will help, but there will be > spikes still. I saw that GC sometimes spends 1000ms just processing > those references; using 8 threads this should get better. > > That alone will likely make it hard reaching a 100ms pause time goal > (1000ms/8 = 125ms...). > > CMS has the same problems, and while on average it has ~215ms pauses, > there seem to be a lot that are a lot longer too. Reference processing > also takes very long, even with -XX:+ParallelRefProcEnabled. > > I am not sure about the cause for the full gc's: either the pause time > prediction in G1 in that version is too bad and it tries to use a way > too large young gen, or there are a few very large objects around. > > Depending on the log output and the impact of the other options we might > want to cap the maximum young gen size. > >> I believe that Lucene does use a lot of references. > > I saw that. Must be millions. -XX:+PrintReferenceGC should show that > (also in CMS). I still did not get the list message, but I figured out why. The list subscription has an option "Avoid duplicate copies of messages" that I just had to turn off. I prefer to reply to messages from the list because I know for sure that all the right headers are included. I would not be surprised if there are millions of references. My whole index is over 98 million documents and half of those documents are present in shards on each server, taking up about 60GB of disk space per server. I already have ParallelRefProcEnabled and I have just added PrintReferenceGC. For reference, here are my options for CMS: JVM_OPTS=" \ -XX:NewRatio=3 \ -XX:SurvivorRatio=4 \ -XX:TargetSurvivorRatio=90 \ -XX:MaxTenuringThreshold=8 \ -XX:+UseConcMarkSweepGC \ -XX:+CMSScavengeBeforeRemark \ -XX:PretenureSizeThreshold=64m \ -XX:CMSFullGCsBeforeCompaction=1 \ -XX:+UseCMSInitiatingOccupancyOnly \ -XX:CMSInitiatingOccupancyFraction=70 \ -XX:CMSTriggerPermRatio=80 \ -XX:CMSMaxAbortablePrecleanTime=6000 \ -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -XX:+UseLargePages \ -XX:+AggressiveOpts \ " Which of these options will apply to G1, and are any of them worthwhile to include? I haven't got any tuning options at all for G1, and I'm looking for suggestions. This is my current G1 option list: JVM_OPTS=" \ -XX:+UseG1GC \ -XX:+UseLargePages \ -XX:+AggressiveOpts \ " Based on some recent list activity unrelated to this discussion, I also opted to disable transparent huge pages on the Solr servers. 
I haven't noticed any real difference in the server resource graphs (CPU, load, etc). I've started an internal discussion about Java 8 to see how receptive everyone will be to an upgrade. Thanks, Shawn From thomas.schatzl at oracle.com Sun Dec 21 14:01:17 2014 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Sun, 21 Dec 2014 15:01:17 +0100 Subject: G1 with Solr - thread from dev@lucene.apache.org In-Reply-To: <5494D0D7.2010606@elyograg.org> References: <54906C34.8080408@elyograg.org> <1418831456.3255.22.camel@oracle.com> <5491D659.1090703@elyograg.org> <1418849513.3293.3.camel@oracle.com> <5494D0D7.2010606@elyograg.org> Message-ID: <1419170477.6868.1.camel@oracle.com> Hi Shawn, On Fri, 2014-12-19 at 18:28 -0700, Shawn Heisey wrote: > On 12/17/2014 1:51 PM, Thomas Schatzl wrote: > >> In both cases, I used -Xms4096M and -Xmx6144M. These are the GC logging > >> options: > >> > >> I believe that Lucene does use a lot of references. > > > > I saw that. Must be millions. -XX:+PrintReferenceGC should show that > > (also in CMS). > > I would not be surprised if there are millions of references. My whole > index is over 98 million documents and half of those documents are > present in shards on each server, taking up about 60GB of disk space per > server. > > I already have ParallelRefProcEnabled and I have just added > PrintReferenceGC. > > For reference, here are my options for CMS: > > JVM_OPTS=" \ > -XX:NewRatio=3 \ > -XX:SurvivorRatio=4 \ > -XX:TargetSurvivorRatio=90 \ > -XX:MaxTenuringThreshold=8 \ >-XX:+UseConcMarkSweepGC \ > -XX:+CMSScavengeBeforeRemark \ > -XX:PretenureSizeThreshold=64m \ > -XX:CMSFullGCsBeforeCompaction=1 \ > -XX:+UseCMSInitiatingOccupancyOnly \ > -XX:CMSInitiatingOccupancyFraction=70 \ > -XX:CMSTriggerPermRatio=80 \ > -XX:CMSMaxAbortablePrecleanTime=6000 \ > -XX:+CMSParallelRemarkEnabled > -XX:+ParallelRefProcEnabled > -XX:+UseLargePages \ > -XX:+AggressiveOpts \ > " > > Which of these options will apply to G1, and are any of them worthwhile > to include? Only ParallelRefProcEnabled will be useful. Potentially UseLargePages too, but you mentioned you do not have any of them configured in the OS anyway. That's why the change in the Transparent Huge Pages settings did not have any impact either. The others are either CMS specific (from UseConcMarkSweepGC to CMSParallelRemarkEnabled) or would limit the ability of G1 to dynamically adapt the young generation (NewRatio to MaxTenuringThreshold). Afaik AggressiveOpts does not actually do a lot any more for some time but I do not think it hurts. > I haven't got any tuning options at all for G1, and I'm > looking for suggestions. This is my current G1 option list: > > JVM_OPTS=" \ > -XX:+UseG1GC \ > -XX:+UseLargePages \ > -XX:+AggressiveOpts \ Add -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintReferenceGC -XX: +PrintAdaptiveSizePolicy (the last one prints useful information about some decisions, not really needed in regular operation and could be removed later) add -XX:+ParallelRefProcEnabled to get reference processing time down. Use the same settings for heap size as in CMS. Add -XX:MaxGCPauseMillis= to get a time goal G1 will aim for. As mentioned above, it is likely that G1 will not keep up in many instances with 100ms due to the many java.lang.Ref instances. > Based on some recent list activity unrelated to this discussion, I also > opted to disable transparent huge pages on the Solr servers. I haven't > noticed any real difference in the server resource graphs (CPU, load, etc). 
> > I've started an internal discussion about Java 8 to see how receptive > everyone will be to an upgrade. Another potentially interesting thing I forgot about G1 is that in 8u40 G1 can free memory much more freely and dynamically than before. Not sure you need that, but in your recent sample settings you showed an Xms value that has been smaller than Xmx. Thanks, Thomas From ashwin.jayaprakash at gmail.com Mon Dec 22 17:49:50 2014 From: ashwin.jayaprakash at gmail.com (Ashwin Jayaprakash) Date: Mon, 22 Dec 2014 09:49:50 -0800 Subject: Multi-second ParNew collections but stable CMS In-Reply-To: <28DAC751-9A6F-4C18-B7ED-4169C8CDC4F7@oracle.com> References: <5492272A.2040304@oracle.com> <613BB518-BB80-42A5-8E24-FC5683E33577@oracle.com> <28DAC751-9A6F-4C18-B7ED-4169C8CDC4F7@oracle.com> Message-ID: All, I'm happy to report that disabling THP made a big difference. We do not see multi-second minor GCs in our cluster anymore. Thanks for your help. On Fri, Dec 19, 2014 at 2:10 PM, charlie hunt wrote: > Disabling transparent huge pages should help those GC pauses where you are > seeing high sys time reported, which should also shorten their pause times. > > Thanks for also sharing your observation of khugepaged/pages_collapsed. > > charlie > > On Dec 18, 2014, at 4:41 PM, Ashwin Jayaprakash < > ashwin.jayaprakash at gmail.com> wrote: > > *@Charlie/@Holger* My apologies, THP is indeed enabled. I misread the > "never" and thought it was already done. In fact "cat > /sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed" showed > 904 and after an hour now, it shows 6845 on one of our machines. > > *@Jon* I dug through some of our ElasticSearch/application logs again and > tried to correlate them with the GC logs. The collection time does seem to > match the GC log's "real" time. However the collected sizes don't seem to > match, which is what threw me off. > > *Item 1:* > > 2014-12-18T21:34:55.837+0000: 163793.979: [ParNew > Desired survivor size 56688640 bytes, new threshold 6 (max 6) > - age 1: 31568024 bytes, 31568024 total > - age 2: 1188576 bytes, 32756600 total > - age 3: 1830920 bytes, 34587520 total > - age 4: 282536 bytes, 34870056 total > - age 5: 316640 bytes, 35186696 total > - age 6: 249856 bytes, 35436552 total > : 931773K->49827K(996800K), 1.3622770 secs] > 22132844K->21256042K(31633280K)After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 1206815932 > Max Chunk Size: 1206815932 > Number of Blocks: 1 > Av. Block Size: 1206815932 > Tree Height: 1 > After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 6189459 > Max Chunk Size: 6188544 > Number of Blocks: 3 > Av. 
Block Size: 2063153 > Tree Height: 2 > , 1.3625110 secs] [Times: user=15.92 sys=1.36, real=1.36 secs] > 2014-12-18T21:34:57.200+0000: 163795.342: Total time for which application > threads were stopped: 1.3638970 seconds > > > [2014-12-18T21:34:57,203Z] [WARN ] > [elasticsearch[server00001][scheduler][T#1]] > [org.elasticsearch.monitor.jvm] [server00001] [gc][young][163563][20423] duration > [1.3s], collections [1]/[2.1s], total [1.3s]/[17.9m], memory > [20.7gb]->[20.2gb]/[30.1gb], all_pools {[young] > [543.2mb]->[1mb]/[865.3mb]}{[survivor] [44.6mb]->[48.6mb]/[108.1mb]}{[old] > [20.2gb]->[20.2gb]/[29.2gb]} > > > *Item 2:* > > 2014-12-18T20:53:35.011+0000: 161313.153: [ParNew > Desired survivor size 56688640 bytes, new threshold 6 (max 6) > - age 1: 32445776 bytes, 32445776 total > - age 3: 6068000 bytes, 38513776 total > - age 4: 1031528 bytes, 39545304 total > - age 5: 255896 bytes, 39801200 total > : 939702K->53536K(996800K), 2.9352940 secs] > 21501296K->20625925K(31633280K)After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 1287922158 > Max Chunk Size: 1287922158 > Number of Blocks: 1 > Av. Block Size: 1287922158 > Tree Height: 1 > After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 6205476 > Max Chunk Size: 6204928 > Number of Blocks: 2 > Av. Block Size: 3102738 > Tree Height: 2 > , 2.9355580 secs] [Times: user=33.82 sys=3.11, real=2.94 secs] > 2014-12-18T20:53:37.947+0000: 161316.089: Total time for which application > threads were stopped: 2.9367640 seconds > > > [2014-12-18T20:53:37,950Z] [WARN ] > [elasticsearch[server00001][scheduler][T#1]] > [org.elasticsearch.monitor.jvm] [server00001] [gc][young][161091][19838] duration > [2.9s], collections [1]/[3s], total [2.9s]/[17.3m], memory > [20.4gb]->[19.6gb]/[30.1gb], all_pools {[young] > [801.7mb]->[2.4mb]/[865.3mb]}{[survivor] > [52.3mb]->[52.2mb]/[108.1mb]}{[old] [19.6gb]->[19.6gb]/[29.2gb]} > > > *Item 3:* > > 2014-12-17T14:42:10.590+0000: 52628.731: [GCBefore GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: -966776244 > Max Chunk Size: -966776244 > Number of Blocks: 1 > Av. Block Size: -966776244 > Tree Height: 1 > Before GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 530 > Max Chunk Size: 268 > Number of Blocks: 2 > Av. Block Size: 265 > Tree Height: 2 > 2014-12-17T14:42:10.590+0000: 52628.731: [ParNew > Desired survivor size 56688640 bytes, new threshold 1 (max 6) > - age 1: 113315624 bytes, 113315624 total > : 996800K->110720K(996800K), 7.3511710 secs] > 5609422K->5065102K(31633280K)After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: -1009955715 > Max Chunk Size: -1009955715 > Number of Blocks: 1 > Av. Block Size: -1009955715 > Tree Height: 1 > After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 530 > Max Chunk Size: 268 > Number of Blocks: 2 > Av. 
Block Size: 265 > Tree Height: 2 > , 7.3514180 secs] [Times: user=36.50 sys=17.99, real=7.35 secs] > 2014-12-17T14:42:17.941+0000: 52636.083: Total time for which application > threads were stopped: 7.3525250 seconds > > > [2014-12-17T14:42:17,944Z] [WARN ] > [elasticsearch[prdaes04data03][scheduler][T#1]] > [org.elasticsearch.monitor.jvm] [prdaes04data03] [gc][young][52582][5110] duration > [7.3s], collections [1]/[7.6s], total [7.3s]/[4.2m], memory > [5.1gb]->[4.8gb]/[30.1gb], all_pools {[young] > [695.5mb]->[14.4mb]/[865.3mb]}{[survivor] > [108.1mb]->[108.1mb]/[108.1mb]}{[old] [4.3gb]->[4.7gb]/[29.2gb]} > > > > > On Thu, Dec 18, 2014 at 1:10 PM, charlie hunt > wrote: >> >> The output: >> >> *cat /sys/kernel/mm/transparent_hugepage/enabled* >> [always] madvise never >> >> >> Tells me that transparent huge pages are enabled ?always?. >> >> I think I would change this to ?never?, even though sysctl -a may be >> reporting no huge pages are currently in use. The system may trying to >> coalesce pages occasionally in attempt to make huge pages available, even >> though you are not currently using any. >> >> charlie >> >> >> On Dec 18, 2014, at 2:00 PM, Ashwin Jayaprakash < >> ashwin.jayaprakash at gmail.com> wrote: >> >> *@Jon*, thanks for clearing that up. Yes, that was my source of >> confusion. I was misinterpreting the user time with the real time. >> >> *Jon's reply from an offline conversation:* >> >>> Are these the 7 second collections you refer to in the paragraph above? >>> If yes, the "user" time is the sum of the time spent by multiple GC >>> threads. >>> The real time is the GC pause time that your application experiences. >>> In the above case the GC pauses are .65s, 1.10s and .67s. >>> >> >> Something that added to my confusion was the tools we are using in-house. >> In addition to the GC logs we have 1 tool that uses the >> GarbageCollectorMXBean's getCollectionTime() method. This does not seem to >> match the values I see in the GC logs ( >> http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/management/GarbageCollectorMXBean.html#getCollectionTime%28%29 >> ). >> >> The other tool is the ElasticSearch JVM stats logger which uses GarbageCollectorMXBean's >> LastGCInfo ( >> https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmStats.java#L194 >> and >> https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmMonitorService.java#L187 >> ). >> >> Do these methods expose the total time spent by all the parallel GC >> threads for the ParNew pool or the "real" time? They do not seem to match >> the GC log times. >> >> *@Gustav* We do not have any swapping on the machines. It could be the >> disk IO experienced by the GC log writer itself, as you've suspected. The >> machine has 128G of RAM >> >> *"top" sample from a similar machine:* >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND >> 106856 xxx 20 0 68.9g 25g 9.9g S 72.4 21.1 >> 2408:05 java >> >> >> *"free -g":* >> total used free shared buffers cached >> Mem: 120 119 0 0 0 95 >> -/+ buffers/cache: 23 96 >> Swap: 0 0 0 >> >> *@Charlie* Hugepages has already been disabled >> >> *sudo sysctl -a | grep hugepage* >> vm.nr_hugepages = 0 >> vm.nr_hugepages_mempolicy = 0 >> vm.hugepages_treat_as_movable = 0 >> vm.nr_overcommit_hugepages = 0 >> >> *cat /sys/kernel/mm/transparent_hugepage/enabled* >> [always] madvise never >> >> >> Thanks all! 
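On the MXBean question above: GarbageCollectorMXBean.getCollectionTime() is documented as the approximate accumulated *elapsed* collection time, and com.sun.management.GcInfo.getDuration() is the elapsed time of the last collection, so both should track the GC log's "real" column rather than the per-thread "user" sum. Below is a minimal monitoring sketch (not the ElasticSearch code) that polls both, plus the per-pool before/after usage that such stats loggers typically report. It assumes a HotSpot JVM, where the platform beans implement the com.sun.management extension; the class name is only for illustration.

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.Map;

import com.sun.management.GcInfo;

// Polls each collector and prints the accumulated elapsed GC time, the duration
// of the most recent collection, and the per-pool usage before/after that
// collection -- the raw data the numbers discussed above are derived from.
public class GcMonitorSketch {
    public static void main(String[] args) throws InterruptedException {
        while (true) {
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                com.sun.management.GarbageCollectorMXBean hotspotGc =
                        (com.sun.management.GarbageCollectorMXBean) gc;
                System.out.printf("%s: count=%d accumulatedElapsed=%dms%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
                GcInfo last = hotspotGc.getLastGcInfo();
                if (last != null) { // null until the first collection of this type
                    System.out.printf("  last GC duration=%dms%n", last.getDuration());
                    Map<String, MemoryUsage> before = last.getMemoryUsageBeforeGc();
                    for (Map.Entry<String, MemoryUsage> e : last.getMemoryUsageAfterGc().entrySet()) {
                        System.out.printf("  %s: %dK -> %dK%n", e.getKey(),
                                before.get(e.getKey()).getUsed() / 1024,
                                e.getValue().getUsed() / 1024);
                    }
                }
            }
            Thread.sleep(5000);
        }
    }
}

Comparing this output against the matching "[Times: ... real=... secs]" entries in the GC log is one way to confirm which figure a given tool is actually reporting.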
>> >> >> >> On Wed, Dec 17, 2014 at 5:00 PM, Jon Masamitsu >> wrote: >>> >>> Ashwin, >>> >>> On 12/16/2014 8:47 PM, Ashwin Jayaprakash wrote: >>> >>> Hi, we have a cluster of ElasticSearch servers running with 31G heap >>> and OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode). >>> >>> While our old gen seems to be very stable with about 40% usage and no >>> Full GCs so far, our young gen keeps growing from ~50MB to 850MB every few >>> seconds. These ParNew collections are taking anywhere between 1-7 seconds >>> and is causing some of our requests to time out. The eden space keeps >>> filling up and then cleared every 30-60 seconds. There is definitely work >>> being done by our JVM in terms of caching/buffering objects for a few >>> seconds, writing to disk and then clearing the objects (typical >>> Lucene/ElasticSearch indexing and querying workload) >>> >>> >>> From you recent mail >>> >>> Times: user=7.89 sys=0.55, real=0.65 secs] >>> Times: user=7.71 sys=4.59, real=1.10 secs] >>> Times: user=7.46 sys=0.32, real=0.67 secs] >>> >>> Are these the 7 second collections you refer to in the paragraph above? >>> If yes, the "user" time is the sum of the time spent by multiple GC >>> threads. >>> The real time is the GC pause time that your application experiences. >>> In the above case the GC pauses are .65s, 1.10s and .67s. >>> >>> Comment below regarding "eden space keeps filling up". >>> >>> >>> These long pauses are not good for our server throughput and I was >>> doing some reading. I got some conflicting reports on how Cassandra is >>> configured compared to Hadoop. There are also references >>> >>> to this old ParNew+CMS bug >>> which I thought >>> would've been addressed in the JRE version we are using. Cassandra >>> recommends >>> a larger NewSize with just 1 for max tenuring, whereas Hadoop recommends >>> a small NewSize. >>> >>> Since most of our allocations seem to be quite short lived, is there a >>> way to avoid these "long" young gen pauses? >>> >>> Thanks in advance. Here are some details. 
>>> >>> * Heap settings:* >>> java -Xmx31000m -Xms31000m >>> -Xss512k -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=64m >>> -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly >>> -XX:CMSInitiatingOccupancyFraction=70 >>> -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC -XX:+PrintPromotionFailure >>> -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintTenuringDistribution >>> -XX:GCLogFileSize=512m -XX:NumberOfGCLogFiles=2 -XX:+PrintGCDateStamps >>> -XX:+UseGCLogFileRotation -XX:+DisableExplicitGC >>> -XX:PrintFLSStatistics=1 -XX:+PrintGCDetails >>> >>> >>> * Last few lines of "kill -3 pid" output:* >>> Heap >>> par new generation total 996800K, used 865818K [0x00007fa18e800000, >>> 0x00007fa1d2190000, 0x00007fa1d2190000) >>> eden space 886080K, 94% used [0x00007fa18e800000, 0x00007fa1c1a659e0, >>> 0x00007fa1c4950000) >>> from space 110720K, 25% used [0x00007fa1cb570000, 0x00007fa1cd091078, >>> 0x00007fa1d2190000) >>> to space 110720K, 0% used [0x00007fa1c4950000, 0x00007fa1c4950000, >>> 0x00007fa1cb570000) >>> concurrent mark-sweep generation total 30636480K, used 12036523K >>> [0x00007fa1d2190000, 0x00007fa920000000, 0x00007fa920000000) >>> concurrent-mark-sweep perm gen total 128856K, used 77779K >>> [0x00007fa920000000, 0x00007fa927dd6000, 0x00007fa940000000) >>> >>> *Sample gc log:* >>> 2014-12-11T23:32:16.121+0000: 710.618: [ParNew >>> Desired survivor size 56688640 bytes, new threshold 6 (max 6) >>> - age 1: 2956312 bytes, 2956312 total >>> - age 2: 591800 bytes, 3548112 total >>> - age 3: 66216 bytes, 3614328 total >>> - age 4: 270752 bytes, 3885080 total >>> - age 5: 615472 bytes, 4500552 total >>> - age 6: 358440 bytes, 4858992 total >>> : 900635K->8173K(996800K), 0.0317340 secs] >>> 1352217K->463460K(31633280K)After GC: >>> >>> >>> In this GC eden is at 900635k before the GC and is a 8173k after. That >>> GC fills up is >>> the expected behavior. Is that what you were asking about above? If >>> not can you >>> send me an example of the "fills up" behavior. >>> >>> Jon >>> >>> Statistics for BinaryTreeDictionary: >>> ------------------------------------ >>> Total Free Space: -433641480 >>> Max Chunk Size: -433641480 >>> Number of Blocks: 1 >>> Av. Block Size: -433641480 >>> Tree Height: 1 >>> After GC: >>> Statistics for BinaryTreeDictionary: >>> ------------------------------------ >>> Total Free Space: 1227 >>> Max Chunk Size: 631 >>> Number of Blocks: 3 >>> Av. Block Size: 409 >>> Tree Height: 3 >>> , 0.0318920 secs] [Times: user=0.38 sys=0.01, real=0.03 secs] >>> >>> >>> Ashwin Jayaprakash. >>> >>> >>> _______________________________________________ >>> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >>> >>> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlie.hunt at oracle.com Mon Dec 22 17:52:02 2014 From: charlie.hunt at oracle.com (charlie hunt) Date: Mon, 22 Dec 2014 11:52:02 -0600 Subject: Multi-second ParNew collections but stable CMS In-Reply-To: References: <5492272A.2040304@oracle.com> <613BB518-BB80-42A5-8E24-FC5683E33577@oracle.com> <28DAC751-9A6F-4C18-B7ED-4169C8CDC4F7@oracle.com> Message-ID: <9A1D67CA-4B26-401B-91D3-EEC64D3E86CA@oracle.com> Thanks for reporting back, and great to hear disabling THP has solved your multi-second minor GCs issue. :-) charlie > On Dec 22, 2014, at 11:49 AM, Ashwin Jayaprakash wrote: > > All, I'm happy to report that disabling THP made a big difference. We do not see multi-second minor GCs in our cluster anymore. > > Thanks for your help. > > On Fri, Dec 19, 2014 at 2:10 PM, charlie hunt > wrote: > Disabling transparent huge pages should help those GC pauses where you are seeing high sys time reported, which should also shorten their pause times. > > Thanks for also sharing your observation of khugepaged/pages_collapsed. > > charlie > >> On Dec 18, 2014, at 4:41 PM, Ashwin Jayaprakash > wrote: >> >> @Charlie/@Holger My apologies, THP is indeed enabled. I misread the "never" and thought it was already done. In fact "cat /sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed" showed 904 and after an hour now, it shows 6845 on one of our machines. >> >> @Jon I dug through some of our ElasticSearch/application logs again and tried to correlate them with the GC logs. The collection time does seem to match the GC log's "real" time. However the collected sizes don't seem to match, which is what threw me off. >> >> Item 1: >> >> 2014-12-18T21:34:55.837+0000: 163793.979: [ParNew >> Desired survivor size 56688640 bytes, new threshold 6 (max 6) >> - age 1: 31568024 bytes, 31568024 total >> - age 2: 1188576 bytes, 32756600 total >> - age 3: 1830920 bytes, 34587520 total >> - age 4: 282536 bytes, 34870056 total >> - age 5: 316640 bytes, 35186696 total >> - age 6: 249856 bytes, 35436552 total >> : 931773K->49827K(996800K), 1.3622770 secs] 22132844K->21256042K(31633280K)After GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: 1206815932 >> Max Chunk Size: 1206815932 >> Number of Blocks: 1 >> Av. Block Size: 1206815932 >> Tree Height: 1 >> After GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: 6189459 >> Max Chunk Size: 6188544 >> Number of Blocks: 3 >> Av. 
Block Size: 2063153 >> Tree Height: 2 >> , 1.3625110 secs] [Times: user=15.92 sys=1.36, real=1.36 secs] >> 2014-12-18T21:34:57.200+0000: 163795.342: Total time for which application threads were stopped: 1.3638970 seconds >> >> >> [2014-12-18T21:34:57,203Z] [WARN ] [elasticsearch[server00001][scheduler][T#1]] [org.elasticsearch.monitor.jvm] [server00001] [gc][young][163563][20423] duration [1.3s], collections [1]/[2.1s], total [1.3s]/[17.9m], memory [20.7gb]->[20.2gb]/[30.1gb], all_pools {[young] [543.2mb]->[1mb]/[865.3mb]}{[survivor] [44.6mb]->[48.6mb]/[108.1mb]}{[old] [20.2gb]->[20.2gb]/[29.2gb]} >> >> >> Item 2: >> >> 2014-12-18T20:53:35.011+0000: 161313.153: [ParNew >> Desired survivor size 56688640 bytes, new threshold 6 (max 6) >> - age 1: 32445776 bytes, 32445776 total >> - age 3: 6068000 bytes, 38513776 total >> - age 4: 1031528 bytes, 39545304 total >> - age 5: 255896 bytes, 39801200 total >> : 939702K->53536K(996800K), 2.9352940 secs] 21501296K->20625925K(31633280K)After GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: 1287922158 >> Max Chunk Size: 1287922158 >> Number of Blocks: 1 >> Av. Block Size: 1287922158 >> Tree Height: 1 >> After GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: 6205476 >> Max Chunk Size: 6204928 >> Number of Blocks: 2 >> Av. Block Size: 3102738 >> Tree Height: 2 >> , 2.9355580 secs] [Times: user=33.82 sys=3.11, real=2.94 secs] >> 2014-12-18T20:53:37.947+0000: 161316.089: Total time for which application threads were stopped: 2.9367640 seconds >> >> >> [2014-12-18T20:53:37,950Z] [WARN ] [elasticsearch[server00001][scheduler][T#1]] [org.elasticsearch.monitor.jvm] [server00001] [gc][young][161091][19838] duration [2.9s], collections [1]/[3s], total [2.9s]/[17.3m], memory [20.4gb]->[19.6gb]/[30.1gb], all_pools {[young] [801.7mb]->[2.4mb]/[865.3mb]}{[survivor] [52.3mb]->[52.2mb]/[108.1mb]}{[old] [19.6gb]->[19.6gb]/[29.2gb]} >> >> >> Item 3: >> >> 2014-12-17T14:42:10.590+0000: 52628.731: [GCBefore GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: -966776244 >> Max Chunk Size: -966776244 >> Number of Blocks: 1 >> Av. Block Size: -966776244 >> Tree Height: 1 >> Before GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: 530 >> Max Chunk Size: 268 >> Number of Blocks: 2 >> Av. Block Size: 265 >> Tree Height: 2 >> 2014-12-17T14:42:10.590+0000: 52628.731: [ParNew >> Desired survivor size 56688640 bytes, new threshold 1 (max 6) >> - age 1: 113315624 bytes, 113315624 total >> : 996800K->110720K(996800K), 7.3511710 secs] 5609422K->5065102K(31633280K)After GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: -1009955715 >> Max Chunk Size: -1009955715 >> Number of Blocks: 1 >> Av. Block Size: -1009955715 >> Tree Height: 1 >> After GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: 530 >> Max Chunk Size: 268 >> Number of Blocks: 2 >> Av. 
Block Size: 265 >> Tree Height: 2 >> , 7.3514180 secs] [Times: user=36.50 sys=17.99, real=7.35 secs] >> 2014-12-17T14:42:17.941+0000: 52636.083: Total time for which application threads were stopped: 7.3525250 seconds >> >> >> [2014-12-17T14:42:17,944Z] [WARN ] [elasticsearch[prdaes04data03][scheduler][T#1]] [org.elasticsearch.monitor.jvm] [prdaes04data03] [gc][young][52582][5110] duration [7.3s], collections [1]/[7.6s], total [7.3s]/[4.2m], memory [5.1gb]->[4.8gb]/[30.1gb], all_pools {[young] [695.5mb]->[14.4mb]/[865.3mb]}{[survivor] [108.1mb]->[108.1mb]/[108.1mb]}{[old] [4.3gb]->[4.7gb]/[29.2gb]} >> >> >> >> >> On Thu, Dec 18, 2014 at 1:10 PM, charlie hunt > wrote: >> The output: >>> cat /sys/kernel/mm/transparent_hugepage/enabled >>> [always] madvise never >> >> Tells me that transparent huge pages are enabled ?always?. >> >> I think I would change this to ?never?, even though sysctl -a may be reporting no huge pages are currently in use. The system may trying to coalesce pages occasionally in attempt to make huge pages available, even though you are not currently using any. >> >> charlie >> >> >>> On Dec 18, 2014, at 2:00 PM, Ashwin Jayaprakash > wrote: >>> >>> @Jon, thanks for clearing that up. Yes, that was my source of confusion. I was misinterpreting the user time with the real time. >>> >>> Jon's reply from an offline conversation: >>> Are these the 7 second collections you refer to in the paragraph above? >>> If yes, the "user" time is the sum of the time spent by multiple GC threads. >>> The real time is the GC pause time that your application experiences. >>> In the above case the GC pauses are .65s, 1.10s and .67s. >>> >>> Something that added to my confusion was the tools we are using in-house. In addition to the GC logs we have 1 tool that uses the GarbageCollectorMXBean's getCollectionTime() method. This does not seem to match the values I see in the GC logs (http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/management/GarbageCollectorMXBean.html#getCollectionTime%28%29 ). >>> >>> The other tool is the ElasticSearch JVM stats logger which uses GarbageCollectorMXBean's LastGCInfo (https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmStats.java#L194 and https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmMonitorService.java#L187 ). >>> >>> Do these methods expose the total time spent by all the parallel GC threads for the ParNew pool or the "real" time? They do not seem to match the GC log times. >>> >>> @Gustav We do not have any swapping on the machines. It could be the disk IO experienced by the GC log writer itself, as you've suspected. The machine has 128G of RAM >>> >>> "top" sample from a similar machine: >>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND >>> 106856 xxx 20 0 68.9g 25g 9.9g S 72.4 21.1 2408:05 java >>> >>> "free -g": >>> total used free shared buffers cached >>> Mem: 120 119 0 0 0 95 >>> -/+ buffers/cache: 23 96 >>> Swap: 0 0 0 >>> >>> @Charlie Hugepages has already been disabled >>> >>> sudo sysctl -a | grep hugepage >>> vm.nr_hugepages = 0 >>> vm.nr_hugepages_mempolicy = 0 >>> vm.hugepages_treat_as_movable = 0 >>> vm.nr_overcommit_hugepages = 0 >>> >>> cat /sys/kernel/mm/transparent_hugepage/enabled >>> [always] madvise never >>> >>> >>> Thanks all! 
>>> >>> >>> >>> On Wed, Dec 17, 2014 at 5:00 PM, Jon Masamitsu > wrote: >>> Ashwin, >>> >>> On 12/16/2014 8:47 PM, Ashwin Jayaprakash wrote: >>>> Hi, we have a cluster of ElasticSearch servers running with 31G heap and OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode). >>>> >>>> While our old gen seems to be very stable with about 40% usage and no Full GCs so far, our young gen keeps growing from ~50MB to 850MB every few seconds. These ParNew collections are taking anywhere between 1-7 seconds and is causing some of our requests to time out. The eden space keeps filling up and then cleared every 30-60 seconds. There is definitely work being done by our JVM in terms of caching/buffering objects for a few seconds, writing to disk and then clearing the objects (typical Lucene/ElasticSearch indexing and querying workload) >>> >>> From you recent mail >>> >>> Times: user=7.89 sys=0.55, real=0.65 secs] >>> Times: user=7.71 sys=4.59, real=1.10 secs] >>> Times: user=7.46 sys=0.32, real=0.67 secs] >>> >>> Are these the 7 second collections you refer to in the paragraph above? >>> If yes, the "user" time is the sum of the time spent by multiple GC threads. >>> The real time is the GC pause time that your application experiences. >>> In the above case the GC pauses are .65s, 1.10s and .67s. >>> >>> Comment below regarding "eden space keeps filling up". >>> >>>> >>>> These long pauses are not good for our server throughput and I was doing some reading. I got some conflicting reports on how Cassandra is configured compared to Hadoop. There are also references to this old ParNew+CMS bug which I thought would've been addressed in the JRE version we are using. Cassandra recommends a larger NewSize with just 1 for max tenuring, whereas Hadoop recommends a small NewSize. >>>> >>>> Since most of our allocations seem to be quite short lived, is there a way to avoid these "long" young gen pauses? >>>> >>>> Thanks in advance. Here are some details. 
>>>> >>>> Heap settings: >>>> java -Xmx31000m -Xms31000m >>>> -Xss512k -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=64m >>>> -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 >>>> -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC -XX:+PrintPromotionFailure >>>> -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintTenuringDistribution >>>> -XX:GCLogFileSize=512m -XX:NumberOfGCLogFiles=2 -XX:+PrintGCDateStamps >>>> -XX:+UseGCLogFileRotation -XX:+DisableExplicitGC -XX:PrintFLSStatistics=1 -XX:+PrintGCDetails >>>> >>>> >>>> Last few lines of "kill -3 pid" output: >>>> Heap >>>> par new generation total 996800K, used 865818K [0x00007fa18e800000, 0x00007fa1d2190000, 0x00007fa1d2190000) >>>> eden space 886080K, 94% used [0x00007fa18e800000, 0x00007fa1c1a659e0, 0x00007fa1c4950000) >>>> from space 110720K, 25% used [0x00007fa1cb570000, 0x00007fa1cd091078, 0x00007fa1d2190000) >>>> to space 110720K, 0% used [0x00007fa1c4950000, 0x00007fa1c4950000, 0x00007fa1cb570000) >>>> concurrent mark-sweep generation total 30636480K, used 12036523K [0x00007fa1d2190000, 0x00007fa920000000, 0x00007fa920000000) >>>> concurrent-mark-sweep perm gen total 128856K, used 77779K [0x00007fa920000000, 0x00007fa927dd6000, 0x00007fa940000000) >>>> >>>> Sample gc log: >>>> 2014-12-11T23:32:16.121+0000: 710.618: [ParNew >>>> Desired survivor size 56688640 bytes, new threshold 6 (max 6) >>>> - age 1: 2956312 bytes, 2956312 total >>>> - age 2: 591800 bytes, 3548112 total >>>> - age 3: 66216 bytes, 3614328 total >>>> - age 4: 270752 bytes, 3885080 total >>>> - age 5: 615472 bytes, 4500552 total >>>> - age 6: 358440 bytes, 4858992 total >>>> : 900635K->8173K(996800K), 0.0317340 secs] 1352217K->463460K(31633280K)After GC: >>> >>> In this GC eden is at 900635k before the GC and is a 8173k after. That GC fills up is >>> the expected behavior. Is that what you were asking about above? If not can you >>> send me an example of the "fills up" behavior. >>> >>> Jon >>> >>>> Statistics for BinaryTreeDictionary: >>>> ------------------------------------ >>>> Total Free Space: -433641480 >>>> Max Chunk Size: -433641480 >>>> Number of Blocks: 1 >>>> Av. Block Size: -433641480 >>>> Tree Height: 1 >>>> After GC: >>>> Statistics for BinaryTreeDictionary: >>>> ------------------------------------ >>>> Total Free Space: 1227 >>>> Max Chunk Size: 631 >>>> Number of Blocks: 3 >>>> Av. Block Size: 409 >>>> Tree Height: 3 >>>> , 0.0318920 secs] [Times: user=0.38 sys=0.01, real=0.03 secs] >>>> >>>> >>>> Ashwin Jayaprakash. >>>> >>>> >>>> _______________________________________________ >>>> hotspot-gc-use mailing list >>>> hotspot-gc-use at openjdk.java.net >>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gustav.r.akesson at gmail.com Mon Dec 22 20:45:20 2014 From: gustav.r.akesson at gmail.com (=?UTF-8?Q?Gustav_=C3=85kesson?=) Date: Mon, 22 Dec 2014 21:45:20 +0100 Subject: Multi-second ParNew collections but stable CMS In-Reply-To: <9A1D67CA-4B26-401B-91D3-EEC64D3E86CA@oracle.com> References: <5492272A.2040304@oracle.com> <613BB518-BB80-42A5-8E24-FC5683E33577@oracle.com> <28DAC751-9A6F-4C18-B7ED-4169C8CDC4F7@oracle.com> <9A1D67CA-4B26-401B-91D3-EEC64D3E86CA@oracle.com> Message-ID: Hi, Indeed thank you for reporting back on this. TIL about THP, so to say... Best Regards, Gustav?kesson Den 22 dec 2014 18:52 skrev "charlie hunt" : > Thanks for reporting back, and great to hear disabling THP has solved your > multi-second minor GCs issue. :-) > > charlie > > On Dec 22, 2014, at 11:49 AM, Ashwin Jayaprakash < > ashwin.jayaprakash at gmail.com> wrote: > > All, I'm happy to report that disabling THP made a big difference. We do > not see multi-second minor GCs in our cluster anymore. > > Thanks for your help. > > On Fri, Dec 19, 2014 at 2:10 PM, charlie hunt > wrote: > >> Disabling transparent huge pages should help those GC pauses where you >> are seeing high sys time reported, which should also shorten their pause >> times. >> >> Thanks for also sharing your observation of khugepaged/pages_collapsed. >> >> charlie >> >> On Dec 18, 2014, at 4:41 PM, Ashwin Jayaprakash < >> ashwin.jayaprakash at gmail.com> wrote: >> >> *@Charlie/@Holger* My apologies, THP is indeed enabled. I misread the >> "never" and thought it was already done. In fact "cat >> /sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed" showed >> 904 and after an hour now, it shows 6845 on one of our machines. >> >> *@Jon* I dug through some of our ElasticSearch/application logs again >> and tried to correlate them with the GC logs. The collection time does seem >> to match the GC log's "real" time. However the collected sizes don't seem >> to match, which is what threw me off. >> >> *Item 1:* >> >> 2014-12-18T21:34:55.837+0000: 163793.979: [ParNew >> Desired survivor size 56688640 bytes, new threshold 6 (max 6) >> - age 1: 31568024 bytes, 31568024 total >> - age 2: 1188576 bytes, 32756600 total >> - age 3: 1830920 bytes, 34587520 total >> - age 4: 282536 bytes, 34870056 total >> - age 5: 316640 bytes, 35186696 total >> - age 6: 249856 bytes, 35436552 total >> : 931773K->49827K(996800K), 1.3622770 secs] >> 22132844K->21256042K(31633280K)After GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: 1206815932 >> Max Chunk Size: 1206815932 >> Number of Blocks: 1 >> Av. Block Size: 1206815932 >> Tree Height: 1 >> After GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: 6189459 >> Max Chunk Size: 6188544 >> Number of Blocks: 3 >> Av. 
Block Size: 2063153 >> Tree Height: 2 >> , 1.3625110 secs] [Times: user=15.92 sys=1.36, real=1.36 secs] >> 2014-12-18T21:34:57.200+0000: 163795.342: Total time for which >> application threads were stopped: 1.3638970 seconds >> >> >> [2014-12-18T21:34:57,203Z] [WARN ] >> [elasticsearch[server00001][scheduler][T#1]] >> [org.elasticsearch.monitor.jvm] [server00001] [gc][young][163563][20423] duration >> [1.3s], collections [1]/[2.1s], total [1.3s]/[17.9m], memory >> [20.7gb]->[20.2gb]/[30.1gb], all_pools {[young] >> [543.2mb]->[1mb]/[865.3mb]}{[survivor] [44.6mb]->[48.6mb]/[108.1mb]}{[old] >> [20.2gb]->[20.2gb]/[29.2gb]} >> >> >> *Item 2:* >> >> 2014-12-18T20:53:35.011+0000: 161313.153: [ParNew >> Desired survivor size 56688640 bytes, new threshold 6 (max 6) >> - age 1: 32445776 bytes, 32445776 total >> - age 3: 6068000 bytes, 38513776 total >> - age 4: 1031528 bytes, 39545304 total >> - age 5: 255896 bytes, 39801200 total >> : 939702K->53536K(996800K), 2.9352940 secs] >> 21501296K->20625925K(31633280K)After GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: 1287922158 >> Max Chunk Size: 1287922158 >> Number of Blocks: 1 >> Av. Block Size: 1287922158 >> Tree Height: 1 >> After GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: 6205476 >> Max Chunk Size: 6204928 >> Number of Blocks: 2 >> Av. Block Size: 3102738 >> Tree Height: 2 >> , 2.9355580 secs] [Times: user=33.82 sys=3.11, real=2.94 secs] >> 2014-12-18T20:53:37.947+0000: 161316.089: Total time for which >> application threads were stopped: 2.9367640 seconds >> >> >> [2014-12-18T20:53:37,950Z] [WARN ] >> [elasticsearch[server00001][scheduler][T#1]] >> [org.elasticsearch.monitor.jvm] [server00001] [gc][young][161091][19838] duration >> [2.9s], collections [1]/[3s], total [2.9s]/[17.3m], memory >> [20.4gb]->[19.6gb]/[30.1gb], all_pools {[young] >> [801.7mb]->[2.4mb]/[865.3mb]}{[survivor] >> [52.3mb]->[52.2mb]/[108.1mb]}{[old] [19.6gb]->[19.6gb]/[29.2gb]} >> >> >> *Item 3:* >> >> 2014-12-17T14:42:10.590+0000: 52628.731: [GCBefore GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: -966776244 >> Max Chunk Size: -966776244 >> Number of Blocks: 1 >> Av. Block Size: -966776244 >> Tree Height: 1 >> Before GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: 530 >> Max Chunk Size: 268 >> Number of Blocks: 2 >> Av. Block Size: 265 >> Tree Height: 2 >> 2014-12-17T14:42:10.590+0000: 52628.731: [ParNew >> Desired survivor size 56688640 bytes, new threshold 1 (max 6) >> - age 1: 113315624 bytes, 113315624 total >> : 996800K->110720K(996800K), 7.3511710 secs] >> 5609422K->5065102K(31633280K)After GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: -1009955715 >> Max Chunk Size: -1009955715 >> Number of Blocks: 1 >> Av. Block Size: -1009955715 >> Tree Height: 1 >> After GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: 530 >> Max Chunk Size: 268 >> Number of Blocks: 2 >> Av. 
Block Size: 265 >> Tree Height: 2 >> , 7.3514180 secs] [Times: user=36.50 sys=17.99, real=7.35 secs] >> 2014-12-17T14:42:17.941+0000: 52636.083: Total time for which application >> threads were stopped: 7.3525250 seconds >> >> >> [2014-12-17T14:42:17,944Z] [WARN ] >> [elasticsearch[prdaes04data03][scheduler][T#1]] >> [org.elasticsearch.monitor.jvm] [prdaes04data03] [gc][young][52582][5110] duration >> [7.3s], collections [1]/[7.6s], total [7.3s]/[4.2m], memory >> [5.1gb]->[4.8gb]/[30.1gb], all_pools {[young] >> [695.5mb]->[14.4mb]/[865.3mb]}{[survivor] >> [108.1mb]->[108.1mb]/[108.1mb]}{[old] [4.3gb]->[4.7gb]/[29.2gb]} >> >> >> >> >> On Thu, Dec 18, 2014 at 1:10 PM, charlie hunt >> wrote: >>> >>> The output: >>> >>> *cat /sys/kernel/mm/transparent_hugepage/enabled* >>> [always] madvise never >>> >>> >>> Tells me that transparent huge pages are enabled ?always?. >>> >>> I think I would change this to ?never?, even though sysctl -a may be >>> reporting no huge pages are currently in use. The system may trying to >>> coalesce pages occasionally in attempt to make huge pages available, even >>> though you are not currently using any. >>> >>> charlie >>> >>> >>> On Dec 18, 2014, at 2:00 PM, Ashwin Jayaprakash < >>> ashwin.jayaprakash at gmail.com> wrote: >>> >>> *@Jon*, thanks for clearing that up. Yes, that was my source of >>> confusion. I was misinterpreting the user time with the real time. >>> >>> *Jon's reply from an offline conversation:* >>> >>>> Are these the 7 second collections you refer to in the paragraph above? >>>> If yes, the "user" time is the sum of the time spent by multiple GC >>>> threads. >>>> The real time is the GC pause time that your application experiences. >>>> In the above case the GC pauses are .65s, 1.10s and .67s. >>>> >>> >>> Something that added to my confusion was the tools we are using >>> in-house. In addition to the GC logs we have 1 tool that uses the >>> GarbageCollectorMXBean's getCollectionTime() method. This does not seem to >>> match the values I see in the GC logs ( >>> http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/management/GarbageCollectorMXBean.html#getCollectionTime%28%29 >>> ). >>> >>> The other tool is the ElasticSearch JVM stats logger which uses GarbageCollectorMXBean's >>> LastGCInfo ( >>> https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmStats.java#L194 >>> and >>> https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmMonitorService.java#L187 >>> ). >>> >>> Do these methods expose the total time spent by all the parallel GC >>> threads for the ParNew pool or the "real" time? They do not seem to match >>> the GC log times. >>> >>> *@Gustav* We do not have any swapping on the machines. It could be the >>> disk IO experienced by the GC log writer itself, as you've suspected. 
The >>> machine has 128G of RAM >>> >>> *"top" sample from a similar machine:* >>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND >>> 106856 xxx 20 0 68.9g 25g 9.9g S 72.4 21.1 >>> 2408:05 java >>> >>> >>> *"free -g":* >>> total used free shared buffers cached >>> Mem: 120 119 0 0 0 95 >>> -/+ buffers/cache: 23 96 >>> Swap: 0 0 0 >>> >>> *@Charlie* Hugepages has already been disabled >>> >>> *sudo sysctl -a | grep hugepage* >>> vm.nr_hugepages = 0 >>> vm.nr_hugepages_mempolicy = 0 >>> vm.hugepages_treat_as_movable = 0 >>> vm.nr_overcommit_hugepages = 0 >>> >>> *cat /sys/kernel/mm/transparent_hugepage/enabled* >>> [always] madvise never >>> >>> >>> Thanks all! >>> >>> >>> >>> On Wed, Dec 17, 2014 at 5:00 PM, Jon Masamitsu >> > wrote: >>>> >>>> Ashwin, >>>> >>>> On 12/16/2014 8:47 PM, Ashwin Jayaprakash wrote: >>>> >>>> Hi, we have a cluster of ElasticSearch servers running with 31G heap >>>> and OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode). >>>> >>>> While our old gen seems to be very stable with about 40% usage and no >>>> Full GCs so far, our young gen keeps growing from ~50MB to 850MB every few >>>> seconds. These ParNew collections are taking anywhere between 1-7 seconds >>>> and is causing some of our requests to time out. The eden space keeps >>>> filling up and then cleared every 30-60 seconds. There is definitely work >>>> being done by our JVM in terms of caching/buffering objects for a few >>>> seconds, writing to disk and then clearing the objects (typical >>>> Lucene/ElasticSearch indexing and querying workload) >>>> >>>> >>>> From you recent mail >>>> >>>> Times: user=7.89 sys=0.55, real=0.65 secs] >>>> Times: user=7.71 sys=4.59, real=1.10 secs] >>>> Times: user=7.46 sys=0.32, real=0.67 secs] >>>> >>>> Are these the 7 second collections you refer to in the paragraph above? >>>> If yes, the "user" time is the sum of the time spent by multiple GC >>>> threads. >>>> The real time is the GC pause time that your application experiences. >>>> In the above case the GC pauses are .65s, 1.10s and .67s. >>>> >>>> Comment below regarding "eden space keeps filling up". >>>> >>>> >>>> These long pauses are not good for our server throughput and I was >>>> doing some reading. I got some conflicting reports on how Cassandra is >>>> configured compared to Hadoop. There are also references >>>> >>>> to this old ParNew+CMS bug >>>> which I thought >>>> would've been addressed in the JRE version we are using. Cassandra >>>> recommends >>>> a larger NewSize with just 1 for max tenuring, whereas Hadoop >>>> recommends a small >>>> NewSize. >>>> >>>> Since most of our allocations seem to be quite short lived, is there >>>> a way to avoid these "long" young gen pauses? >>>> >>>> Thanks in advance. Here are some details. 
>>>> >>>> * Heap settings:* >>>> java -Xmx31000m -Xms31000m >>>> -Xss512k -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=64m >>>> -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly >>>> -XX:CMSInitiatingOccupancyFraction=70 >>>> -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC -XX:+PrintPromotionFailure >>>> -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintTenuringDistribution >>>> -XX:GCLogFileSize=512m -XX:NumberOfGCLogFiles=2 -XX:+PrintGCDateStamps >>>> -XX:+UseGCLogFileRotation -XX:+DisableExplicitGC >>>> -XX:PrintFLSStatistics=1 -XX:+PrintGCDetails >>>> >>>> >>>> * Last few lines of "kill -3 pid" output:* >>>> Heap >>>> par new generation total 996800K, used 865818K [0x00007fa18e800000, >>>> 0x00007fa1d2190000, 0x00007fa1d2190000) >>>> eden space 886080K, 94% used [0x00007fa18e800000, >>>> 0x00007fa1c1a659e0, 0x00007fa1c4950000) >>>> from space 110720K, 25% used [0x00007fa1cb570000, >>>> 0x00007fa1cd091078, 0x00007fa1d2190000) >>>> to space 110720K, 0% used [0x00007fa1c4950000, >>>> 0x00007fa1c4950000, 0x00007fa1cb570000) >>>> concurrent mark-sweep generation total 30636480K, used 12036523K >>>> [0x00007fa1d2190000, 0x00007fa920000000, 0x00007fa920000000) >>>> concurrent-mark-sweep perm gen total 128856K, used 77779K >>>> [0x00007fa920000000, 0x00007fa927dd6000, 0x00007fa940000000) >>>> >>>> *Sample gc log:* >>>> 2014-12-11T23:32:16.121+0000: 710.618: [ParNew >>>> Desired survivor size 56688640 bytes, new threshold 6 (max 6) >>>> - age 1: 2956312 bytes, 2956312 total >>>> - age 2: 591800 bytes, 3548112 total >>>> - age 3: 66216 bytes, 3614328 total >>>> - age 4: 270752 bytes, 3885080 total >>>> - age 5: 615472 bytes, 4500552 total >>>> - age 6: 358440 bytes, 4858992 total >>>> : 900635K->8173K(996800K), 0.0317340 secs] >>>> 1352217K->463460K(31633280K)After GC: >>>> >>>> >>>> In this GC eden is at 900635k before the GC and is a 8173k after. That >>>> GC fills up is >>>> the expected behavior. Is that what you were asking about above? If >>>> not can you >>>> send me an example of the "fills up" behavior. >>>> >>>> Jon >>>> >>>> Statistics for BinaryTreeDictionary: >>>> ------------------------------------ >>>> Total Free Space: -433641480 >>>> Max Chunk Size: -433641480 >>>> Number of Blocks: 1 >>>> Av. Block Size: -433641480 >>>> Tree Height: 1 >>>> After GC: >>>> Statistics for BinaryTreeDictionary: >>>> ------------------------------------ >>>> Total Free Space: 1227 >>>> Max Chunk Size: 631 >>>> Number of Blocks: 3 >>>> Av. Block Size: 409 >>>> Tree Height: 3 >>>> , 0.0318920 secs] [Times: user=0.38 sys=0.01, real=0.03 secs] >>>> >>>> >>>> Ashwin Jayaprakash. >>>> >>>> >>>> _______________________________________________ >>>> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>> >>>> >>>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >>> >>> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ashwin.jayaprakash at gmail.com Mon Dec 22 21:32:06 2014 From: ashwin.jayaprakash at gmail.com (Ashwin Jayaprakash) Date: Mon, 22 Dec 2014 13:32:06 -0800 Subject: Multi-second ParNew collections but stable CMS In-Reply-To: References: <5492272A.2040304@oracle.com> <613BB518-BB80-42A5-8E24-FC5683E33577@oracle.com> <28DAC751-9A6F-4C18-B7ED-4169C8CDC4F7@oracle.com> <9A1D67CA-4B26-401B-91D3-EEC64D3E86CA@oracle.com> Message-ID: Just to summarize, we did disable THP and noticed minor GC times going down considerably. What still puzzles me is that the OS still reports huge pages in use but only a little bit - some food for thought: [user1 at server0001 ~]# cat /sys/kernel/mm/transparent_hugepage/enabled always madvise [never] [user1 at server0001 ~]# cat /sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed 0 [user1 at server0001 ~]# grep AnonHugePages /proc/meminfo AnonHugePages: 331776 kB [user1 at server0001 ~]# egrep 'trans|thp' /proc/vmstat nr_anon_transparent_hugepages 162 thp_fault_alloc 209 thp_fault_fallback 0 thp_collapse_alloc 0 thp_collapse_alloc_failed 0 thp_split 11 (Huge page use per process - https://access.redhat.com/solutions/46111) [user1 at server0001 ~]# grep -e AnonHugePages /proc/*/smaps | awk '{ if($2>4) print $0} ' | awk -F "/" '{print $0; system("ps -fp " $3)} ' /proc/1330/smaps:AnonHugePages: 2048 kB UID PID PPID C STIME TTY TIME CMD root 1330 1310 23 Dec19 ? 16:38:37 java -Xmx31000m -Xms31000m -Xss512k -XX:MaxPermSize=512m -XX:ReservedC /proc/1330/smaps:AnonHugePages: 116736 kB UID PID PPID C STIME TTY TIME CMD root 1330 1310 23 Dec19 ? 16:38:37 java -Xmx31000m -Xms31000m -Xss512k -XX:MaxPermSize=512m -XX:ReservedC /proc/1330/smaps:AnonHugePages: 43008 kB UID PID PPID C STIME TTY TIME CMD root 1330 1310 23 Dec19 ? 16:38:37 java -Xmx31000m -Xms31000m -Xss512k -XX:MaxPermSize=512m -XX:ReservedC /proc/1330/smaps:AnonHugePages: 2048 kB UID PID PPID C STIME TTY TIME CMD root 1330 1310 23 Dec19 ? 16:38:37 java -Xmx31000m -Xms31000m -Xss512k -XX:MaxPermSize=512m -XX:ReservedC /proc/1346/smaps:AnonHugePages: 12288 kB Thanks, Ashwin Jayaprakash. On Mon, Dec 22, 2014 at 12:45 PM, Gustav ?kesson wrote: > Hi, > > Indeed thank you for reporting back on this. TIL about THP, so to say... > > Best Regards, > Gustav?kesson > Den 22 dec 2014 18:52 skrev "charlie hunt" : > > Thanks for reporting back, and great to hear disabling THP has solved your >> multi-second minor GCs issue. :-) >> >> charlie >> >> On Dec 22, 2014, at 11:49 AM, Ashwin Jayaprakash < >> ashwin.jayaprakash at gmail.com> wrote: >> >> All, I'm happy to report that disabling THP made a big difference. We do >> not see multi-second minor GCs in our cluster anymore. >> >> Thanks for your help. >> >> On Fri, Dec 19, 2014 at 2:10 PM, charlie hunt >> wrote: >> >>> Disabling transparent huge pages should help those GC pauses where you >>> are seeing high sys time reported, which should also shorten their pause >>> times. >>> >>> Thanks for also sharing your observation of khugepaged/pages_collapsed. >>> >>> charlie >>> >>> On Dec 18, 2014, at 4:41 PM, Ashwin Jayaprakash < >>> ashwin.jayaprakash at gmail.com> wrote: >>> >>> *@Charlie/@Holger* My apologies, THP is indeed enabled. I misread the >>> "never" and thought it was already done. In fact "cat >>> /sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed" showed >>> 904 and after an hour now, it shows 6845 on one of our machines. 
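Since the "[always] madvise never" output is easy to misread, one option is to have the service itself log the THP state at startup instead of checking by hand. A small sketch, assuming the standard Linux sysfs path shown above; the class name is hypothetical.

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Reads the transparent-huge-page setting and warns if the active value
// (the token in brackets, e.g. "always madvise [never]") is not "never".
public class ThpCheck {
    public static void main(String[] args) throws Exception {
        Path enabled = Paths.get("/sys/kernel/mm/transparent_hugepage/enabled");
        if (!Files.isReadable(enabled)) {
            System.out.println("THP sysfs file not present; nothing to check.");
            return;
        }
        String line = new String(Files.readAllBytes(enabled), StandardCharsets.UTF_8).trim();
        String active = line.replaceAll(".*\\[(.*)\\].*", "$1");
        if (!"never".equals(active)) {
            System.out.println("WARNING: transparent huge pages are '" + active
                    + "' (" + line + "); consider setting them to 'never'.");
        } else {
            System.out.println("THP disabled: " + line);
        }
    }
}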
>>> >>> *@Jon* I dug through some of our ElasticSearch/application logs again >>> and tried to correlate them with the GC logs. The collection time does seem >>> to match the GC log's "real" time. However the collected sizes don't seem >>> to match, which is what threw me off. >>> >>> *Item 1:* >>> >>> 2014-12-18T21:34:55.837+0000: 163793.979: [ParNew >>> Desired survivor size 56688640 bytes, new threshold 6 (max 6) >>> - age 1: 31568024 bytes, 31568024 total >>> - age 2: 1188576 bytes, 32756600 total >>> - age 3: 1830920 bytes, 34587520 total >>> - age 4: 282536 bytes, 34870056 total >>> - age 5: 316640 bytes, 35186696 total >>> - age 6: 249856 bytes, 35436552 total >>> : 931773K->49827K(996800K), 1.3622770 secs] >>> 22132844K->21256042K(31633280K)After GC: >>> Statistics for BinaryTreeDictionary: >>> ------------------------------------ >>> Total Free Space: 1206815932 >>> Max Chunk Size: 1206815932 >>> Number of Blocks: 1 >>> Av. Block Size: 1206815932 >>> Tree Height: 1 >>> After GC: >>> Statistics for BinaryTreeDictionary: >>> ------------------------------------ >>> Total Free Space: 6189459 >>> Max Chunk Size: 6188544 >>> Number of Blocks: 3 >>> Av. Block Size: 2063153 >>> Tree Height: 2 >>> , 1.3625110 secs] [Times: user=15.92 sys=1.36, real=1.36 secs] >>> 2014-12-18T21:34:57.200+0000: 163795.342: Total time for which >>> application threads were stopped: 1.3638970 seconds >>> >>> >>> [2014-12-18T21:34:57,203Z] [WARN ] >>> [elasticsearch[server00001][scheduler][T#1]] >>> [org.elasticsearch.monitor.jvm] [server00001] [gc][young][163563][20423] duration >>> [1.3s], collections [1]/[2.1s], total [1.3s]/[17.9m], memory >>> [20.7gb]->[20.2gb]/[30.1gb], all_pools {[young] >>> [543.2mb]->[1mb]/[865.3mb]}{[survivor] [44.6mb]->[48.6mb]/[108.1mb]}{[old] >>> [20.2gb]->[20.2gb]/[29.2gb]} >>> >>> >>> *Item 2:* >>> >>> 2014-12-18T20:53:35.011+0000: 161313.153: [ParNew >>> Desired survivor size 56688640 bytes, new threshold 6 (max 6) >>> - age 1: 32445776 bytes, 32445776 total >>> - age 3: 6068000 bytes, 38513776 total >>> - age 4: 1031528 bytes, 39545304 total >>> - age 5: 255896 bytes, 39801200 total >>> : 939702K->53536K(996800K), 2.9352940 secs] >>> 21501296K->20625925K(31633280K)After GC: >>> Statistics for BinaryTreeDictionary: >>> ------------------------------------ >>> Total Free Space: 1287922158 >>> Max Chunk Size: 1287922158 >>> Number of Blocks: 1 >>> Av. Block Size: 1287922158 >>> Tree Height: 1 >>> After GC: >>> Statistics for BinaryTreeDictionary: >>> ------------------------------------ >>> Total Free Space: 6205476 >>> Max Chunk Size: 6204928 >>> Number of Blocks: 2 >>> Av. Block Size: 3102738 >>> Tree Height: 2 >>> , 2.9355580 secs] [Times: user=33.82 sys=3.11, real=2.94 secs] >>> 2014-12-18T20:53:37.947+0000: 161316.089: Total time for which >>> application threads were stopped: 2.9367640 seconds >>> >>> >>> [2014-12-18T20:53:37,950Z] [WARN ] >>> [elasticsearch[server00001][scheduler][T#1]] >>> [org.elasticsearch.monitor.jvm] [server00001] [gc][young][161091][19838] duration >>> [2.9s], collections [1]/[3s], total [2.9s]/[17.3m], memory >>> [20.4gb]->[19.6gb]/[30.1gb], all_pools {[young] >>> [801.7mb]->[2.4mb]/[865.3mb]}{[survivor] >>> [52.3mb]->[52.2mb]/[108.1mb]}{[old] [19.6gb]->[19.6gb]/[29.2gb]} >>> >>> >>> *Item 3:* >>> >>> 2014-12-17T14:42:10.590+0000: 52628.731: [GCBefore GC: >>> Statistics for BinaryTreeDictionary: >>> ------------------------------------ >>> Total Free Space: -966776244 >>> Max Chunk Size: -966776244 >>> Number of Blocks: 1 >>> Av. 
Block Size: -966776244 >>> Tree Height: 1 >>> Before GC: >>> Statistics for BinaryTreeDictionary: >>> ------------------------------------ >>> Total Free Space: 530 >>> Max Chunk Size: 268 >>> Number of Blocks: 2 >>> Av. Block Size: 265 >>> Tree Height: 2 >>> 2014-12-17T14:42:10.590+0000: 52628.731: [ParNew >>> Desired survivor size 56688640 bytes, new threshold 1 (max 6) >>> - age 1: 113315624 bytes, 113315624 total >>> : 996800K->110720K(996800K), 7.3511710 secs] >>> 5609422K->5065102K(31633280K)After GC: >>> Statistics for BinaryTreeDictionary: >>> ------------------------------------ >>> Total Free Space: -1009955715 >>> Max Chunk Size: -1009955715 >>> Number of Blocks: 1 >>> Av. Block Size: -1009955715 >>> Tree Height: 1 >>> After GC: >>> Statistics for BinaryTreeDictionary: >>> ------------------------------------ >>> Total Free Space: 530 >>> Max Chunk Size: 268 >>> Number of Blocks: 2 >>> Av. Block Size: 265 >>> Tree Height: 2 >>> , 7.3514180 secs] [Times: user=36.50 sys=17.99, real=7.35 secs] >>> 2014-12-17T14:42:17.941+0000: 52636.083: Total time for which >>> application threads were stopped: 7.3525250 seconds >>> >>> >>> [2014-12-17T14:42:17,944Z] [WARN ] >>> [elasticsearch[prdaes04data03][scheduler][T#1]] >>> [org.elasticsearch.monitor.jvm] [prdaes04data03] [gc][young][52582][5110] duration >>> [7.3s], collections [1]/[7.6s], total [7.3s]/[4.2m], memory >>> [5.1gb]->[4.8gb]/[30.1gb], all_pools {[young] >>> [695.5mb]->[14.4mb]/[865.3mb]}{[survivor] >>> [108.1mb]->[108.1mb]/[108.1mb]}{[old] [4.3gb]->[4.7gb]/[29.2gb]} >>> >>> >>> >>> >>> On Thu, Dec 18, 2014 at 1:10 PM, charlie hunt >>> wrote: >>>> >>>> The output: >>>> >>>> *cat /sys/kernel/mm/transparent_hugepage/enabled* >>>> [always] madvise never >>>> >>>> >>>> Tells me that transparent huge pages are enabled ?always?. >>>> >>>> I think I would change this to ?never?, even though sysctl -a may be >>>> reporting no huge pages are currently in use. The system may trying to >>>> coalesce pages occasionally in attempt to make huge pages available, even >>>> though you are not currently using any. >>>> >>>> charlie >>>> >>>> >>>> On Dec 18, 2014, at 2:00 PM, Ashwin Jayaprakash < >>>> ashwin.jayaprakash at gmail.com> wrote: >>>> >>>> *@Jon*, thanks for clearing that up. Yes, that was my source of >>>> confusion. I was misinterpreting the user time with the real time. >>>> >>>> *Jon's reply from an offline conversation:* >>>> >>>>> Are these the 7 second collections you refer to in the paragraph above? >>>>> If yes, the "user" time is the sum of the time spent by multiple GC >>>>> threads. >>>>> The real time is the GC pause time that your application experiences. >>>>> In the above case the GC pauses are .65s, 1.10s and .67s. >>>>> >>>> >>>> Something that added to my confusion was the tools we are using >>>> in-house. In addition to the GC logs we have 1 tool that uses the >>>> GarbageCollectorMXBean's getCollectionTime() method. This does not seem to >>>> match the values I see in the GC logs ( >>>> http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/management/GarbageCollectorMXBean.html#getCollectionTime%28%29 >>>> ). >>>> >>>> The other tool is the ElasticSearch JVM stats logger which uses GarbageCollectorMXBean's >>>> LastGCInfo ( >>>> https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmStats.java#L194 >>>> and >>>> https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmMonitorService.java#L187 >>>> ). 
>>>> >>>> Do these methods expose the total time spent by all the parallel GC >>>> threads for the ParNew pool or the "real" time? They do not seem to match >>>> the GC log times. >>>> >>>> *@Gustav* We do not have any swapping on the machines. It could be the >>>> disk IO experienced by the GC log writer itself, as you've suspected. The >>>> machine has 128G of RAM >>>> >>>> *"top" sample from a similar machine:* >>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND >>>> 106856 xxx 20 0 68.9g 25g 9.9g S 72.4 21.1 >>>> 2408:05 java >>>> >>>> >>>> *"free -g":* >>>> total used free shared buffers >>>> cached >>>> Mem: 120 119 0 0 0 >>>> 95 >>>> -/+ buffers/cache: 23 96 >>>> Swap: 0 0 0 >>>> >>>> *@Charlie* Hugepages has already been disabled >>>> >>>> *sudo sysctl -a | grep hugepage* >>>> vm.nr_hugepages = 0 >>>> vm.nr_hugepages_mempolicy = 0 >>>> vm.hugepages_treat_as_movable = 0 >>>> vm.nr_overcommit_hugepages = 0 >>>> >>>> *cat /sys/kernel/mm/transparent_hugepage/enabled* >>>> [always] madvise never >>>> >>>> >>>> Thanks all! >>>> >>>> >>>> >>>> On Wed, Dec 17, 2014 at 5:00 PM, Jon Masamitsu < >>>> jon.masamitsu at oracle.com> wrote: >>>>> >>>>> Ashwin, >>>>> >>>>> On 12/16/2014 8:47 PM, Ashwin Jayaprakash wrote: >>>>> >>>>> Hi, we have a cluster of ElasticSearch servers running with 31G heap >>>>> and OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode). >>>>> >>>>> While our old gen seems to be very stable with about 40% usage and no >>>>> Full GCs so far, our young gen keeps growing from ~50MB to 850MB every few >>>>> seconds. These ParNew collections are taking anywhere between 1-7 seconds >>>>> and is causing some of our requests to time out. The eden space keeps >>>>> filling up and then cleared every 30-60 seconds. There is definitely work >>>>> being done by our JVM in terms of caching/buffering objects for a few >>>>> seconds, writing to disk and then clearing the objects (typical >>>>> Lucene/ElasticSearch indexing and querying workload) >>>>> >>>>> >>>>> From you recent mail >>>>> >>>>> Times: user=7.89 sys=0.55, real=0.65 secs] >>>>> Times: user=7.71 sys=4.59, real=1.10 secs] >>>>> Times: user=7.46 sys=0.32, real=0.67 secs] >>>>> >>>>> Are these the 7 second collections you refer to in the paragraph above? >>>>> If yes, the "user" time is the sum of the time spent by multiple GC >>>>> threads. >>>>> The real time is the GC pause time that your application experiences. >>>>> In the above case the GC pauses are .65s, 1.10s and .67s. >>>>> >>>>> Comment below regarding "eden space keeps filling up". >>>>> >>>>> >>>>> These long pauses are not good for our server throughput and I was >>>>> doing some reading. I got some conflicting reports on how Cassandra is >>>>> configured compared to Hadoop. There are also references >>>>> >>>>> to this old ParNew+CMS bug >>>>> which I thought >>>>> would've been addressed in the JRE version we are using. Cassandra >>>>> recommends >>>>> a larger >>>>> NewSize with just 1 for max tenuring, whereas Hadoop recommends >>>>> a small NewSize. >>>>> >>>>> Since most of our allocations seem to be quite short lived, is there >>>>> a way to avoid these "long" young gen pauses? >>>>> >>>>> Thanks in advance. Here are some details. 
>>>>> >>>>> * Heap settings:* >>>>> java -Xmx31000m -Xms31000m >>>>> -Xss512k -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=64m >>>>> -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly >>>>> -XX:CMSInitiatingOccupancyFraction=70 >>>>> -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC -XX:+PrintPromotionFailure >>>>> -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintTenuringDistribution >>>>> -XX:GCLogFileSize=512m -XX:NumberOfGCLogFiles=2 -XX:+PrintGCDateStamps >>>>> -XX:+UseGCLogFileRotation -XX:+DisableExplicitGC >>>>> -XX:PrintFLSStatistics=1 -XX:+PrintGCDetails >>>>> >>>>> >>>>> * Last few lines of "kill -3 pid" output:* >>>>> Heap >>>>> par new generation total 996800K, used 865818K [0x00007fa18e800000, >>>>> 0x00007fa1d2190000, 0x00007fa1d2190000) >>>>> eden space 886080K, 94% used [0x00007fa18e800000, >>>>> 0x00007fa1c1a659e0, 0x00007fa1c4950000) >>>>> from space 110720K, 25% used [0x00007fa1cb570000, >>>>> 0x00007fa1cd091078, 0x00007fa1d2190000) >>>>> to space 110720K, 0% used [0x00007fa1c4950000, >>>>> 0x00007fa1c4950000, 0x00007fa1cb570000) >>>>> concurrent mark-sweep generation total 30636480K, used 12036523K >>>>> [0x00007fa1d2190000, 0x00007fa920000000, 0x00007fa920000000) >>>>> concurrent-mark-sweep perm gen total 128856K, used 77779K >>>>> [0x00007fa920000000, 0x00007fa927dd6000, 0x00007fa940000000) >>>>> >>>>> *Sample gc log:* >>>>> 2014-12-11T23:32:16.121+0000: 710.618: [ParNew >>>>> Desired survivor size 56688640 bytes, new threshold 6 (max 6) >>>>> - age 1: 2956312 bytes, 2956312 total >>>>> - age 2: 591800 bytes, 3548112 total >>>>> - age 3: 66216 bytes, 3614328 total >>>>> - age 4: 270752 bytes, 3885080 total >>>>> - age 5: 615472 bytes, 4500552 total >>>>> - age 6: 358440 bytes, 4858992 total >>>>> : 900635K->8173K(996800K), 0.0317340 secs] >>>>> 1352217K->463460K(31633280K)After GC: >>>>> >>>>> >>>>> In this GC eden is at 900635k before the GC and is a 8173k after. >>>>> That GC fills up is >>>>> the expected behavior. Is that what you were asking about above? If >>>>> not can you >>>>> send me an example of the "fills up" behavior. >>>>> >>>>> Jon >>>>> >>>>> Statistics for BinaryTreeDictionary: >>>>> ------------------------------------ >>>>> Total Free Space: -433641480 >>>>> Max Chunk Size: -433641480 >>>>> Number of Blocks: 1 >>>>> Av. Block Size: -433641480 >>>>> Tree Height: 1 >>>>> After GC: >>>>> Statistics for BinaryTreeDictionary: >>>>> ------------------------------------ >>>>> Total Free Space: 1227 >>>>> Max Chunk Size: 631 >>>>> Number of Blocks: 3 >>>>> Av. Block Size: 409 >>>>> Tree Height: 3 >>>>> , 0.0318920 secs] [Times: user=0.38 sys=0.01, real=0.03 secs] >>>>> >>>>> >>>>> Ashwin Jayaprakash. 
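For logs like the sample above, a quick way to pull out only the slow collections is to filter on the wall-clock ("real") figure. A rough sketch, assuming the -XX:+PrintGCDetails output format shown in this thread; the class name is illustrative, and the log file and threshold are passed as arguments.

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Scans a GC log and prints every line whose wall-clock ("real") time is at or
// above a threshold -- a quick way to locate the multi-second pauses discussed here.
public class SlowGcFinder {
    private static final Pattern REAL = Pattern.compile("real=([0-9.]+) secs");

    public static void main(String[] args) throws Exception {
        String file = args.length > 0 ? args[0] : "gc.log";
        double thresholdSecs = args.length > 1 ? Double.parseDouble(args[1]) : 1.0;
        try (BufferedReader in = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = in.readLine()) != null) {
                Matcher m = REAL.matcher(line);
                if (m.find() && Double.parseDouble(m.group(1)) >= thresholdSecs) {
                    System.out.println(line);
                }
            }
        }
    }
}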
From java at elyograg.org Tue Dec 23 17:46:27 2014
From: java at elyograg.org (Shawn Heisey)
Date: Tue, 23 Dec 2014 10:46:27 -0700
Subject: G1 with Solr - thread from dev@lucene.apache.org
In-Reply-To: <1419170477.6868.1.camel@oracle.com>
References: <54906C34.8080408@elyograg.org> <1418831456.3255.22.camel@oracle.com> <5491D659.1090703@elyograg.org> <1418849513.3293.3.camel@oracle.com> <5494D0D7.2010606@elyograg.org> <1419170477.6868.1.camel@oracle.com>
Message-ID: <5499AA73.9090003@elyograg.org>

On 12/21/2014 7:01 AM, Thomas Schatzl wrote:
>
> Add
>
> -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintReferenceGC -XX:
> +PrintAdaptiveSizePolicy

I have GC logging options in a separate environment variable.

GCLOG_OPTS="-verbose:gc -Xloggc:logs/gc.log -XX:+PrintGCDateStamps
-XX:+PrintGCDetails -XX:+PrintAdaptiveSizePolicy -XX:+PrintReferenceGC"

Here's the new G1 options list based on your feedback:

JVM_OPTS=" \
-XX:+UseG1GC \
-XX:NewRatio=3 \
-XX:+ParallelRefProcEnabled
-XX:maxGCPauseMillis=200
-XX:+UseLargePages \
-XX:+AggressiveOpts \
"

Thanks,
Shawn

From thomas.schatzl at oracle.com Tue Dec 23 17:55:47 2014
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Tue, 23 Dec 2014 18:55:47 +0100
Subject: G1 with Solr - thread from dev@lucene.apache.org
In-Reply-To: <5499AA73.9090003@elyograg.org>
References: <54906C34.8080408@elyograg.org> <1418831456.3255.22.camel@oracle.com> <5491D659.1090703@elyograg.org> <1418849513.3293.3.camel@oracle.com> <5494D0D7.2010606@elyograg.org> <1419170477.6868.1.camel@oracle.com> <5499AA73.9090003@elyograg.org>
Message-ID: <1419357347.3128.1.camel@oracle.com>

Hi,

On Tue, 2014-12-23 at 10:46 -0700, Shawn Heisey wrote:
> On 12/21/2014 7:01 AM, Thomas Schatzl wrote:
> >
> > Add
> >
> > -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintReferenceGC -XX:
> > +PrintAdaptiveSizePolicy
>
> I have GC logging options in a separate environment variable.
>
> GCLOG_OPTS="-verbose:gc -Xloggc:logs/gc.log -XX:+PrintGCDateStamps
> -XX:+PrintGCDetails -XX:+PrintAdaptiveSizePolicy -XX:+PrintReferenceGC"

ParallelRefProcEnabled is missing. use

GCLOG_OPTS="-verbose:gc -Xloggc:logs/gc.log -XX:+PrintGCDateStamps
-XX:+PrintGCDetails -XX:+PrintAdaptiveSizePolicy -XX:+PrintReferenceGC
-XX:+UnlockExperimentalVMOptions -XX:G1LogLevel=finest"

The last two are additional verboseness options.

>
> Here's the new G1 options list based on your feedback:
>
> JVM_OPTS=" \
> -XX:+UseG1GC \
> -XX:NewRatio=3 \

Remove NewRatio. This will severely limit adaptiveness.

> -XX:+ParallelRefProcEnabled
> -XX:maxGCPauseMillis=200

Use "MaxGCPauseMillis" with capital M.

> -XX:+UseLargePages \
> -XX:+AggressiveOpts \
> "

I.e.
JVM_OPTS=" \ -XX:+UseG1GC \ -XX:+ParallelRefProcEnabled \ -XX:MaxGCPauseMillis=200 \ -XX:+UseLargePages \ -XX:+AggressiveOpts Actually it might be as good to simply use: JVM_OPTS=" \ -XX:+UseG1GC \ -XX:+ParallelRefProcEnabled" Because 200 is the default value for MaxGCPauseMillis, and the others either are not used anyway (no large pages in your system) or won't have any noticeable impact (AggressiveOpts has last been tuned to current systems ages ago; the only useful part of that is "-server" to enable the server compiler, but on 64 bit VMs the server compiler is default too). Always good to start from a clean slate. Depending on the results from the log we can improve the settings. Thanks, Thomas From java at elyograg.org Tue Dec 23 21:04:57 2014 From: java at elyograg.org (Shawn Heisey) Date: Tue, 23 Dec 2014 14:04:57 -0700 Subject: G1 with Solr - thread from dev@lucene.apache.org In-Reply-To: <1419357347.3128.1.camel@oracle.com> References: <54906C34.8080408@elyograg.org> <1418831456.3255.22.camel@oracle.com> <5491D659.1090703@elyograg.org> <1418849513.3293.3.camel@oracle.com> <5494D0D7.2010606@elyograg.org> <1419170477.6868.1.camel@oracle.com> <5499AA73.9090003@elyograg.org> <1419357347.3128.1.camel@oracle.com> Message-ID: <5499D8F9.9020607@elyograg.org> On 12/23/2014 10:55 AM, Thomas Schatzl wrote: > Remove NewRatio. This will severely limit adaptiveness. > >> -XX:+ParallelRefProcEnabled >> -XX:maxGCPauseMillis=200 > > Use "MaxGCPauseMillis" with capital M. > >> -XX:+UseLargePages \ >> -XX:+AggressiveOpts \ >> " > > I.e. > > JVM_OPTS=" \ > -XX:+UseG1GC \ > -XX:+ParallelRefProcEnabled \ > -XX:MaxGCPauseMillis=200 \ > -XX:+UseLargePages \ > -XX:+AggressiveOpts That's what I ultimately ended up with. I don't have a lot of runtime yet, but this is looking REALLY good. It looks like parallel reference processing and waiting for a later Java 7 release were the secrets to using G1 effectively. https://www.dropbox.com/s/bhq97ishhb8d94w/g1gc-with-parallel-ref.png?dl=0 Here's the GC log for that graph: https://www.dropbox.com/s/9g687luo60bd4r0/g1gc-with-parallel-ref.log?dl=0 I got a little runtime in before removing NewRatio. It's not quite as good as the graph/log above, so removing it was a good thing: https://www.dropbox.com/s/ups6r2hohrnfcud/g1gc-with-parallel-ref-and-newratio-3.png?dl=0 https://www.dropbox.com/s/ccwu7axgdebywjt/g1gc-with-parallel-ref-and-newratio-3.log?dl=0 After I've got a few hours (and ultimately a few days) of runtime on the new settings, I will grab the log and graph it again. Many thanks for all your help! Shawn From thomas.schatzl at oracle.com Tue Dec 30 10:12:19 2014 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 30 Dec 2014 11:12:19 +0100 Subject: G1 with Solr - thread from dev@lucene.apache.org In-Reply-To: <5499D8F9.9020607@elyograg.org> References: <54906C34.8080408@elyograg.org> <1418831456.3255.22.camel@oracle.com> <5491D659.1090703@elyograg.org> <1418849513.3293.3.camel@oracle.com> <5494D0D7.2010606@elyograg.org> <1419170477.6868.1.camel@oracle.com> <5499AA73.9090003@elyograg.org> <1419357347.3128.1.camel@oracle.com> <5499D8F9.9020607@elyograg.org> Message-ID: <1419934339.3250.1.camel@oracle.com> Hi Shawn, On Tue, 2014-12-23 at 14:04 -0700, Shawn Heisey wrote: > On 12/23/2014 10:55 AM, Thomas Schatzl wrote: > > Remove NewRatio. This will severely limit adaptiveness. > > > >> -XX:+ParallelRefProcEnabled > >> -XX:maxGCPauseMillis=200 > > > > Use "MaxGCPauseMillis" with capital M. 
> > > >> -XX:+UseLargePages \ > >> -XX:+AggressiveOpts \ > >> " > > > > I.e. > > > > JVM_OPTS=" \ > > -XX:+UseG1GC \ > > -XX:+ParallelRefProcEnabled \ > > -XX:MaxGCPauseMillis=200 \ > > -XX:+UseLargePages \ > > -XX:+AggressiveOpts > > That's what I ultimately ended up with. I don't have a lot of runtime > yet, but this is looking REALLY good. Great to hear about field experience with G1 - a late Christmas present for us particularly because they are good. > It looks like parallel reference > processing and waiting for a later Java 7 release were the secrets to > using G1 effectively. You really might want to try 8u40: the small logs you provided indicate that there is at least some amount of large object allocation going on ("occupancy higher than threshold [...] cause: G1 Humongous Allocation"). 8u40 added some special handling for those which allows fast and cheap reclamation of these in some cases, which improves throughput by decreasing GC frequency. Nothing worrying I think given these logs, but these allocations/messages seem frequent enough that it could be considered useful. > After I've got a few hours (and ultimately a few days) of runtime on the > new settings, I will grab the log and graph it again. Would be really nice to have them for analysis. Maybe they could be used to tweak the threshold that starts concurrent cycles to reduce the number of GCs. > > Many thanks for all your help! Did not do anything yet other than removing all CMS flags :) Thanks, Thomas From java at elyograg.org Tue Dec 30 18:20:44 2014 From: java at elyograg.org (Shawn Heisey) Date: Tue, 30 Dec 2014 11:20:44 -0700 Subject: G1 with Solr - thread from dev@lucene.apache.org In-Reply-To: <1419934339.3250.1.camel@oracle.com> References: <54906C34.8080408@elyograg.org> <1418831456.3255.22.camel@oracle.com> <5491D659.1090703@elyograg.org> <1418849513.3293.3.camel@oracle.com> <5494D0D7.2010606@elyograg.org> <1419170477.6868.1.camel@oracle.com> <5499AA73.9090003@elyograg.org> <1419357347.3128.1.camel@oracle.com> <5499D8F9.9020607@elyograg.org> <1419934339.3250.1.camel@oracle.com> Message-ID: <54A2ECFC.1010401@elyograg.org> On 12/30/2014 3:12 AM, Thomas Schatzl wrote: > Great to hear about field experience with G1 - a late Christmas present > for us particularly because they are good. > >> It looks like parallel reference >> processing and waiting for a later Java 7 release were the secrets to >> using G1 effectively. > > You really might want to try 8u40: the small logs you provided indicate > that there is at least some amount of large object allocation going on > ("occupancy higher than threshold [...] cause: G1 Humongous > Allocation"). 8u40 added some special handling for those which allows > fast and cheap reclamation of these in some cases, which improves > throughput by decreasing GC frequency. I do have plans to roll out some Java 8 servers for a new project, and ultimately I expect we will upgrade to Java 8 for the existing servers, but it won't be an immediate thing. > Nothing worrying I think given these logs, but these > allocations/messages seem frequent enough that it could be considered > useful. > >> After I've got a few hours (and ultimately a few days) of runtime on the >> new settings, I will grab the log and graph it again. > > Would be really nice to have them for analysis. > > Maybe they could be used to tweak the threshold that starts concurrent > cycles to reduce the number of GCs. I've now got almost a full week of runtime on these new settings. 
Here's the log: https://www.dropbox.com/s/yla4le5l5jrhiir/gc-idxa1-g1-7u72-with-refproc-one-week.zip?dl=0 Trying to graph this log with gcviewer-1.34.jar, it found five entries in the log that it didn't know how to deal with. The times on these lines do look fairly significant, and I assume that they are probably not in the resulting graph. INFO [DataReaderFacade]: GCViewer version 1.34 (2014-11-30T14:40:14+0100) INFO [DataReaderFactory]: File format: Sun 1.6.x G1 collector INFO [DataReaderSun1_6_0G1]: Reading Sun 1.6.x / 1.7.x G1 format... WARNING [DataReaderSun1_6_0G1]: com.tagtraum.perf.gcviewer.imp.UnknownGcTypeException: Unknown gc type: 'M->' Line 47280: 5987M->1658M(6144M), 2.3640370 secs] WARNING [DataReaderSun1_6_0G1]: com.tagtraum.perf.gcviewer.imp.UnknownGcTypeException: Unknown gc type: 'M->' Line 155388: 2928M->1721M(6104M), 2.7344030 secs] WARNING [DataReaderSun1_6_0G1]: com.tagtraum.perf.gcviewer.imp.UnknownGcTypeException: Unknown gc type: 'M->' Line 190388: 2615M->1550M(6018M), 2.3079810 secs] WARNING [DataReaderSun1_6_0G1]: com.tagtraum.perf.gcviewer.imp.UnknownGcTypeException: Unknown gc type: 'M->' Line 392918: 6003M->1602M(6138M), 2.1626330 secs] WARNING [DataReaderSun1_6_0G1]: com.tagtraum.perf.gcviewer.imp.UnknownGcTypeException: Unknown gc type: 'M->' Line 397195: 6031M->1707M(6138M), 3.0746840 secs] INFO [DataReaderSun1_6_0G1]: Done reading. What it DID graph looks fairly good, but there are a handful of long collections in there. Only one of those longer collections looked like a long enough pause that it would trigger a failed load balancer health check (every five seconds, with a 4990 millisecond timeout), but most of them are long enough that a user would definitely notice the delay on a single search. That probably wouldn't be enough of a problem for them to lodge a complaint or decide that the site sucks, because the search would be fast on the next query. The overall graph shows that a typical collection happens *very* quickly, so perhaps those few outliers are not enough of a problem to cause me much concern. If Java 8 can smooth down those rough edges, I think we have a clear winner. Even with Java 7, I am very excited about these results. Are there any alternate tools for producing nice graphs from GC logs, tools that can understand everything in a log from a modern JVM? Thanks, Shawn From yu.zhang at oracle.com Tue Dec 30 22:06:45 2014 From: yu.zhang at oracle.com (Yu Zhang) Date: Tue, 30 Dec 2014 14:06:45 -0800 Subject: G1 with Solr - thread from dev@lucene.apache.org In-Reply-To: <54A2ECFC.1010401@elyograg.org> References: <54906C34.8080408@elyograg.org> <1418831456.3255.22.camel@oracle.com> <5491D659.1090703@elyograg.org> <1418849513.3293.3.camel@oracle.com> <5494D0D7.2010606@elyograg.org> <1419170477.6868.1.camel@oracle.com> <5499AA73.9090003@elyograg.org> <1419357347.3128.1.camel@oracle.com> <5499D8F9.9020607@elyograg.org> <1419934339.3250.1.camel@oracle.com> <54A2ECFC.1010401@elyograg.org> Message-ID: <54A321F5.7060209@oracle.com> Shawn, There are 10 Full gcs, each takes about 2-5 seconds. The live data set after full gc is ~2g. The heap size expanded from 4g to 6g around 45,650 sec. As Thomas noticed, there are a lot of humongous objects (each of about 2m size). some of them can be cleaned after marking. If you can not move to jdk8, can you try -XX:G1HeapRegionSize=8m? This should get rid of the humongous objects. 
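A quick way to see how much humongous allocation is actually going on before changing anything, sketched here against the gc.log produced by the logging options earlier in the thread (the grep strings assume the log messages quoted above): G1 treats any object of at least half a region as humongous, so objects just under 2 MB are humongous with 2 MB regions but stop being humongous once regions are 4 MB or larger.

# concurrent cycles started because of humongous allocations
# (the "G1 Humongous Allocation" cause quoted earlier in the thread)
grep -c 'G1 Humongous Allocation' gc.log

# full collections, i.e. the 2-5 second pauses mentioned above
grep -c 'Full GC' gc.log

Comparing these counts before and after a -XX:G1HeapRegionSize change should show whether the humongous-allocation pressure really goes away.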
Thanks, Jenny On 12/30/2014 10:20 AM, Shawn Heisey wrote: > On 12/30/2014 3:12 AM, Thomas Schatzl wrote: >> Great to hear about field experience with G1 - a late Christmas present >> for us particularly because they are good. >> >>> It looks like parallel reference >>> processing and waiting for a later Java 7 release were the secrets to >>> using G1 effectively. >> You really might want to try 8u40: the small logs you provided indicate >> that there is at least some amount of large object allocation going on >> ("occupancy higher than threshold [...] cause: G1 Humongous >> Allocation"). 8u40 added some special handling for those which allows >> fast and cheap reclamation of these in some cases, which improves >> throughput by decreasing GC frequency. > I do have plans to roll out some Java 8 servers for a new project, and > ultimately I expect we will upgrade to Java 8 for the existing servers, > but it won't be an immediate thing. > >> Nothing worrying I think given these logs, but these >> allocations/messages seem frequent enough that it could be considered >> useful. >> >>> After I've got a few hours (and ultimately a few days) of runtime on the >>> new settings, I will grab the log and graph it again. >> Would be really nice to have them for analysis. >> >> Maybe they could be used to tweak the threshold that starts concurrent >> cycles to reduce the number of GCs. > I've now got almost a full week of runtime on these new settings. > Here's the log: > > https://www.dropbox.com/s/yla4le5l5jrhiir/gc-idxa1-g1-7u72-with-refproc-one-week.zip?dl=0 > > Trying to graph this log with gcviewer-1.34.jar, it found five entries > in the log that it didn't know how to deal with. The times on these > lines do look fairly significant, and I assume that they are probably > not in the resulting graph. > > INFO [DataReaderFacade]: GCViewer version 1.34 (2014-11-30T14:40:14+0100) > INFO [DataReaderFactory]: File format: Sun 1.6.x G1 collector > INFO [DataReaderSun1_6_0G1]: Reading Sun 1.6.x / 1.7.x G1 format... > WARNING [DataReaderSun1_6_0G1]: > com.tagtraum.perf.gcviewer.imp.UnknownGcTypeException: Unknown gc type: > 'M->' Line 47280: 5987M->1658M(6144M), 2.3640370 secs] > WARNING [DataReaderSun1_6_0G1]: > com.tagtraum.perf.gcviewer.imp.UnknownGcTypeException: Unknown gc type: > 'M->' Line 155388: 2928M->1721M(6104M), 2.7344030 secs] > WARNING [DataReaderSun1_6_0G1]: > com.tagtraum.perf.gcviewer.imp.UnknownGcTypeException: Unknown gc type: > 'M->' Line 190388: 2615M->1550M(6018M), 2.3079810 secs] > WARNING [DataReaderSun1_6_0G1]: > com.tagtraum.perf.gcviewer.imp.UnknownGcTypeException: Unknown gc type: > 'M->' Line 392918: 6003M->1602M(6138M), 2.1626330 secs] > WARNING [DataReaderSun1_6_0G1]: > com.tagtraum.perf.gcviewer.imp.UnknownGcTypeException: Unknown gc type: > 'M->' Line 397195: 6031M->1707M(6138M), 3.0746840 secs] > INFO [DataReaderSun1_6_0G1]: Done reading. > > What it DID graph looks fairly good, but there are a handful of long > collections in there. Only one of those longer collections looked like > a long enough pause that it would trigger a failed load balancer health > check (every five seconds, with a 4990 millisecond timeout), but most of > them are long enough that a user would definitely notice the delay on a > single search. That probably wouldn't be enough of a problem for them > to lodge a complaint or decide that the site sucks, because the search > would be fast on the next query. 
The overall graph shows that a typical > collection happens *very* quickly, so perhaps those few outliers are not > enough of a problem to cause me much concern. If Java 8 can smooth down > those rough edges, I think we have a clear winner. Even with Java 7, I > am very excited about these results. > > Are there any alternate tools for producing nice graphs from GC logs, > tools that can understand everything in a log from a modern JVM? > > Thanks, > Shawn > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From java at elyograg.org Wed Dec 31 00:29:39 2014 From: java at elyograg.org (Shawn Heisey) Date: Tue, 30 Dec 2014 17:29:39 -0700 Subject: G1 with Solr - thread from dev@lucene.apache.org In-Reply-To: <54A321F5.7060209@oracle.com> References: <54906C34.8080408@elyograg.org> <1418831456.3255.22.camel@oracle.com> <5491D659.1090703@elyograg.org> <1418849513.3293.3.camel@oracle.com> <5494D0D7.2010606@elyograg.org> <1419170477.6868.1.camel@oracle.com> <5499AA73.9090003@elyograg.org> <1419357347.3128.1.camel@oracle.com> <5499D8F9.9020607@elyograg.org> <1419934339.3250.1.camel@oracle.com> <54A2ECFC.1010401@elyograg.org> <54A321F5.7060209@oracle.com> Message-ID: <54A34373.9000509@elyograg.org> On 12/30/2014 3:06 PM, Yu Zhang wrote: > There are 10 Full gcs, each takes about 2-5 seconds. The live data set > after full gc is ~2g. The heap size expanded from 4g to 6g around > 45,650 sec. > > As Thomas noticed, there are a lot of humongous objects (each of about > 2m size). some of them can be cleaned after marking. If you can not > move to jdk8, can you try -XX:G1HeapRegionSize=8m? This should get rid > of the humongous objects. Those huge objects may be Solr filterCache entries. Each of my large Solr indexes is over 16 million documents. Because a filterCache entry is a bitset representing those documents, it would be about 16.3 million bits in length, or approximately 2 MB. It could be other things -- Lucene handles a bunch of other things in large byte arrays, though I'm not very familiar with those internals. I will try the option you have indicated. My index updating software does indexing once a minute. Once an hour, larger processes are done, and once a day, one of the large indexes is optimized, which likely generates a lot of garbage in a very short time. Thanks, Shawn From thomas.schatzl at oracle.com Wed Dec 31 14:19:05 2014 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 31 Dec 2014 15:19:05 +0100 Subject: G1 with Solr - thread from dev@lucene.apache.org In-Reply-To: <54A34373.9000509@elyograg.org> References: <54906C34.8080408@elyograg.org> <1418831456.3255.22.camel@oracle.com> <5491D659.1090703@elyograg.org> <1418849513.3293.3.camel@oracle.com> <5494D0D7.2010606@elyograg.org> <1419170477.6868.1.camel@oracle.com> <5499AA73.9090003@elyograg.org> <1419357347.3128.1.camel@oracle.com> <5499D8F9.9020607@elyograg.org> <1419934339.3250.1.camel@oracle.com> <54A2ECFC.1010401@elyograg.org> <54A321F5.7060209@oracle.com> <54A34373.9000509@elyograg.org> Message-ID: <1420035545.3277.2.camel@oracle.com> Hi Shawn, On Tue, 2014-12-30 at 17:29 -0700, Shawn Heisey wrote: > On 12/30/2014 3:06 PM, Yu Zhang wrote: > > There are 10 Full gcs, each takes about 2-5 seconds. The live data set > > after full gc is ~2g. The heap size expanded from 4g to 6g around > > 45,650 sec. > > > > As Thomas noticed, there are a lot of humongous objects (each of about > > 2m size). 
some of them can be cleaned after marking. If you can not > > move to jdk8, can you try -XX:G1HeapRegionSize=8m? This should get rid > > of the humongous objects. -XX:G1HeapRegionSize=4M should be sufficient: all the objects I have seen are slightly smaller than 2M, which corresponds to Shawn's statement about having around 16.3M bits in length. With -Xms4G -Xmx6G the default region size is 2M, not 4M. Using -XX:G1HeapRegionSize=8M seems overkill. > Those huge objects may be Solr filterCache entries. Each of my large > Solr indexes is over 16 million documents. Because a filterCache entry > is a bitset representing those documents, it would be about 16.3 million > bits in length, or approximately 2 MB. It could be other things -- > Lucene handles a bunch of other things in large byte arrays, though I'm > not very familiar with those internals. > > I will try the option you have indicated. I agree with Jenny that we should try increasing heap region size slightly first. > My index updating software does indexing once a minute. Once an hour, > larger processes are done, and once a day, one of the large indexes is > optimized, which likely generates a lot of garbage in a very short time. Just fyi, the problem with these large byte arrays is that with 7uX, G1 cannot reclaim them during young GC but needs to wait for a complete marking cycle. If that takes too long (longer than the next young GC occurs), the next young GC may not have enough space to complete the GC, potentially falling back to the mentioned full gcs. That seems to happen a few times. There are two other options that could be tried to improve the situation (although I think increasing the heap region size should be sufficient), that is -XX:-ResizePLAB which decreases the amount of space G1 will waste during GC (it does so for performance reasons, but the logic is somewhat flawed - I am currently working on that). The other is to cap the young gen size so that the amount of potential survivors is smaller in the first place, e.g. -XX:G1MaxNewSize=1536M // 1.5G seems reasonable without decreasing throughput too much; a lot of these full gcs seem to appear after G1 using extremely large eden sizes. This is most likely due to the spiky allocation behavior of the application: i.e. long stretches of almost every object dying, and then short bursts. Since G1 tunes itself to the former, it will simply try to use too much eden size for these spikes. But I recommend first seeing the impact of the increase in region size. Thanks, Thomas
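Putting the pieces of this thread together, one possible starting point for the next test run might look like the following. This is only a sketch that collects the flags suggested above into the JVM_OPTS/GCLOG_OPTS style used earlier; the region size and the commented-out items are exactly the open questions of the thread, not settled recommendations, and -Xms/-Xmx plus the rest of the Solr start script are unchanged and omitted here.

JVM_OPTS=" \
-XX:+UseG1GC \
-XX:+ParallelRefProcEnabled \
-XX:MaxGCPauseMillis=200 \
-XX:G1HeapRegionSize=4M \
"
# MaxGCPauseMillis=200 is the default value and is kept only for clarity.
# Optional follow-ups discussed above, to be tried one at a time:
#   -XX:-ResizePLAB       (reduce the space G1 wastes when resizing PLABs)
#   a cap on the young generation, if the full GCs after very large edens
#   keep appearing

GCLOG_OPTS="-verbose:gc -Xloggc:logs/gc.log -XX:+PrintGCDateStamps
-XX:+PrintGCDetails -XX:+PrintAdaptiveSizePolicy -XX:+PrintReferenceGC
-XX:+UnlockExperimentalVMOptions -XX:G1LogLevel=finest"

Comparing a few days of logs from such a configuration with the one-week log above should make it clear whether the humongous-allocation cycles and the remaining full GCs disappear.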