From tony.printezis at oracle.com Thu May 6 12:32:30 2010
From: tony.printezis at oracle.com (Tony Printezis)
Date: Thu, 06 May 2010 15:32:30 -0400
Subject: Feedback requested: HotSpot GC logging improvements
Message-ID: <4BE3194E.902@oracle.com>

Hi all,

We would like your input on some changes to HotSpot's GC logging that we
have been discussing. We have been wanting to improve our GC logging for
some time. However we haven't had the resources to spend on it. We don't
know when we'll get to it, but we'd still like to get some feedback on
our plans.

The changes fall into two categories.


A. Unification and improvement of -verbosegc / -XX:+PrintGCDetails output.

I strongly believe that maintaining two GC log formats is
counter-productive, especially given that the current -verbosegc format
is unhelpful in many ways (i.e., lacks a lot of helpful information).
So, we would like to unify the two into one, with maybe
-XX:+PrintGCDetails generating a superset of what -verbosegc would
generate (so that a parser for the -XX:+PrintGCDetails output will also
be able to parse the -verbosegc output). The new output will not be what
-XX:+PrintGCDetails generates today but something that can be reliably
parsed and is also reasonably human-readable (so, no XML and no
space/tab-separated formats). Additionally, we're proposing to enable
-XX:+PrintGCTimeStamps by default (in fact, we'll probably deprecate and
ignore that option; I can't believe that users will really not want a
time stamp per GC log record). We'll leave -XX:+PrintGCDateStamps
optional though.

Specific questions:

- Is anyone really attached to the old -verbosegc output?
- Would anyone really hate having time stamps by default?
- I know that a lot of folks have their own parsers for our current GC
  log formats. Would you be happy if we provided you with a (reliable!)
  parser for the new format in Java that you can easily adapt?


B. Introducing "cyclic" GC logs.
This is something that a lot of folks have asked for given that they
were concerned with the GC logs getting very large (a 1TB disk is $85
these days, but anyway...). Given that each GC log record is of variable
size, we cannot easily cycle through the log using the same file (I'd
rather not have to overwrite existing records). Our current proposal is
for the user to specify a file number N and a size target S for each
file. For a given GC log -Xloggc:foo, HotSpot will generate

foo.00000001
foo.00000002
foo.00000003
etc.

(We'll create a new file as soon as the size of the one we are writing
to exceeds S, so each file will be slightly larger than S, but it will
be helpful not to split individual log records between two files.)

When we create a new file, if we have more than N files we'll delete the
oldest. So, in the above example, if N == 3, when we create foo.00000004
we'll delete foo.00000001.

Note that in the above scheme the logs are not really "cyclic"; instead,
we're pruning the oldest records every now and then, which has the same
effect.

Another (related) request has been to maybe append the GC log file name
with the pid of the JVM that's generating it. Maybe we don't want to do
this by default. But would people find it helpful if we provide a new
cmd line parameter to do that? So, for the above example and assuming
that the JVM's pid is 1234, the GC log file(s) will be either:

foo.1234

or

foo.1234.00000001
foo.1234.00000002
foo.1234.00000003
etc.

Specific questions:

- Would people really hate it if HotSpot starts appending the GC log
  file name with a (zero-padded) sequence number? Maybe if N == 1 (the
  default), HotSpot will skip the sequence number and ignore S, i.e.,
  behave as it does today.
- To the people who have been asking for cyclic GC logs: is the sequence
  number scheme above good enough?
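[Editorial illustration: the rolling-file policy proposed above (N files, size target S, zero-padded sequence suffixes, oldest file pruned) can be sketched in a few lines of Java. This is purely a sketch of the scheme as described in the email, not HotSpot code; the class name and structure are invented here.]

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayDeque;

/**
 * Sketch of the proposed rotating GC log: keep at most N files,
 * start a new file once the current one exceeds S bytes, and
 * delete the oldest file once more than N files exist.
 */
class RollingGcLog {
    private final Path base;        // e.g. -Xloggc:foo -> "foo"
    private final int maxFiles;     // N
    private final long sizeTarget;  // S, in bytes
    private final ArrayDeque<Path> files = new ArrayDeque<>();
    private int seq = 0;

    RollingGcLog(Path base, int maxFiles, long sizeTarget) throws IOException {
        this.base = base;
        this.maxFiles = maxFiles;
        this.sizeTarget = sizeTarget;
        roll(); // open foo.00000001
    }

    /** Append one complete log record; records are never split across files. */
    void write(String record) throws IOException {
        Path current = files.peekLast();
        Files.writeString(current, record + System.lineSeparator(),
                          StandardOpenOption.APPEND);
        // Roll only after the record is written: a file may slightly
        // exceed S, but it never ends with a partial record.
        if (Files.size(current) > sizeTarget) {
            roll();
        }
    }

    private void roll() throws IOException {
        seq++;
        // Zero-padded, 8-digit sequence suffix: foo.00000001, foo.00000002, ...
        Path next = base.resolveSibling(
                String.format("%s.%08d", base.getFileName(), seq));
        Files.writeString(next, "");            // create (or truncate) the new file
        files.addLast(next);
        if (files.size() > maxFiles) {          // prune the oldest file
            Files.deleteIfExists(files.removeFirst());
        }
    }
}
```

With N == 3 and a small S, a stream of records eventually leaves foo.00000002 through foo.00000004 on disk while foo.00000001 has been pruned, which is the "cyclic by pruning" behaviour the proposal describes.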
Thanks in advance for your feedback,

Tony, HotSpot GC Group

From ryanobjc at gmail.com Thu May 6 12:51:48 2010
From: ryanobjc at gmail.com (Ryan Rawson)
Date: Thu, 6 May 2010 12:51:48 -0700
Subject: Feedback requested: HotSpot GC logging improvements
In-Reply-To: <4BE3194E.902@oracle.com>
References: <4BE3194E.902@oracle.com>
Message-ID:

Hey,

I would say that PrintGCDateStamps should be the default - or at least
promoted heavily as the option to use. Since most other logs (e.g. log4j)
are in "normal time", correlating GC logs and server logs requires
determining when the VM started, then doing some on-the-fly math for
every log line you are interested in. Sometimes determining VM start time
is impossible because the first log line is offset from the VM start time
by X milliseconds, and in a tight debugging situation this could be all
the difference in the world.

On the logfile front, there are at least 2 problems:
- successive runs overwrite the previous log file. I ran into this
  problem and lost the ability to debug a problem.
- a logfile will grow without bound, although in my experience I have not
  had space problems with this.

While you are correct that _the cheapest disks_ you can buy run $85/TB,
these are not the kinds of disks many people are installing into
server-type systems. A serial-attached-SCSI (aka SAS) disk at 10k rpm is
a little bit more expensive than $85/TB.

-ryan

On Thu, May 6, 2010 at 12:32 PM, Tony Printezis wrote:
> Hi all,
>
> We would like your input on some changes to HotSpot's GC logging that we
> have been discussing. We have been wanting to improve our GC logging for
> some time. However we haven't had the resources to spend on it. We don't
> know when we'll get to it, but we'd still like to get some feedback on
> our plans.
>
> The changes fall into two categories.
>
>
> A. Unification and improvement of -verbosegc / -XX:+PrintGCDetails output.
> > I strongly believe that maintaining two GC log formats is > counter-productive, especially given that the current -verbosegc format > is unhelpful in many ways (i.e., lacks a lot of helpful information). > So, we would like to unify the two into one, with maybe > -XX:+PrintGCDetails generating a superset of what -verbosegc would > generate (so that a parser for the -XX:+PrintGCDetails output will also > be able to parse the -verbosegc output). The new output will not be what > -XX:+PrintGCDetails generates today but something that can be reliably > parsed and it is also reasonably human-readable (so, no xml and no > space/tab-separated formats). Additionally, we're proposing to enable > -XX:+PrintGCTimeStamps by default (in fact, we'll probably deprecate and > ignore that option, I can't believe that users will really not want a > time stamp per GC log record). We'll leave -XX:+PrintGCDateStamps to be > optional though. > > Specific questions: > > - Is anyone really attached to the old -verbosegc output? > - Would anyone really hate having time stamps by default? > - I know that a lot of folks have their own parsers for our current GC > log formats. Would you be happy if we provided you with a (reliable!) > parser for the new format in Java that you can easily adapt? > > > B. Introducing "cyclic" GC logs. > > This is something that a lot of folks have asked for given that they > were concerned with the GC logs getting very large (a 1TB disk is $85 > these days, but anyway...). Given that each GC log record is of variable > size, we cannot easily cycle through the log using the same file (I'd > rather not have to overwrite existing records). Our current proposal is > for the user to specify a file number N and a size target S for each > file. For a given GC log -Xloggc:foo, HotSpot will generate > > foo.00000001 > foo.00000002 > foo.00000003 > etc. 
> > (we'll create a new file as soon as the size of the one we are writing > to exceeds S, so each file will be slightly larger than S but it will be > helpful not to split individual log records between two files) > > When we create a new file, if we have more than N files we'll delete the > oldest. So, in the above example, if N == 3, when we create foo.00000004 > we'll delete foo.00000001. > > Note that in the above scheme, the logs are not really "cyclic" but, > instead, we're pruning the oldest records every now and then, which has > the same effect. > > Another (related) request has been to maybe append the GC log file name > with the pid of the JVM that's generating it. Maybe we don't want to do > this by default. But, would people find it helpful if we provide a new > cmd line parameter to do that? So, for the above example and assuming > that the JVM's pid is 1234, the GC log file(s) will be either: > > foo.1234 > > or > > foo.1234.00000001 > foo.1234.00000002 > foo.1234.00000003 > etc. > > Specific questions: > > - Would people really hate it if HotSpot starts appending the GC log > file name with a (zero-padded) sequence number? Maybe if N == 1 (the > default), HotSpot will skip the sequence number and ignore S, i.e., > behave as it does today. > - To the people who have been asking for cyclic GC logs: is the sequence > number scheme above good enough? 
> > > Thanks in advance for your feedback, > > Tony, HotSpot GC Group > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > From matt.khan at db.com Thu May 6 13:01:41 2010 From: matt.khan at db.com (Matt Khan) Date: Thu, 6 May 2010 21:01:41 +0100 Subject: Feedback requested: HotSpot GC logging improvements In-Reply-To: <4BE3194E.902@oracle.com> Message-ID: Evening we currently manage the log overwriting issue by mv'ing the last gc.log to gc.log., if you're going to roll the logs then I would prefer a meaningful suffix rather than just a counter. I second the idea that datestamps should be the default. I think a unified, easily parseable but still readable output would be great though wouldn't you still need a verbose output that is specific to each collector in order to provide a "debug" level of detail? Cheers Matt Matt Khan -------------------------------------------------- GFFX Auto Trading Deutsche Bank, London Tony Printezis Sent by: hotspot-gc-use-bounces at openjdk.java.net 06/05/2010 20:32 To hotspot-gc-use at openjdk.java.net cc Subject Feedback requested: HotSpot GC logging improvements Hi all, We would like your input on some changes to HotSpot's GC logging that we have been discussing. We have been wanting to improve our GC logging for some time. However we haven't had the resources to spend on it. We don't know when we'll get to it, but we'd still like to get some feedback on our plans. The changes fall into two categories. A. Unification and improvement of -verbosegc / -XX:+PrintGCDetails output. I strongly believe that maintaining two GC log formats is counter-productive, especially given that the current -verbosegc format is unhelpful in many ways (i.e., lacks a lot of helpful information). 
So, we would like to unify the two into one, with maybe -XX:+PrintGCDetails generating a superset of what -verbosegc would generate (so that a parser for the -XX:+PrintGCDetails output will also be able to parse the -verbosegc output). The new output will not be what -XX:+PrintGCDetails generates today but something that can be reliably parsed and it is also reasonably human-readable (so, no xml and no space/tab-separated formats). Additionally, we're proposing to enable -XX:+PrintGCTimeStamps by default (in fact, we'll probably deprecate and ignore that option, I can't believe that users will really not want a time stamp per GC log record). We'll leave -XX:+PrintGCDateStamps to be optional though. Specific questions: - Is anyone really attached to the old -verbosegc output? - Would anyone really hate having time stamps by default? - I know that a lot of folks have their own parsers for our current GC log formats. Would you be happy if we provided you with a (reliable!) parser for the new format in Java that you can easily adapt? B. Introducing "cyclic" GC logs. This is something that a lot of folks have asked for given that they were concerned with the GC logs getting very large (a 1TB disk is $85 these days, but anyway...). Given that each GC log record is of variable size, we cannot easily cycle through the log using the same file (I'd rather not have to overwrite existing records). Our current proposal is for the user to specify a file number N and a size target S for each file. For a given GC log -Xloggc:foo, HotSpot will generate foo.00000001 foo.00000002 foo.00000003 etc. (we'll create a new file as soon as the size of the one we are writing to exceeds S, so each file will be slightly larger than S but it will be helpful not to split individual log records between two files) When we create a new file, if we have more than N files we'll delete the oldest. So, in the above example, if N == 3, when we create foo.00000004 we'll delete foo.00000001. 
Note that in the above scheme, the logs are not really "cyclic" but, instead, we're pruning the oldest records every now and then, which has the same effect. Another (related) request has been to maybe append the GC log file name with the pid of the JVM that's generating it. Maybe we don't want to do this by default. But, would people find it helpful if we provide a new cmd line parameter to do that? So, for the above example and assuming that the JVM's pid is 1234, the GC log file(s) will be either: foo.1234 or foo.1234.00000001 foo.1234.00000002 foo.1234.00000003 etc. Specific questions: - Would people really hate it if HotSpot starts appending the GC log file name with a (zero-padded) sequence number? Maybe if N == 1 (the default), HotSpot will skip the sequence number and ignore S, i.e., behave as it does today. - To the people who have been asking for cyclic GC logs: is the sequence number scheme above good enough? Thanks in advance for your feedback, Tony, HotSpot GC Group _______________________________________________ hotspot-gc-use mailing list hotspot-gc-use at openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use --- This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and delete this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden. Please refer to http://www.db.com/en/content/eu_disclosures.htm for additional EU corporate and regulatory disclosures. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100506/c23a82d8/attachment.html From matt.fowles at gmail.com Thu May 6 13:05:46 2010 From: matt.fowles at gmail.com (Matt Fowles) Date: Thu, 6 May 2010 16:05:46 -0400 Subject: Feedback requested: HotSpot GC logging improvements In-Reply-To: References: <4BE3194E.902@oracle.com> Message-ID: Tony~ Definitely date stamps as the default. The math to correlate things is annoying and error prone. Matt On Thu, May 6, 2010 at 4:01 PM, Matt Khan wrote: > > Evening > > we currently manage the log overwriting issue by mv'ing the last gc.log to > gc.log., if you're going to roll the > logs then I would prefer a meaningful suffix rather than just a counter. > > I second the idea that datestamps should be the default. > > I think a unified, easily parseable but still readable output would be > great though wouldn't you still need a verbose output that is specific to > each collector in order to provide a "debug" level of detail? > > Cheers > Matt > > Matt Khan > -------------------------------------------------- > GFFX Auto Trading > Deutsche Bank, London > > > > *Tony Printezis * > Sent by: hotspot-gc-use-bounces at openjdk.java.net > > 06/05/2010 20:32 > To > hotspot-gc-use at openjdk.java.net > cc > Subject > Feedback requested: HotSpot GC logging improvements > > > > > Hi all, > > We would like your input on some changes to HotSpot's GC logging that we > have been discussing. We have been wanting to improve our GC logging for > some time. However we haven't had the resources to spend on it. We don't > know when we'll get to it, but we'd still like to get some feedback on > our plans. > > The changes fall into two categories. > > > A. Unification and improvement of -verbosegc / -XX:+PrintGCDetails output. 
> > I strongly believe that maintaining two GC log formats is > counter-productive, especially given that the current -verbosegc format > is unhelpful in many ways (i.e., lacks a lot of helpful information). > So, we would like to unify the two into one, with maybe > -XX:+PrintGCDetails generating a superset of what -verbosegc would > generate (so that a parser for the -XX:+PrintGCDetails output will also > be able to parse the -verbosegc output). The new output will not be what > -XX:+PrintGCDetails generates today but something that can be reliably > parsed and it is also reasonably human-readable (so, no xml and no > space/tab-separated formats). Additionally, we're proposing to enable > -XX:+PrintGCTimeStamps by default (in fact, we'll probably deprecate and > ignore that option, I can't believe that users will really not want a > time stamp per GC log record). We'll leave -XX:+PrintGCDateStamps to be > optional though. > > Specific questions: > > - Is anyone really attached to the old -verbosegc output? > - Would anyone really hate having time stamps by default? > - I know that a lot of folks have their own parsers for our current GC > log formats. Would you be happy if we provided you with a (reliable!) > parser for the new format in Java that you can easily adapt? > > > B. Introducing "cyclic" GC logs. > > This is something that a lot of folks have asked for given that they > were concerned with the GC logs getting very large (a 1TB disk is $85 > these days, but anyway...). Given that each GC log record is of variable > size, we cannot easily cycle through the log using the same file (I'd > rather not have to overwrite existing records). Our current proposal is > for the user to specify a file number N and a size target S for each > file. For a given GC log -Xloggc:foo, HotSpot will generate > > foo.00000001 > foo.00000002 > foo.00000003 > etc. 
> > (we'll create a new file as soon as the size of the one we are writing > to exceeds S, so each file will be slightly larger than S but it will be > helpful not to split individual log records between two files) > > When we create a new file, if we have more than N files we'll delete the > oldest. So, in the above example, if N == 3, when we create foo.00000004 > we'll delete foo.00000001. > > Note that in the above scheme, the logs are not really "cyclic" but, > instead, we're pruning the oldest records every now and then, which has > the same effect. > > Another (related) request has been to maybe append the GC log file name > with the pid of the JVM that's generating it. Maybe we don't want to do > this by default. But, would people find it helpful if we provide a new > cmd line parameter to do that? So, for the above example and assuming > that the JVM's pid is 1234, the GC log file(s) will be either: > > foo.1234 > > or > > foo.1234.00000001 > foo.1234.00000002 > foo.1234.00000003 > etc. > > Specific questions: > > - Would people really hate it if HotSpot starts appending the GC log > file name with a (zero-padded) sequence number? Maybe if N == 1 (the > default), HotSpot will skip the sequence number and ignore S, i.e., > behave as it does today. > - To the people who have been asking for cyclic GC logs: is the sequence > number scheme above good enough? > > > Thanks in advance for your feedback, > > Tony, HotSpot GC Group > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > --- > > This e-mail may contain confidential and/or privileged information. If you > are not the intended recipient (or have received this e-mail in error) > please notify the sender immediately and delete this e-mail. Any > unauthorized copying, disclosure or distribution of the material in this > e-mail is strictly forbidden. 
>
> Please refer to http://www.db.com/en/content/eu_disclosures.htm for
> additional EU corporate and regulatory disclosures.
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100506/6a805325/attachment-0001.html

From michael.finocchiaro at gmail.com Thu May 6 13:20:35 2010
From: michael.finocchiaro at gmail.com (Michael Finocchiaro)
Date: Thu, 6 May 2010 22:20:35 +0200
Subject: Feedback requested: HotSpot GC logging improvements
In-Reply-To:
References:
Message-ID:

Perhaps the logfile suffix could be parametrized (sequence, datestamp, etc.)?

GCTimeStamps should definitely be the default because, as previously
pointed out, post-mortem without 'em is painful.

For more debug detail per collector one could use PrintHeapAtGC or
similar, couldn't one?

I always had a weakness for the simplicity of HP's format - one line,
space-separated, with time, date, cause and sizes of each generation.
Super easy to parse and even eye-scan. That being said, now that we have
jVisualVM with snapshots and everything, the GCDetails output is complete
and practical as well.

Cheers,
Fini

Sent from Fino's iPhone 3GS

Michael Finocchiaro
Mobile +6 85 46 07 62
http://mfinocchiaro.wordpress.com

On 6 May 2010, at 22:01, Matt Khan wrote:

>
> Evening
>
> we currently manage the log overwriting issue by mv'ing the last
> gc.log to gc.log., if you're
> going to roll the logs then I would prefer a meaningful suffix
> rather than just a counter.
>
> I second the idea that datestamps should be the default.
> > I think a unified, easily parseable but still readable output would > be great though wouldn't you still need a verbose output that is > specific to each collector in order to provide a "debug" level of > detail? > > Cheers > Matt > > Matt Khan > -------------------------------------------------- > GFFX Auto Trading > Deutsche Bank, London > > > > Tony Printezis > Sent by: hotspot-gc-use-bounces at openjdk.java.net > 06/05/2010 20:32 > > To > hotspot-gc-use at openjdk.java.net > cc > Subject > Feedback requested: HotSpot GC logging improvements > > > > > > Hi all, > > We would like your input on some changes to HotSpot's GC logging > that we > have been discussing. We have been wanting to improve our GC logging > for > some time. However we haven't had the resources to spend on it. We > don't > know when we'll get to it, but we'd still like to get some feedback on > our plans. > > The changes fall into two categories. > > > A. Unification and improvement of -verbosegc / -XX:+PrintGCDetails > output. > > I strongly believe that maintaining two GC log formats is > counter-productive, especially given that the current -verbosegc > format > is unhelpful in many ways (i.e., lacks a lot of helpful information). > So, we would like to unify the two into one, with maybe > -XX:+PrintGCDetails generating a superset of what -verbosegc would > generate (so that a parser for the -XX:+PrintGCDetails output will > also > be able to parse the -verbosegc output). The new output will not be > what > -XX:+PrintGCDetails generates today but something that can be reliably > parsed and it is also reasonably human-readable (so, no xml and no > space/tab-separated formats). Additionally, we're proposing to enable > -XX:+PrintGCTimeStamps by default (in fact, we'll probably deprecate > and > ignore that option, I can't believe that users will really not want a > time stamp per GC log record). We'll leave -XX:+PrintGCDateStamps to > be > optional though. 
> > Specific questions: > > - Is anyone really attached to the old -verbosegc output? > - Would anyone really hate having time stamps by default? > - I know that a lot of folks have their own parsers for our current GC > log formats. Would you be happy if we provided you with a (reliable!) > parser for the new format in Java that you can easily adapt? > > > B. Introducing "cyclic" GC logs. > > This is something that a lot of folks have asked for given that they > were concerned with the GC logs getting very large (a 1TB disk is $85 > these days, but anyway...). Given that each GC log record is of > variable > size, we cannot easily cycle through the log using the same file (I'd > rather not have to overwrite existing records). Our current proposal > is > for the user to specify a file number N and a size target S for each > file. For a given GC log -Xloggc:foo, HotSpot will generate > > foo.00000001 > foo.00000002 > foo.00000003 > etc. > > (we'll create a new file as soon as the size of the one we are writing > to exceeds S, so each file will be slightly larger than S but it > will be > helpful not to split individual log records between two files) > > When we create a new file, if we have more than N files we'll delete > the > oldest. So, in the above example, if N == 3, when we create foo. > 00000004 > we'll delete foo.00000001. > > Note that in the above scheme, the logs are not really "cyclic" but, > instead, we're pruning the oldest records every now and then, which > has > the same effect. > > Another (related) request has been to maybe append the GC log file > name > with the pid of the JVM that's generating it. Maybe we don't want to > do > this by default. But, would people find it helpful if we provide a new > cmd line parameter to do that? So, for the above example and assuming > that the JVM's pid is 1234, the GC log file(s) will be either: > > foo.1234 > > or > > foo.1234.00000001 > foo.1234.00000002 > foo.1234.00000003 > etc. 
> > Specific questions: > > - Would people really hate it if HotSpot starts appending the GC log > file name with a (zero-padded) sequence number? Maybe if N == 1 (the > default), HotSpot will skip the sequence number and ignore S, i.e., > behave as it does today. > - To the people who have been asking for cyclic GC logs: is the > sequence > number scheme above good enough? > > > Thanks in advance for your feedback, > > Tony, HotSpot GC Group > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > --- > > This e-mail may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this e-mail > in error) please notify the sender immediately and delete this e- > mail. Any unauthorized copying, disclosure or distribution of the > material in this e-mail is strictly forbidden. > > Please refer to http://www.db.com/en/content/eu_disclosures.htm for > additional EU corporate and regulatory disclosures. > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100506/7cd97d1c/attachment.html From adamh at basis.com Thu May 6 13:28:51 2010 From: adamh at basis.com (Adam Hawthorne) Date: Thu, 6 May 2010 16:28:51 -0400 Subject: Feedback requested: HotSpot GC logging improvements In-Reply-To: <4BE3194E.902@oracle.com> References: <4BE3194E.902@oracle.com> Message-ID: Tony, On Thu, May 6, 2010 at 15:32, Tony Printezis wrote: > Hi all, > > A. Unification and improvement of -verbosegc / -XX:+PrintGCDetails output. > [snip] > Specific questions: > > - Is anyone really attached to the old -verbosegc output? > We aren't. 
> - Would anyone really hate having time stamps by default?

I'm in agreement with other folks; timestamps are okay, but date stamps
by default would be better.

> - I know that a lot of folks have their own parsers for our current GC
> log formats. Would you be happy if we provided you with a (reliable!)
> parser for the new format in Java that you can easily adapt?

+1. Or +10.

> B. Introducing "cyclic" GC logs.
> Specific questions: [snip]
> - Would people really hate it if HotSpot starts appending the GC log
> file name with a (zero-padded) sequence number? Maybe if N == 1 (the
> default), HotSpot will skip the sequence number and ignore S, i.e.,
> behave as it does today.

How many digits in the sequence? Would that be configurable? Overall,
having this is better than not having it.

> - To the people who have been asking for cyclic GC logs: is the sequence
> number scheme above good enough?

Much better than nothing at all for disk-conscious customers.

Thanks,
Adam

--
Adam Hawthorne
Software Engineer
BASIS International Ltd.
www.basis.com
+1.505.345.5232 Phone

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100506/806fd8b4/attachment.html

From jeff.lloyd at algorithmics.com Thu May 6 13:27:58 2010
From: jeff.lloyd at algorithmics.com (jeff.lloyd at algorithmics.com)
Date: Thu, 6 May 2010 16:27:58 -0400
Subject: Feedback requested: HotSpot GC logging improvements
In-Reply-To: <4BE3194E.902@oracle.com>
References: <4BE3194E.902@oracle.com>
Message-ID: <0FCC438D62A5E643AA3F57D3417B220D0C6C5A2F@TORMAIL.algorithmics.com>

Hi Tony,

I'm in favour of item A. I'm not attached to the old format, and I'd
like to see time stamps as the default.

I also like item B. Can I make a suggestion for item B? Have you noticed
everyone who posts their GC log file additionally has to include in
their email the gc config options that they think they used for that
particular run?
In our application we prefix _every_ "cyclic" log file with the config options used to start the app. It makes reporting problems easier and we can see which options the client used by looking at the top of the log file instead of having to ask the client for what they thought they used. Jeff -----Original Message----- From: hotspot-gc-use-bounces at openjdk.java.net [mailto:hotspot-gc-use-bounces at openjdk.java.net] On Behalf Of Tony Printezis Sent: Thursday, May 06, 2010 3:33 PM To: hotspot-gc-use at openjdk.java.net Subject: Feedback requested: HotSpot GC logging improvements Hi all, We would like your input on some changes to HotSpot's GC logging that we have been discussing. We have been wanting to improve our GC logging for some time. However we haven't had the resources to spend on it. We don't know when we'll get to it, but we'd still like to get some feedback on our plans. The changes fall into two categories. A. Unification and improvement of -verbosegc / -XX:+PrintGCDetails output. I strongly believe that maintaining two GC log formats is counter-productive, especially given that the current -verbosegc format is unhelpful in many ways (i.e., lacks a lot of helpful information). So, we would like to unify the two into one, with maybe -XX:+PrintGCDetails generating a superset of what -verbosegc would generate (so that a parser for the -XX:+PrintGCDetails output will also be able to parse the -verbosegc output). The new output will not be what -XX:+PrintGCDetails generates today but something that can be reliably parsed and it is also reasonably human-readable (so, no xml and no space/tab-separated formats). Additionally, we're proposing to enable -XX:+PrintGCTimeStamps by default (in fact, we'll probably deprecate and ignore that option, I can't believe that users will really not want a time stamp per GC log record). We'll leave -XX:+PrintGCDateStamps to be optional though. Specific questions: - Is anyone really attached to the old -verbosegc output? 
- Would anyone really hate having time stamps by default? - I know that a lot of folks have their own parsers for our current GC log formats. Would you be happy if we provided you with a (reliable!) parser for the new format in Java that you can easily adapt? B. Introducing "cyclic" GC logs. This is something that a lot of folks have asked for given that they were concerned with the GC logs getting very large (a 1TB disk is $85 these days, but anyway...). Given that each GC log record is of variable size, we cannot easily cycle through the log using the same file (I'd rather not have to overwrite existing records). Our current proposal is for the user to specify a file number N and a size target S for each file. For a given GC log -Xloggc:foo, HotSpot will generate foo.00000001 foo.00000002 foo.00000003 etc. (we'll create a new file as soon as the size of the one we are writing to exceeds S, so each file will be slightly larger than S but it will be helpful not to split individual log records between two files) When we create a new file, if we have more than N files we'll delete the oldest. So, in the above example, if N == 3, when we create foo.00000004 we'll delete foo.00000001. Note that in the above scheme, the logs are not really "cyclic" but, instead, we're pruning the oldest records every now and then, which has the same effect. Another (related) request has been to maybe append the GC log file name with the pid of the JVM that's generating it. Maybe we don't want to do this by default. But, would people find it helpful if we provide a new cmd line parameter to do that? So, for the above example and assuming that the JVM's pid is 1234, the GC log file(s) will be either: foo.1234 or foo.1234.00000001 foo.1234.00000002 foo.1234.00000003 etc. Specific questions: - Would people really hate it if HotSpot starts appending the GC log file name with a (zero-padded) sequence number? 
Maybe if N == 1 (the default), HotSpot will skip the sequence number and ignore S, i.e., behave as it does today. - To the people who have been asking for cyclic GC logs: is the sequence number scheme above good enough? Thanks in advance for your feedback, Tony, HotSpot GC Group _______________________________________________ hotspot-gc-use mailing list hotspot-gc-use at openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From ycraig at cysystems.com Thu May 6 13:47:17 2010 From: ycraig at cysystems.com (craig yeldell) Date: Thu, 6 May 2010 16:47:17 -0400 Subject: Feedback requested: HotSpot GC logging improvements In-Reply-To: <4BE3194E.902@oracle.com> References: <4BE3194E.902@oracle.com> Message-ID: A. - No, I am not attached to the old -verbosegc output. - I would also think that having the time stamps should be the default. - Having a reliable parser would definitely be a welcome addition. B. - We handle our own gc log rotation, but if you were to provide it I would prefer the zero-padded sequence number. - Not one of those folks. Regards, Craig On May 6, 2010, at 3:32 PM, Tony Printezis wrote: > Hi all, > > We would like your input on some changes to HotSpot's GC logging > that we > have been discussing. 
> [... full quote of the original message snipped ...] 
> > > Thanks in advance for your feedback, > > Tony, HotSpot GC Group > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From Matthew.H.Miller at Sun.COM Thu May 6 13:46:20 2010 From: Matthew.H.Miller at Sun.COM (Matthew Miller) Date: Thu, 06 May 2010 16:46:20 -0400 Subject: Feedback requested: HotSpot GC logging improvements In-Reply-To: <0FCC438D62A5E643AA3F57D3417B220D0C6C5A2F@TORMAIL.algorithmics.com> References: <4BE3194E.902@oracle.com> <0FCC438D62A5E643AA3F57D3417B220D0C6C5A2F@TORMAIL.algorithmics.com> Message-ID: <4BE32A9C.8010705@sun.com> I'd like to say +1 to the idea below - if it were somehow possible to (somewhat) easily tell all the GC options used from within the GC log itself, it would be very useful. (Especially to the support organization.) -Matt On 5/6/2010 4:27 PM, jeff.lloyd at algorithmics.com wrote: [... Snip ...] > Can I make a suggestion for item B? Have you noticed everyone who posts > their GC log file additionally has to include in their email the gc > config options that they think they used for that particular run? In > our application we prefix _every_ "cyclic" log file with the config > options used to start the app. It makes reporting problems easier and > we can see which options the client used by looking at the top of the > log file instead of having to ask the client what they thought they > used. > > Jeff > From rainer.jung at kippdata.de Thu May 6 13:04:31 2010 From: rainer.jung at kippdata.de (Rainer Jung) Date: Thu, 06 May 2010 22:04:31 +0200 Subject: Feedback requested: HotSpot GC logging improvements In-Reply-To: <4BE3194E.902@oracle.com> References: <4BE3194E.902@oracle.com> Message-ID: <4BE320CF.402@kippdata.de> Wonderful! Comments inline. On 06.05.2010 21:32, Tony Printezis wrote: > A. Unification and improvement of -verbosegc / -XX:+PrintGCDetails output. 
> [... quoted text of section A snipped ...] > > Specific questions: > > - Is anyone really attached to the old -verbosegc output? Not at all, if we have a chance to get something better. > - Would anyone really hate having time stamps by default? We had to do a lot of quirks to simulate GCDateStamps for years until they finally made it into Java 6. Having timestamps by default is a must; absolute timestamps should at least be optional. Personally I find the absolute timestamps more important than the ones relative to JVM start, but that depends on what you are doing. When gathering statistical data the relative ones are better, since you can do computations more easily; when tracking problems the absolute ones are sometimes easier, because you quickly want to know whether the log lines match the time of day when you observed the problem. > - I know that a lot of folks have their own parsers for our current GC > log formats. Would you be happy if we provided you with a (reliable!) > parser for the new format in Java that you can easily adapt? Of course. 
Although I can imagine folks would want to get it in different implementation techniques. I guess you plan to provide a parser written in Java? It would be great if it could be provided in a way that makes it easy for people to customize, so possibly Open Source with a nice license like the Apache Software License 2. > B. Introducing "cyclic" GC logs. > [... quoted text of section B snipped ...] There are a lot of options here. When you're doing log rotation, people who want to archive the logs might have regular jobs (cron and friends) fetching the old closed files and transferring them to another system. In that case it would be nice if the internal rotation and the external script did not conflict by operating on the same files: foo.00000001 might have been detected as old and copied to the remote host while, at the same time, the GC decides to reuse it. 
Of course people can increase the cycle length and so on, but I have always found it a bit problematic if a log rotation mechanism touches old files long after the rotation happened. That's why I personally find externally organized pruning better. Of course then it's not carefree out of the box. Another thing I often miss is the ability to combine size- and time-based rotation. I want to say: rotate whenever 10MB are full, so that the chunks I need to handle do not get too big, but please also rotate at midnight, so that I know I can grab the complete files of the day after midnight. So you specify a max size and a time pattern, and whichever criterion is fulfilled first triggers rotation. > Another (related) request has been to maybe append the GC log file name > with the pid of the JVM that's generating it. > [... rest of the quoted passage snipped ...] Some time ago I asked whether it would be possible to get the %p substitution (replace it with the process id), which is already available for some files, also for the GC log. I think it already exists in the JDK code for either the HeapDumpOnOutOfMemoryError file or the hotspot error file; I forget which. The code is extremely simple. Would foo.%p.%8N be too complex? Great initiative! Will you start another discussion about the data contents of the file? 
It could be interesting when people describe what kind of information they extract out of the GC logs. Not everything is straightforward in the sense of being based on individual lines. As an example, I always calculate the total stopped time per minute (summing up) as a percentage of wallclock time. Regards, Rainer From rainer.jung at kippdata.de Thu May 6 13:18:09 2010 From: rainer.jung at kippdata.de (Rainer Jung) Date: Thu, 06 May 2010 22:18:09 +0200 Subject: Feedback requested: HotSpot GC logging improvements In-Reply-To: <4BE3194E.902@oracle.com> References: <4BE3194E.902@oracle.com> Message-ID: <4BE32401.9070309@kippdata.de> Short addition to my previous post: On 06.05.2010 21:32, Tony Printezis wrote: > - Would people really hate it if HotSpot starts appending the GC log > file name with a (zero-padded) sequence number? Maybe if N == 1 (the > default), HotSpot will skip the sequence number and ignore S, i.e., > behave as it does today. > - To the people who have been asking for cyclic GC logs: is the sequence > number scheme above good enough? Another slight problem with the numbering scheme is that during archiving you'll overwrite old files. So your archive scripts need to intelligently rename the files. Not too easy. Maybe add a couple of substitution characters (%p=pid, %N roll number, %Y, ... the usual strftime characters for timestamp formatting). Regards, Rainer From Peter.B.Kessler at Oracle.COM Thu May 6 15:00:43 2010 From: Peter.B.Kessler at Oracle.COM (Peter B. Kessler) Date: Thu, 06 May 2010 15:00:43 -0700 Subject: Feedback requested: HotSpot GC logging improvements In-Reply-To: References: Message-ID: <4BE33C0B.5060009@Oracle.COM> +1 on using date stamps in the file names if you have to split a GC log into several files. If you use the same format as is used for -XX:+PrintGCDateStamps, ISO 8601, then lexicographic order, e.g. from file listings, should also be in time sequence order, which would be convenient. 
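[Editor's note: the point about ISO 8601 date stamps can be demonstrated in a few lines — because every field is fixed-width and ordered from most to least significant, plain lexicographic sorting, which is what a directory listing gives you, is already chronological. The file names below are invented for illustration, with the pid placed after (less significant than) the time stamp as suggested.]

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Demonstrates that ISO 8601 date stamps in file names make lexicographic
// order coincide with time order, even across a midnight/day boundary.
public class Iso8601NameOrder {
    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList(
            "gc.2010-05-07T00:00:12.1234.log",
            "gc.2010-05-06T23:59:58.1234.log",
            "gc.2010-05-06T09:15:00.1234.log"));
        names.sort(null); // natural (lexicographic) String order
        for (String n : names) System.out.println(n); // comes out chronological
    }
}
```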
(Then, if only we could get PIDs to be monotonically increasing. :-) That might imply that you want PIDs after (less significant than) time stamps in the split file names, so you still get them in time sequence order. If you want just the logs from PID 1234, you can use "ls *.1234.*" to get just those, in time sequence order. Do you want an option to start a new log file in a sequence after some period of time? E.g., once a day? That might make it easier to line up events across long-running JVMs that are collecting (and therefore generating logs) at different rates. +1 on including the command line arguments (or maybe the settings those provoke inside the VM) in the log file. For settings that change over time, e.g., because of ergonomics, it would be good to have a way to see those, too. Maybe "a new log file format" allows that to happen. ... peter Matt Khan wrote: > > Evening > > we currently manage the log overwriting issue by mv'ing the last gc.log > to gc.log., if you're going to > roll the logs then I would prefer a meaningful suffix rather than just a > counter. > > I second the idea that datestamps should be the default. > > I think a unified, easily parseable but still readable output would be > great, though wouldn't you still need a verbose output that is specific > to each collector in order to provide a "debug" level of detail? > > Cheers > Matt > > Matt Khan > -------------------------------------------------- > GFFX Auto Trading > Deutsche Bank, London From doug.jones at internet.co.nz Thu May 6 17:45:59 2010 From: doug.jones at internet.co.nz (Doug Jones) Date: Fri, 7 May 2010 12:45:59 +1200 Subject: Feedback requested: HotSpot GC logging improvements References: <4BE3194E.902@oracle.com> Message-ID: <003001caed7e$c5c22e20$9011b9d2@userf9r7stx6j4> The biggest problem for us is that when the JVM is restarted the previous GC log file is overwritten. 
So I would like to suggest the following: 1) At midnight each day the GC log is cycled by appending the old day's date (in YYYYMMDD format). So the log currently being written to is always the one specified on gclog. 2) When the JVM is started, if a file of the name specified on gclog exists then it is renamed to the appropriate YYYYMMDD file (preferably taking the date from the file's last-written date, not the current date) and a fresh GC log started. If the YYYYMMDD file already exists (i.e. the JVM has already been restarted that day) then the last GC log would just be appended to that. This would seem to have a number of advantages: it overcomes the problem of the last GC log being overwritten on a restart; the GC log is automatically kept to a manageable size - most often when we are looking at a GC problem we want to see what is in the log over the last few hours; and it means that it is easy for each site to implement its own retention policy, e.g. deleting old logs after NN days. I don't really see that as being the responsibility of the JVM. I guess it also has the advantage that in the case that DateStamps are not turned on, the first entry in the log for a day will have the TimeStamp for the start of the day, so it becomes much easier to work out the time of day a subsequent event in the log occurred. For the other question our vote is: leave PrintGCDetails much as it is and deprecate verbosegc, turn on GCTimeStamps by default but agree not DateStamps (I suspect just getting GCTime is the lowest-overhead system call). Doug. ----- Original Message ----- From: "Tony Printezis" To: Sent: Friday, May 07, 2010 7:32 AM Subject: Feedback requested: HotSpot GC logging improvements > Hi all, > > We would like your input on some changes to HotSpot's GC logging that we > have been discussing. We have been wanting to improve our GC logging for > some time. However we haven't had the resources to spend on it. 
We don't > know when we'll get to it, but we'd still like to get some feedback on > our plans. > [... rest of the quoted original message snipped ...] From Johann.Loefflmann at Sun.COM Fri May 7 02:05:42 2010 From: Johann.Loefflmann at Sun.COM (Johann N. 
Loefflmann) Date: Fri, 07 May 2010 11:05:42 +0200 Subject: Feedback requested: HotSpot GC logging improvements In-Reply-To: <4BE3194E.902@oracle.com> References: <4BE3194E.902@oracle.com> Message-ID: <4BE3D7E6.7050503@sun.com> Tony, > B. Introducing "cyclic" GC logs. > For the Java Fatal Error Log we can specify %p, for example: -XX:ErrorFile=/var/log/java/java_error%p.log IMHO it would be great if we could use %p also for -Xloggc:/var/log/java/gc%p.log See also http://java.sun.com/javase/6/webnotes/trouble/TSG-VM/html/felog.html#gbwcy IMHO options to configure the format of the gc log filename would be more comfortable for a user than only hardcoding a particular scheme. Proposal: %p the process ID %t a timestamp (the format could be controlled by a new option, I suggest -XX:TimeStampFormatForLogFileNames) %n an index or sequential number I suggest a default for the timestamp format, something like -XX:TimeStampFormatForLogFileNames=yyyy-MM-dd_HH-mm-ss because digits, hyphens and underscores are file-system independent, the format is human-readable, and the default can be changed by specifying the option if it is not suitable. Furthermore, the option could be used by any log, not only the gc log (the fatal error log, for example). Pattern letters for the option could be borrowed from the SimpleDateFormat class. See also http://java.sun.com/javase/6/docs/api/java/text/SimpleDateFormat.html I'm sure customers would find it quite comfortable to specify something like -Xloggc:/var/log/java/gclog_pid%p_%n_%t.log -Johann (Software TSC Support engineer) From rainer.jung at kippdata.de Thu May 6 22:05:40 2010 From: rainer.jung at kippdata.de (Rainer Jung) Date: Fri, 07 May 2010 07:05:40 +0200 Subject: Feedback requested: HotSpot GC logging improvements In-Reply-To: <4BE33C0B.5060009@Oracle.COM> References: <4BE33C0B.5060009@Oracle.COM> Message-ID: <4BE39FA4.10005@kippdata.de> On 07.05.2010 00:00, Peter B. 
Kessler wrote: > +1 on including the command line arguments (or maybe the settings those provoke inside the VM) in the log file. For settings that change over time, e.g., because of ergonomics, it would be good to have a way to see those, too. Maybe "a new log file format" allows that to happen. +1 From chkwok at digibites.nl Fri May 7 07:54:45 2010 From: chkwok at digibites.nl (Chi Ho Kwok) Date: Fri, 7 May 2010 16:54:45 +0200 Subject: Feedback requested: HotSpot GC logging improvements In-Reply-To: <4BE39FA4.10005@kippdata.de> References: <4BE33C0B.5060009@Oracle.COM> <4BE39FA4.10005@kippdata.de> Message-ID: Talking about a new log format... Is it possible to send the log events to the java.util.logging system too, so we can handle the redirection, writing to a file, or integration with log4j via adapters ourselves, if we choose to? Just dump the string message in a LogRecord and put all parameters into setParameters(), so you can access either the string / formatted message or get a machine-readable version by calling getParameters(). Parameters are arrays like ["GCNew", time spent, prev usage, current usage, etc], where the first field defines the type and the other parameters' interpretation depends on the type. Okay, with this, you can't just add some command line flags to reconfigure GC logging, but it makes integrating GC-related things in apps much easier: you can write a { if the concurrent collector failed, send the admin an email that the app stopped for $x seconds, please fix } script in a minute, or a { if gc overhead > 10%, maybe increase the heap size } trigger. No more writing scripts that parse the output of the gc log to check for weird things. No more parsing at all. Chi Ho Kwok On Fri, May 7, 2010 at 7:05 AM, Rainer Jung wrote: > On 07.05.2010 00:00, Peter B. Kessler wrote: > > +1 on including the command line arguments (or maybe the settings those > provoke inside the VM) in the log file. 
For settings that change over time, > e.g., because of ergonomics, it would be good to have a way to see those, > too. Maybe "a new log file format" allows that to happen. > > +1 > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From tony.printezis at oracle.com Fri May 7 09:15:19 2010 From: tony.printezis at oracle.com (Tony Printezis) Date: Fri, 07 May 2010 12:15:19 -0400 Subject: Feedback requested: HotSpot GC logging improvements In-Reply-To: <4BE3194E.902@oracle.com> References: <4BE3194E.902@oracle.com> Message-ID: <4BE43C97.6000805@oracle.com> Hi all, First, thank you for all the excellent feedback (which I read as mostly positive toward the proposals). We are glad that people still care about the GC logs. Instead of replying to individual e-mails, I'll consolidate my replies here. * "I would say that PrintGCDateStamps should be the default" (several folks brought this up) From the point of view of analyzing logs to just look at the GC's behavior, we only need time stamps. And this is the reason why I'd like to see them turned on by default (too many times we got a log without time stamps for which we said "damn, if it had time stamps we'd get a better idea of what was happening"). So, they are the minimum we need to get a good picture of how the GC behaved. Date stamps will increase the size of the log (which still seems to be an issue for some people) and be helpful in fewer places (i.e., when comparing application and GC events; but we generally do not do that). So, you'll have to turn them on yourselves. 
:-) * "successive runs overwrite the previous log file" (several folks brought this up) Don't you think that adding the JVM's pid to the log file name would eliminate this problem? * "I'm not attached to the old format" (several folks mentioned this) Oh, good. I'll be quoting you when I make the case to remove it. * "A serial-attached-scsi (aka SAS) disk at 10k rpm is a little bit more expensive than $85/TB" (Ryan) Point taken, but do you really need super duper 10k rpm disks to store GC log files? :-) * "if you're going to roll the logs then I would prefer a meaningful suffix rather than just a counter." A counter seems like a perfectly meaningful suffix to me. * "wouldn't you still need a verbose output that is specific to each collector in order to provide a "debug" level of detail?" (Matt) Very good point. The verbose output will be as unified as possible, but indeed with GC-specific extensions so that we don't lose that information. * "I guess you plan to provide a parser written in Java?" (Rainer) Java? We're HotSpot developers! We only work in C++, assembly, and awk! Just kidding... Yes, indeed in Java. * "so possibly Open Source with a nice license like Apache Software License 2" (Rainer) Maybe, and it's not up to me to decide. * "foo.00000001 might have been detected as old and copied to the remote host and during the same time GC decides to now reuse it... That's why I personally find externally organized pruning better. Another thing I often miss is the ability to combine size and time based rotation." (Rainer) The proposal never reuses log files. We'll never overwrite anything. Instead, we'll delete the oldest files as we create new ones. If we tell the users to prune the older log files themselves, I know what the first bug filed against the new policy will be. :-) Regarding rotating based on both size and time: most people care about size, so I think that's what we'll do. 
If you want more advanced management of the logs you'll have to set N to infinity (at least we'll need a way to say "never delete older files") so that HotSpot doesn't delete any files, and you'll be able to copy them and delete them yourself. But, seriously, this is excellent feedback. You guys are doing more wild stuff with our logs than I had imagined. :-) * "Will you start another discussion about the data contents of the file?" (Rainer) We'll do that separately, based most likely on a wiki. When we get to it. No promises though! * "For more debug detail per collector one could use PrintHeapAtGC" (Michael) Well, PrintHeapAtGC was supposed to be added for debugging purposes, i.e., to find out what the address range of each generation is. However, it has clearer information on how full each generation is, which is why people use it today (it's very space-inefficient though...). We are hoping to add that information to the standard GC log records to eliminate the need for PrintHeapAtGC. * "In our application we prefix _every_ "cyclic" log file with the config options used to start the app." (Jeff) Adding configuration / whatever information at the top of every log file fragment is an excellent suggestion. Thanks for bringing it up. * "How many digits in the sequence? Would that be configurable?" (Adam) 8 should be more than enough (do you really see the need for more than 99M log fragments?). Actually, even 6 will probably be enough. And if we go over that, we won't cycle the numbers, we'll just expand the number field. * "IMHO it would be great if we could use %p also for" (Johann) I was going to say that this would start getting over the top. But I was not aware that you can do that with the fatal error log. I'll need to investigate that further. So, we'll leave this (and additional custom formatting in the GC log name) as a "maybe". 
I'm not quite sure whether we'd want to use the same facility for the sequence numbers, though, given that they'd be needed if we split the log and not needed if we don't. For those, I'd vote to just add a suffix to the log file name when they are needed.

Thanks again for all the good points,

Tony, HotSpot GC Group
From ryanobjc at gmail.com Fri May 7 14:39:20 2010
From: ryanobjc at gmail.com (Ryan Rawson)
Date: Fri, 7 May 2010 14:39:20 -0700
Subject: Feedback requested: HotSpot GC logging improvements
In-Reply-To: <4BE43C97.6000805@oracle.com>
References: <4BE3194E.902@oracle.com> <4BE43C97.6000805@oracle.com>
Message-ID:

One last thing to keep in mind: as you push Java to extremes of performance (I am working on an open source database in Java), the primary factor eventually becomes GC. At this point in our dev cycle, GC considerations dominate all others in any performance-oriented decision. Eventually this ripples down into other areas - like how JNI is too slow and DirectByteBuffers are good, but potentially limited for fine-grained data access.

At this point I have a prod installation that takes 80ms GC pauses every second or more (the object allocation pattern doesn't match the generational hypothesis). Moving data out of the realm of GC into hand-managed memory is basically our next step. If DirectByteBuffers didn't exist, the next step would be writing a lot more JNI or porting away from Java (and all the pain that entails).

Thanks to the success of Hadoop, the next area for Java is medium-performance systems code. You just would not believe the number of people writing DB and large-data things in Java.

Thanks for the attention!

-ryan
_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

From vkleinschmidt at gmail.com Mon May 10 13:34:20 2010
From: vkleinschmidt at gmail.com (Volker Kleinschmidt)
Date: Mon, 10 May 2010 15:34:20 -0500
Subject: Feedback requested: HotSpot GC logging improvements
In-Reply-To: <4BE43C97.6000805@oracle.com>
References: <4BE3194E.902@oracle.com> <4BE43C97.6000805@oracle.com>
Message-ID:

Where is the advantage to using counter numbers in the log file names? If you take the sensible suggestion made by several others here to use ISO date-time stamps in the filenames, you get a natural sequence, no worries about name re-use, and easily automated log maintenance by those who want to keep these logs for a while. You could still implement auto-deletion of "older" logs for those who want it. Each log can then easily be identified, and an optional PID in the filename would be helpful too. But a counter? What info does that give you by itself, without additional context? None whatsoever. That's why others declared it "not meaningful".

We mainly need GC logs for post-mortem performance problem analysis, so the date/time stamps on the logs would be really handy for identifying which log to look at (we often don't get to look at the log on the client system, hence file dates don't help us, and they often don't have PrintGCDateStamps enabled). However, the core issue for us is prevention of log overwriting when -Xloggc specifies a fixed filename and the VM gets restarted by the service wrapper watchdog feature, i.e. when you really needed that GC log.
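To illustrate the kind of self-describing name I mean (a sketch; the helper and the exact format are just one possibility, not anything HotSpot has committed to):

```java
import java.text.SimpleDateFormat;
import java.util.Date;

// Sketch of a date-time-stamped log name: it sorts naturally, survives
// restarts without clobbering the previous log, and tells you at a
// glance when the run started. The pid component is the optional
// addition discussed earlier in the thread.
public class StampedLogNameSketch {
    public static String logName(String base, long pid, Date start) {
        // ':' avoided so the name is also legal on Windows
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd_HH-mm-ss");
        return base + "." + pid + "." + fmt.format(start);
    }

    public static void main(String[] args) {
        // e.g. gc.log.1234.2010-05-10_15-34-20
        System.out.println(logName("gc.log", 1234L, new Date()));
    }
}
```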
So any auto-log-rolling mechanism is much better than none and will make me yodel with joy :^)

--Volker
--
Volker Kleinschmidt
Senior Support Engineer
Blackboard Client Support

From tony.printezis at oracle.com Mon May 10 15:41:28 2010
From: tony.printezis at oracle.com (Tony Printezis)
Date: Mon, 10 May 2010 18:41:28 -0400
Subject: Feedback requested: HotSpot GC logging improvements
In-Reply-To:
References: <4BE3194E.902@oracle.com> <4BE43C97.6000805@oracle.com>
Message-ID: <4BE88B98.8090705@oracle.com>

Volker,

Volker Kleinschmidt wrote:
> Where is the advantage to using counter numbers in the log file names?
> If you take the sensible suggestion made by several others here to use
> ISO datetimestamps in the filenames, you have a natural sequence, no
> worries about name re-use, and easily automated log maintenance by
> those that want to keep these logs for a while. You could still
> implement an auto-deletion of "older" logs for those that want it.
> Each log can then easily be identified, and an optional PID

Well, even with date stamps, if you don't have the JVM pid in the file name you won't know which JVM each log came from either. And, if you only have date stamps in the file names, you might not know whether you're missing a file in between the two your customer sent you (you'd need to look at the contents to see whether the time stamps are contiguous or there's a potential hole). As with the time stamps vs. date stamps argument, the sequence number is the minimum you'd need, and some folks might want to also enable date stamps in addition to the sequence numbers.

BTW, something that I just thought of: if we do introduce a way for people to use %n, %d, whatever in the GC log file names, would it also be helpful to have %h for "host name"?

> in the
> filename would be helpful too. But a counter?
> What info does that give you by itself, without additional context?
> None whatsoever. That's why others declared it as "not meaningful".
>
> We mainly need gc logs for post-mortem performance problem analysis,
> so the date/time stamps on the logs would be really handy to identify
> which log to look at (we often don't get to look at the log on the
> client system, hence file dates don't help us,

Good point.

> and they often don't have PrintGCDateStamps enabled). However the core
> issue for us is prevention of log overwriting when -Xloggc specifies a
> fixed filename and the VM gets restarted by the service wrapper
> watchdog feature, i.e. when you really needed that GC log. So any
> auto-log-rolling mechanism is much better than none and will make me
> yodel with joy :^)

Yodel? This is almost a good reason to drop that proposal asap. ;-) ;-) ;-)

Tony
So, for the above example and >>> assuming that the JVM's pid is 1234, the GC log file(s) will be either: >>> >>> foo.1234 >>> >>> or >>> >>> foo.1234.00000001 >>> foo.1234.00000002 >>> foo.1234.00000003 >>> etc. >>> >>> Specific questions: >>> >>> - Would people really hate it if HotSpot starts appending the GC log >>> file name with a (zero-padded) sequence number? Maybe if N == 1 (the >>> default), HotSpot will skip the sequence number and ignore S, i.e., >>> behave as it does today. >>> - To the people who have been asking for cyclic GC logs: is the >>> sequence number scheme above good enough? >>> >>> >>> Thanks in advance for your feedback, >>> >>> Tony, HotSpot GC Group >>> >>> >>> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> > > > > From martin at attivio.com Tue May 11 07:13:41 2010 From: martin at attivio.com (Martin Serrano) Date: Tue, 11 May 2010 10:13:41 -0400 Subject: Feedback requested: HotSpot GC logging improvements In-Reply-To: <4BE43C97.6000805@oracle.com> References: <4BE3194E.902@oracle.com> <4BE43C97.6000805@oracle.com> Message-ID: <9694A6C3D68A4249BD9E1A875B6BA81E105CCD27@bos0ex01.corp.attivio.com> Tony, Love the ideas. We start java via a wrapper process and we currently set the gc log file name using a timestamp. In upcoming releases we are planning on augmenting that with a meaningful application name. Specific comments on your response: > * "if you're going to roll the logs then I would prefer a meaningful > suffix rather than just a counter." > > A counter seems like a perfectly meaningful suffix to me. I would prefer to have a consistent suffix (like .log), in the filename. Perhaps you could support just the %d format for the counter in the generated log name. We'd also appreciate having startup information at the top of the gc log. 
Cheers, Martin -----Original Message----- From: hotspot-gc-use-bounces at openjdk.java.net [mailto:hotspot-gc-use-bounces at openjdk.java.net] On Behalf Of Tony Printezis Sent: Friday, May 07, 2010 12:15 PM To: hotspot-gc-use at openjdk.java.net Subject: Re: Feedback requested: HotSpot GC logging improvements Hi all, First, thank you for all the excellent feedback (which I see as mostly positive about the proposals). We are glad that people still care about the GC logs. Instead of replying to individual e-mails, I'll consolidate my replies here. * "I would say that PrintGCDateStamps should be the default" (several folks brought this up) From the point of view of analyzing logs to just look at the GC's behavior, we only need time stamps. And this is the reason why I'd like to see them turned on by default (too many times we got a log without time stamps for which we said "damn, if it had time stamps we'd get a better idea of what was happening"). So, they are the minimum we need to get a good picture of how the GC behaved. Date stamps will increase the size of the log (which still seems to be an issue for some people) and be helpful in fewer places (i.e., when comparing application and GC events; but we generally do not do that). So, you'll have to turn them on yourselves. :-) * "successive runs overwrite the previous log file" (several folks brought this up) Don't you think that adding the JVM's pid to the log file name would eliminate this problem? * "I'm not attached to the old format" (several folks mentioned this) Oh, good. I'll be quoting you when I make the case to remove it. * "A serial-attached-scsi (aka SAS) disk at 10k rpm is a little bit more expensive than $85/TB" (Ryan) Point taken, but do you really need super duper 10k rpm disks to store GC log files? :-) * "if you're going to roll the logs then I would prefer a meaningful suffix rather than just a counter." A counter seems like a perfectly meaningful suffix to me.
* "wouldn't you still need a verbose output that is specific to each collector in order to provide a "debug" level of detail?" (Matt) Very good point. The verbose output will be as unified as possible, but indeed with GC-specific extensions so that no collector-specific information is lost. * "I guess you plan to provide a parser written in Java?" (Rainer) Java? We're HotSpot developers! We only work in C++, assembly, and awk! Just kidding... Yes, indeed in Java. * "so possibly Open Source with a nice license like Apache Software License 2" (Rainer) Maybe, and not up to me to decide. * "f00.00000001 might have been detected as old and copied to the remote host and during the same time GC decides to now reuse it... That's why I personally find externally organized pruning better. Another thing I often miss is the ability to combine size and time based rotation." (Rainer) The proposal never reuses log files. We'll never overwrite anything. Instead, we'll delete the oldest files as we create new ones. If we tell the users to prune the older log files themselves, I know what the first bug filed against the new policy will be. :-) Regarding rotating based on both size and time: most people care about size so I think that's what we'll do. If you want more advanced management of the logs you'll have to set N to infinity (at least we'll need a way to say "never delete older files") so that HotSpot doesn't delete any files and you'll be able to copy them and delete them yourself. But, seriously, this is excellent feedback. You guys are doing more wild stuff with our logs than I had imagined. :-) * "Will you start another discussion about the data contents of the file?" (Rainer) We'll do that separately, based most likely on a wiki. When we get to it. No promises though! * "For more debug detail per collector one could use PrintHeapAtGC" (Michael) Well, PrintHeapAtGC was supposed to be added for debugging purposes, i.e., to find out what the address range of each generation is.
However, it has clearer information on how full each generation is, which is why people use it today (it's very space inefficient though...). We are hoping to add that information to the standard GC log records to eliminate the need for PrintHeapAtGC. * "In our application we prefix _every_ "cyclic" log file with the config options used to start the app." (Jeff) Adding configuration / whatever information at the top of every log file fragment is an excellent suggestion. Thanks for bringing it up. * "How many digits in the sequence? Would that be configurable?" (Adam) 8 should be more than enough (do you really see the need for more than 99M log fragments?). Actually, even 6 will probably be enough. And if we go over that, we won't cycle the numbers, we'll just expand the number field. * "IMHO it would be great if we could use %p also for" (Johann) I was going to say that this would start getting over the top. But I was not aware that you can do that with the fatal error log. I'll need to investigate that further. So, we'll leave this (and additional custom formatting in the GC log name) as a "maybe". :-) I'm not quite sure whether we'd want to use the same facility for the sequence numbers though, given that they'd be needed if we split the log and won't be needed if we don't. For those, I'd just add a suffix to the log file name when it is needed. Thanks again for all the good points, Tony, HotSpot GC Group On 5/6/2010 3:32 PM, Tony Printezis wrote: > Hi all, > > We would like your input on some changes to HotSpot's GC logging that > we have been discussing. We have been wanting to improve our GC > logging for some time. However we haven't had the resources to spend > on it. We don't know when we'll get to it, but we'd still like to get > some feedback on our plans. > > The changes fall into two categories. > > > A. Unification and improvement of -verbosegc / -XX:+PrintGCDetails > output.
> > I strongly believe that maintaining two GC log formats is > counter-productive, especially given that the current -verbosegc > format is unhelpful in many ways (i.e., lacks a lot of helpful > information). So, we would like to unify the two into one, with maybe > -XX:+PrintGCDetails generating a superset of what -verbosegc would > generate (so that a parser for the -XX:+PrintGCDetails output will > also be able to parse the -verbosegc output). The new output will not > be what -XX:+PrintGCDetails generates today but something that can be > reliably parsed and it is also reasonably human-readable (so, no xml > and no space/tab-separated formats). Additionally, we're proposing to > enable -XX:+PrintGCTimeStamps by default (in fact, we'll probably > deprecate and ignore that option, I can't believe that users will > really not want a time stamp per GC log record). We'll leave > -XX:+PrintGCDateStamps to be optional though. > > Specific questions: > > - Is anyone really attached to the old -verbosegc output? > - Would anyone really hate having time stamps by default? > - I know that a lot of folks have their own parsers for our current GC > log formats. Would you be happy if we provided you with a (reliable!) > parser for the new format in Java that you can easily adapt? > > > B. Introducing "cyclic" GC logs. > > This is something that a lot of folks have asked for given that they > were concerned with the GC logs getting very large (a 1TB disk is $85 > these days, but anyway...). Given that each GC log record is of > variable size, we cannot easily cycle through the log using the same > file (I'd rather not have to overwrite existing records). Our current > proposal is for the user to specify a file number N and a size target > S for each file. For a given GC log -Xloggc:foo, HotSpot will generate > > foo.00000001 > foo.00000002 > foo.00000003 > etc. 
> > (we'll create a new file as soon as the size of the one we are writing > to exceeds S, so each file will be slightly larger than S but it will > be helpful not to split individual log records between two files) > > When we create a new file, if we have more than N files we'll delete > the oldest. So, in the above example, if N == 3, when we create > foo.00000004 we'll delete foo.00000001. > > Note that in the above scheme, the logs are not really "cyclic" but, > instead, we're pruning the oldest records every now and then, which > has the same effect. > > Another (related) request has been to maybe append the GC log file > name with the pid of the JVM that's generating it. Maybe we don't want > to do this by default. But, would people find it helpful if we provide > a new cmd line parameter to do that? So, for the above example and > assuming that the JVM's pid is 1234, the GC log file(s) will be either: > > foo.1234 > > or > > foo.1234.00000001 > foo.1234.00000002 > foo.1234.00000003 > etc. > > Specific questions: > > - Would people really hate it if HotSpot starts appending the GC log > file name with a (zero-padded) sequence number? Maybe if N == 1 (the > default), HotSpot will skip the sequence number and ignore S, i.e., > behave as it does today. > - To the people who have been asking for cyclic GC logs: is the > sequence number scheme above good enough? 
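The rotation policy quoted above (a size target S, a file count N, a zero-padded sequence suffix, and pruning of the oldest fragment) can be sketched in Java roughly as follows. This is only an illustration of the proposed behavior, not HotSpot code (HotSpot would implement it in C++), and the class and parameter names are invented:

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch of the proposed policy: write to foo.00000001,
// foo.00000002, ..., start a new file once the current one exceeds S,
// and delete the oldest file once more than N files exist.
class RotatingGcLog {
    private final String baseName;   // e.g. "foo" from -Xloggc:foo
    private final long sizeTargetS;  // bytes
    private final int maxFilesN;
    private final Deque<File> files = new ArrayDeque<File>();
    private int seq = 0;
    private FileWriter out;

    RotatingGcLog(String baseName, long sizeTargetS, int maxFilesN) throws IOException {
        this.baseName = baseName;
        this.sizeTargetS = sizeTargetS;
        this.maxFilesN = maxFilesN;
        openNext();
    }

    // A log record is never split across files: the size is checked only
    // between records, so each file may end up slightly larger than S.
    void writeRecord(String record) throws IOException {
        out.write(record);
        out.write('\n');
        out.flush();
        if (files.peekLast().length() > sizeTargetS) {
            openNext();
        }
    }

    private void openNext() throws IOException {
        if (out != null) out.close();
        File next = new File(String.format("%s.%08d", baseName, ++seq));
        files.addLast(next);
        out = new FileWriter(next);
        if (files.size() > maxFilesN) {
            files.removeFirst().delete();  // prune the oldest fragment
        }
    }
}
```

With N == 3, writing past the fourth rotation deletes foo.00000001, which matches the example in the quoted message.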
> > > Thanks in advance for your feedback, > > Tony, HotSpot GC Group > > _______________________________________________ hotspot-gc-use mailing list hotspot-gc-use at openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From tony.printezis at oracle.com Tue May 11 07:33:49 2010 From: tony.printezis at oracle.com (Tony Printezis) Date: Tue, 11 May 2010 10:33:49 -0400 Subject: Feedback requested: HotSpot GC logging improvements In-Reply-To: <9694A6C3D68A4249BD9E1A875B6BA81E105CCD27@bos0ex01.corp.attivio.com> References: <4BE3194E.902@oracle.com> <4BE43C97.6000805@oracle.com> <9694A6C3D68A4249BD9E1A875B6BA81E105CCD27@bos0ex01.corp.attivio.com> Message-ID: <4BE96ACD.2070805@oracle.com> Martin, Hi, thanks for the feedback. Martin Serrano wrote: > I would prefer to have a consistent suffix (like .log), in the filename. Perhaps you could support just the %d format for the counter in the generated log name. > Well, if you allow parameters in the log file name, like 'foo.%d.%n.log' then folks can give their own suffix. I don't think we want to start adding one... Tony From matt.fowles at gmail.com Wed May 12 15:19:30 2010 From: matt.fowles at gmail.com (Matt Fowles) Date: Wed, 12 May 2010 18:19:30 -0400 Subject: Growing GC Young Gen Times Message-ID: All~ I have a large app that produces ~4g of garbage every 30 seconds and am trying to reduce the size of gc outliers. About 99% of this data is garbage, but almost anything that survives one collection survives for an indeterminately long amount of time.
We are currently using the following VM and options:

java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)

-verbose:gc
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+PrintGCTaskTimeStamps
-XX:+PrintTenuringDistribution
-XX:+PrintCommandLineFlags
-XX:+PrintReferenceGC
-Xms32g -Xmx32g -Xmn4g
-XX:+UseParNewGC
-XX:ParallelGCThreads=4
-XX:+UseConcMarkSweepGC
-XX:ParallelCMSThreads=4
-XX:CMSInitiatingOccupancyFraction=60
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+CMSParallelRemarkEnabled
-XX:MaxGCPauseMillis=50
-Xloggc:gc.log

As you can see from the GC log, we never actually reach the point where the CMS kicks in (after app startup). But our young gens seem to take increasingly long to collect as time goes by. The steady state of the app is reached around 956.392 into the log with a collection that takes 0.106 seconds. Thereafter the survivor space remains roughly constantly filled and the amount promoted to old gen also remains constant, but the collection times increase to 2.855 seconds by the end of the 3.5 hour run. Has anyone seen this sort of behavior before? Are there more switches that I should try running with? Obviously, I am working to profile the app and reduce the garbage load in parallel. But if I still see this sort of problem, it is only a question of how long must the app run before I see unacceptable latency spikes. Matt -------------- next part -------------- A non-text attachment was scrubbed...
Name: gc.log.gz Type: application/x-gzip Size: 48564 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100512/df7b7e27/attachment-0001.bin From wmadams824 at comcast.net Wed May 12 16:02:47 2010 From: wmadams824 at comcast.net (wmadams824 at comcast.net) Date: Wed, 12 May 2010 23:02:47 +0000 (UTC) Subject: Growing GC Young Gen Times In-Reply-To: Message-ID: <67188202.18794611273705367501.JavaMail.root@sz0070a.emeryville.ca.mail.comcast.net> Hi, Matt: I don't have a solution but I can add more information, as I recently dealt with the same issue. My customer's app also generated a large amount of garbage per unit time. We were using CMS (with most of the same options you are using, but a smaller heap) and saw the same behavior of the young GC time increasing monotonically. Some additional information: their app did experience a number of CMS collections per day, and the young-GC collection time still continued to rise. We also arranged for a Full GC to occur in the middle of the night (to see if fragmentation in the old generation was impacting young-GC collection times), and still saw no change in the ever-increasing time for young GC. Printing FLS stats at level 2 showed no noticeable fragmentation in the old or perm gens. The only thing that caused the young GC times to go down was to bounce the JVM (at which point they began rising again, of course). While I was there, the server ran for a max of 3 or 4 days between bounces, and the young-GC collection time never leveled off. At least that's where they were when they threw me out of the office. :) Regards, Wayne -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100512/188f08f7/attachment.html From chkwok at digibites.nl Wed May 12 16:35:17 2010 From: chkwok at digibites.nl (Chi Ho Kwok) Date: Thu, 13 May 2010 01:35:17 +0200 Subject: Growing GC Young Gen Times In-Reply-To: References: Message-ID: Hi Matt, You're having a 1-2s break every ~30 seconds. It's more efficient to collect more things in 1 pass, but if you want smaller, more frequent delays, try lowering the heap sizes. As you never reach more than 8G used memory before the pauses get too insane, try the following setting: -Xmx8g -Xmn1g This way, the maximum amount of data the ParNew collector has to process is only 1/4 of the original, so the worst case delay is reduced by 75%. And unless you're leaking memory somewhere, the old generation heap should contain only long living objects, like data files that should stay loaded. The way it looks now is that the whole old generation is filled with garbage, but as you never reach 60% used, it's never collected. I'd drop the MaxGCPauseMillis too, the goal is unreachable anyway. You can't collect a heap of a few GB in 50ms. Our setup here is pretty similar, the current heap size at least, but our memory allocation pressure is much higher; plus, with a LRU cache as large as the java heap allows, we constantly need to promote things from young to old, and collect the old generation with CMS to free up space for more data. 
This is how we tuned it:

Basics: -Xmx32g -Xms32g -Xmn1500m
Voodoo: 4 threads, CMSInitiatingOccupancyFraction 76%, MaxTenuringThreshold 1 (yes, promote after 1 copy max, LRU means if it sticks around for 10s, it stays), SurvivorRatio=2 (prevents overflow directly into old gen)

We collect the young gen about once every 3 to 5 seconds during peaks, example line:

2010-05-12T23:05:55.253+0200: 1496100.414: [GC 1496100.414: [ParNew: 1077270K->318127K(1152000K), 0.3551790 secs] 20722984K->20150315K(33170432K), 0.3554220 secs] [Times: user=1.57 sys=0.02, real=0.36 secs]

So, to scan and promote about 1GB of data, we use 0.4s. If we used a 4G new generation, it could be as bad as your logs, yes; 0.4s x 4 = 1.6s. Too bad we can't get the minimum delay any lower; with our allocation rate, if we reduce the new generation size, there's a good chance that a lot of temporary data leaks through to the old generation, which is much, much more expensive to collect. With the current size, it already means that data held for more than 2x the collect delay, or ~8 seconds, leaks through to the old gen, even if it isn't supposed to be - only data in the LRU cache should be there. With your allocation rate of 4G/30s = 136M/sec, you can play with sizes as small as 512m and just let some objects with a ~10s+ lifetime leak through to the old gen - CMS does its work in the background anyway, so if you want to minimize pauses and have spare CPU cycles, go for a tiny new generation. Chi Ho On Thu, May 13, 2010 at 12:19 AM, Matt Fowles wrote: > [Matt's message quoted in full; snipped] -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100513/49a02491/attachment.html From y.s.ramakrishna at oracle.com Wed May 12 16:38:51 2010 From: y.s.ramakrishna at oracle.com (Y.
Srinivas Ramakrishna) Date: Wed, 12 May 2010 16:38:51 -0700 Subject: Growing GC Young Gen Times In-Reply-To: References: Message-ID: <4BEB3C0B.6040804@oracle.com> Try the jvm from hs18 (jdk 7) and let us know what you see. Or wait for JDK 6u21 which (I think) is slated for sometime next month. Or get an hs17 JVM with the fix 6631166 via your Java support and give it a try. Also add -XX:+UseLargePages -XX:+AlwaysPreTouch and if you have enough cores try increasing your ParallelGCThreads from your current setting of 4 (what was the default you got?). I have not looked at the log you sent, but can take a look when I get some time; but no promises, as I am drowned in other work at the moment. -- ramki On 05/12/10 15:19, Matt Fowles wrote: > [Matt's message quoted in full; snipped] From jon.masamitsu at oracle.com Thu May 13 09:23:18 2010 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Thu, 13 May 2010 09:23:18 -0700 Subject: Growing GC Young Gen Times In-Reply-To: References: Message-ID: <4BEC2776.8010609@oracle.com> Matt, As Ramki indicated fragmentation might be an issue. As the fragmentation in the old generation increases, it takes longer to find space in the old generation into which to promote objects from the young generation. This is apparently not the problem that Wayne is having but you still might be hitting it. If you can connect jconsole to the VM and force a full GC, that would tell us if it's fragmentation. There might be a scaling issue with the UseParNewGC. If you can use -XX:-UseParNewGC (turning off the parallel young generation collection) with -XX:+UseConcMarkSweepGC the pauses will be longer but may be more stable. That's not the solution but just part of the investigation. You could try just -XX:+UseParNewGC without -XX:+UseConcMarkSweepGC and if you don't see the growing young generation pause, that would indicate something specific about promotion into the CMS generation.
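Jon's suggestion of connecting jconsole and forcing a full GC can also be scripted over JMX. The sketch below is an illustration under assumptions: the host and port in the service URL are placeholders, and the target JVM must have been started with remote JMX enabled (e.g. -Dcom.sun.management.jmxremote.port=9999). It invokes the no-arg "gc" operation on the java.lang:type=Memory MBean, which is what jconsole's "Perform GC" button triggers:

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ForceFullGc {
    // Invokes the "gc" operation of the java.lang:type=Memory MBean,
    // the programmatic equivalent of jconsole's "Perform GC" button.
    static void forceFullGc(MBeanServerConnection conn) throws Exception {
        conn.invoke(new ObjectName("java.lang:type=Memory"), "gc", null, null);
    }

    public static void main(String[] args) throws Exception {
        // Placeholder address: the target JVM must be started with remote JMX
        // enabled, e.g. -Dcom.sun.management.jmxremote.port=9999 (plus the
        // authentication/SSL settings appropriate for your environment).
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            forceFullGc(connector.getMBeanServerConnection());
        } finally {
            connector.close();
        }
    }
}
```

If the young-gen pause drops back down right after the forced full collection, that points at old-gen fragmentation, per Jon's diagnosis.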
UseParallelGC is different from UseParNewGC in a number of ways and if you try UseParallelGC and still see the growing young generation pauses, I'd suspect something special about your application. If you can run these experiments hopefully they will tell us where to look next. Jon On 05/12/10 15:19, Matt Fowles wrote: > All~ > > I have a large app that produces ~4g of garbage every 30 seconds and > am trying to reduce the size of gc outliers. About 99% of this data > is garbage, but almost anything that survives one collection survives > for an indeterminately long amount of time. We are currently using > the following VM and options: > > java version "1.6.0_20" > Java(TM) SE Runtime Environment (build 1.6.0_20-b02) > Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode) > > -verbose:gc > -XX:+PrintGCTimeStamps > -XX:+PrintGCDetails > -XX:+PrintGCTaskTimeStamps > -XX:+PrintTenuringDistribution > -XX:+PrintCommandLineFlags > -XX:+PrintReferenceGC > -Xms32g -Xmx32g -Xmn4g > -XX:+UseParNewGC > -XX:ParallelGCThreads=4 > -XX:+UseConcMarkSweepGC > -XX:ParallelCMSThreads=4 > -XX:CMSInitiatingOccupancyFraction=60 > -XX:+UseCMSInitiatingOccupancyOnly > -XX:+CMSParallelRemarkEnabled > -XX:MaxGCPauseMillis=50 > -Xloggc:gc.log > > > As you can see from the GC log, we never actually reach the point > where the CMS kicks in (after app startup). But our young gens seem > to take increasingly long to collect as time goes by. > > The steady state of the app is reached around 956.392 into the log > with a collection that takes 0.106 seconds. Thereafter the survivor > space remains roughly constantly as filled and the amount promoted to > old gen also remains constant, but the collection times increase to > 2.855 seconds by the end of the 3.5 hour run. > > Has anyone seen this sort of behavior before? Are there more switches > that I should try running with? > > Obviously, I am working to profile the app and reduce the garbage load > in parallel. 
But if I still see this sort of problem, it is only a > question of how long must the app run before I see unacceptable > latency spikes. > > Matt > ------------------------------------------------------------------------ > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100513/290423f6/attachment-0001.html From matt.fowles at gmail.com Thu May 13 10:50:32 2010 From: matt.fowles at gmail.com (Matt Fowles) Date: Thu, 13 May 2010 13:50:32 -0400 Subject: Growing GC Young Gen Times In-Reply-To: <4BEC2776.8010609@oracle.com> References: <4BEC2776.8010609@oracle.com> Message-ID: Jon~ This may sound naive, but how can fragmentation be an issue if the old gen has never been collected? I would think we are still in the space where we can just bump the old gen alloc pointer... Matt On Thu, May 13, 2010 at 12:23 PM, Jon Masamitsu wrote: > Matt, > > As Ramki indicated fragmentation might be an issue.? As the fragmentation > in the old generation increases, it takes longer to find space in the old > generation > into which to promote objects from the young generation.? This is apparently > not > the problem that Wayne is having but you still might be hitting it.? If you > can > connect jconsole to the VM and force a full GC, that would tell us if it's > fragmentation. > > There might be a scaling issue with the UseParNewGC.? If you can use > -XX:-UseParNewGC (turning off the parallel young > generation collection) with? -XX:+UseConcMarkSweepGC the pauses > will be longer but may be more stable.? That's not the solution but just > part > of the investigation. 
> > You could try just -XX:+UseParNewGC without -XX:+UseConcMarkSweepGC > and if you don't see the growing young generation pause, that would indicate > something specific about promotion into the CMS generation. > > UseParallelGC is different from UseParNewGC in a number of ways > and if you try UseParallelGC and still see the growing young generation > pauses, I'd suspect something special about your application. > > If you can run these experiments hopefully they will tell > us where to look next. > > Jon > > > On 05/12/10 15:19, Matt Fowles wrote: > > [...]
From y.s.ramakrishna at oracle.com Thu May 13 14:52:24 2010 From: y.s.ramakrishna at oracle.com (Y. Srinivas Ramakrishna) Date: Thu, 13 May 2010 14:52:24 -0700 Subject: Growing GC Young Gen Times In-Reply-To: References: <4BEC2776.8010609@oracle.com> Message-ID: <4BEC7498.6030405@oracle.com>
On 05/13/10 10:50, Matt Fowles wrote: > Jon~ > > This may sound naive, but how can fragmentation be an issue if the old > gen has never been collected? I would think we are still in the space > where we can just bump the old gen alloc pointer...
Matt, The old gen allocator may fragment the space. Allocation is not exactly "bump a pointer". -- ramki
> > Matt > > On Thu, May 13, 2010 at 12:23 PM, Jon Masamitsu > wrote: >> [...]
From jon.masamitsu at oracle.com Thu May 13 15:29:33 2010 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Thu, 13 May 2010 15:29:33 -0700 Subject: Growing GC Young Gen Times In-Reply-To: <4BEC7498.6030405@oracle.com> References: <4BEC2776.8010609@oracle.com> <4BEC7498.6030405@oracle.com> Message-ID: <4BEC7D4D.2000905@oracle.com>
Matt, To amplify on Ramki's comment, the allocations out of the old generation are always from a free list. During a young generation collection each GC thread will get its own local free lists from the old generation so that it can copy objects to the old generation without synchronizing with the other GC threads (most of the time). Objects from a GC thread's local free lists are pushed to the global lists after the collection (as far as I recall). So there is some churn in the free lists. Jon
On 05/13/10 14:52, Y.
Srinivas Ramakrishna wrote: > [...]
From ryanobjc at gmail.com Thu May 13 16:54:34 2010 From: ryanobjc at gmail.com (Ryan Rawson) Date: Thu, 13 May 2010 16:54:34 -0700 Subject: Growing GC Young Gen Times In-Reply-To: <4BEC7D4D.2000905@oracle.com> References: <4BEC2776.8010609@oracle.com> <4BEC7498.6030405@oracle.com> <4BEC7D4D.2000905@oracle.com> Message-ID:
Hi, I have had a similar experience and arrived at a workable but unsatisfying solution... In my case, the young GC times kept getting longer and longer. Doing GC logging I saw something similar to what you saw - the ParNew keeps on growing and the amount of data being tenured was huge. Having an 800ms GC pause every 4 seconds was no good for me. I eventually added this to my java start: "-XX:NewSize=64m -XX:MaxNewSize=64m" A friend suggested this to me - he said that a young GC should be fast because the young gen should be ~ the size of the L3 cache. With this setting I see young GCs between 0.5-3 times a second and they last between 10-80ms or so. The CMS will prune massive amounts of garbage out when it runs, up to 2GB of RAM in my 6GB heap processes.
Now for a little theorycrafting... The root cause here is that my application breaks the Object Generational Hypothesis. The GC auto-tuning will grow the ParNew to reduce the amount of data it is tenuring, but it is never really able to reach a good steady state. At this point you are now tenuring 1-2GB of RAM. Tenuring = copying objects = time consuming. Once you are at this spot, you find out that every current shipping GC is just not good enough.
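Ryan's before-and-after pause numbers come from reading GC logs offline. The same trend can also be watched from inside the process with the standard management beans; below is a minimal sketch of that idea (the class name `GcSampler` is my own, not something from this thread):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/**
 * Minimal sketch: sample cumulative collection counts and times so a
 * rising average young-gen pause (like the one in Matt's log) can be
 * spotted by comparing successive snapshots in-process.
 */
public class GcSampler {
    public static String snapshot() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();   // -1 if unsupported
            long timeMs = gc.getCollectionTime();   // cumulative, ms
            // Average pause so far; compare snapshots over time to see growth.
            double avg = count > 0 ? (double) timeMs / count : 0.0;
            sb.append(String.format("%s: %d collections, %d ms total, %.1f ms avg%n",
                    gc.getName(), count, timeMs, avg));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(snapshot());
    }
}
```

Logging such a snapshot once a minute gives a coarse, low-overhead view of whether the average ParNew pause is drifting upward without parsing the GC log.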
My hope was to use G1, but considering how unstable it was for me (I have tried 12+ releases of Java7, a few releases of Java6) I am now shifting my approach. In my application one of the primary causes of allocation is a block cache for a database-type application. I am planning on testing a change where the block cache is maintained in massive DirectByteBuffers (think sizes from 2-15GB of RAM) and I will manage all the allocation by hand (in Java). If you have some ability to shift memory usage out of the domain of the GC I would highly suggest doing so.
At this point I can honestly say that if you are not Object Generational Hypothesis compliant (OGH (tm)) then Java for large heaps can be very, very painful. I think the choices are DirectByteBuffer, JNI, and not using Java. I'd like an alternative to those, but I'm not sure what it might be while avoiding that last option (and ideally avoiding JNI too). I feel this is the greatest weakness of Java - the memory management is one size fits all, and there are few great options. DirectByteBuffers have a limited interface and require copying data in and out to talk to the rest of Java. JNI has the same issue and has had a high invocation cost. Good luck out there, and stay OGH compliant! -ryan
On Thu, May 13, 2010 at 3:29 PM, Jon Masamitsu wrote: > [...]
From chkwok at digibites.nl Fri May 14 06:49:22 2010 From: chkwok at digibites.nl (Chi Ho Kwok) Date: Fri, 14 May 2010 15:49:22 +0200 Subject: Growing GC Young Gen Times In-Reply-To: References: <4BEC2776.8010609@oracle.com> <4BEC7498.6030405@oracle.com> <4BEC7D4D.2000905@oracle.com> Message-ID:
Good read. Yeah, it's the same story with every caching app: a least-recently-used cache means every object in it sticks around for [cache objects average lifetime], breaking the Object Generational Hypothesis. My solution: throw more hardware at it. Just lower the new generation size (-Xmn is a shortcut for MaxNewSize/NewSize) until the pauses are acceptable. The old gen gets collected by CMS all the time, removing expired cache items, but it doesn't introduce any pauses. The only cost is CPU time - which isn't really scarce in most memory cache apps. If you pick a limit high enough (1 pause per 2-3s), most temporary objects won't even make it to the old generation.
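Ryan's off-heap plan (keep the block cache in DirectByteBuffers and manage allocation by hand) can be sketched roughly as below. This is an illustrative toy, not his implementation: the class and method names are invented, a single `ByteBuffer` arena is capped at 2 GB by the API (his 2-15 GB sizes would need several arenas), and eviction is left to the caller:

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch of an off-heap block cache: fixed-size blocks
 * carved out of one DirectByteBuffer, tracked by hand, so the cached
 * bytes never become work for the garbage collector.
 */
public class OffHeapBlockCache {
    private final ByteBuffer arena;   // off-heap backing store
    private final int blockSize;
    private final int numBlocks;
    private final boolean[] used;     // simple slot map; no coalescing

    public OffHeapBlockCache(int numBlocks, int blockSize) {
        this.numBlocks = numBlocks;
        this.blockSize = blockSize;
        this.arena = ByteBuffer.allocateDirect(numBlocks * blockSize);
        this.used = new boolean[numBlocks];
    }

    /** Copy a block in; returns the slot index, or -1 when full (caller evicts). */
    public int put(byte[] data) {
        if (data.length > blockSize) throw new IllegalArgumentException("block too large");
        for (int i = 0; i < numBlocks; i++) {
            if (!used[i]) {
                used[i] = true;
                ByteBuffer slot = arena.duplicate();  // shares the same memory
                slot.position(i * blockSize);
                slot.put(data);
                return i;
            }
        }
        return -1;
    }

    /** Copy a block back out onto the heap. */
    public byte[] get(int slot, int length) {
        ByteBuffer view = arena.duplicate();
        view.position(slot * blockSize);
        byte[] out = new byte[length];
        view.get(out);
        return out;
    }

    public void free(int slot) { used[slot] = false; }
}
```

The copy-in/copy-out in `put` and `get` is exactly the "limited interface" cost Ryan mentions: the off-heap bytes are invisible to the GC, but every access pays a copy to talk to the rest of Java.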
Going the JNI route can be doable for some apps (block / page cache), but in my case we'd have to serialize / unserialize huge object graphs all the time. Chi Ho
On Fri, May 14, 2010 at 1:54 AM, Ryan Rawson wrote: > [...]
From y.s.ramakrishna at oracle.com Fri May 14 09:58:09 2010 From: y.s.ramakrishna at oracle.com (Y. Srinivas Ramakrishna) Date: Fri, 14 May 2010 09:58:09 -0700 Subject: Growing GC Young Gen Times In-Reply-To: References: Message-ID: <4BED8121.5000405@oracle.com>
Hi Matt -- i am computing some metrics from your log file and would like to know how many CPUs you have for the logs below? Also, as you noted, almost anything that survives a scavenge lives for a while. To reduce the overhead of unnecessary back-and-forth copying in the survivor spaces, just use MaxTenuringThreshold=1 (This suggestion was also made by several others in the thread, and is corroborated by your PrintTenuringDistribution data).
Since you have farily large survivor spaces configured now, (at least large enough to fit 4 age cohorts, which will be down to 1 age cohort if you use MTT=1), i'd suggest making your surviror spaces smaller, may be down to about 64 MB from the current 420 MB each, and give the excess to your Eden space. Then use 6u21 when it comes out (or ask your Java support to send you a 6u21 for a beta test), or drop in a JVM from JDK 7 into your 6u20 installation, and run with that. If you still see rising pause times let me know or file a bug, and send us the log file and JVM options along with full platform information. I'll run some metrics from yr log file if you send me the info re platform above, and that may perhaps reveal a few more secrets. later. -- ramki On 05/12/10 15:19, Matt Fowles wrote: > All~ > > I have a large app that produces ~4g of garbage every 30 seconds and > am trying to reduce the size of gc outliers. About 99% of this data > is garbage, but almost anything that survives one collection survives > for an indeterminately long amount of time. We are currently using > the following VM and options: > > java version "1.6.0_20" > Java(TM) SE Runtime Environment (build 1.6.0_20-b02) > Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode) > > -verbose:gc > -XX:+PrintGCTimeStamps > -XX:+PrintGCDetails > -XX:+PrintGCTaskTimeStamps > -XX:+PrintTenuringDistribution > -XX:+PrintCommandLineFlags > -XX:+PrintReferenceGC > -Xms32g -Xmx32g -Xmn4g > -XX:+UseParNewGC > -XX:ParallelGCThreads=4 > -XX:+UseConcMarkSweepGC > -XX:ParallelCMSThreads=4 > -XX:CMSInitiatingOccupancyFraction=60 > -XX:+UseCMSInitiatingOccupancyOnly > -XX:+CMSParallelRemarkEnabled > -XX:MaxGCPauseMillis=50 > -Xloggc:gc.log > > > As you can see from the GC log, we never actually reach the point > where the CMS kicks in (after app startup). But our young gens seem > to take increasingly long to collect as time goes by. 
> > The steady state of the app is reached around 956.392 into the log > with a collection that takes 0.106 seconds. Thereafter the survivor > space remains roughly constantly filled and the amount promoted to > old gen also remains constant, but the collection times increase to > 2.855 seconds by the end of the 3.5 hour run. > > Has anyone seen this sort of behavior before? Are there more switches > that I should try running with? > > Obviously, I am working to profile the app and reduce the garbage load > in parallel. But if I still see this sort of problem, it is only a > question of how long must the app run before I see unacceptable > latency spikes. > > Matt > > > ------------------------------------------------------------------------ > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From matt.fowles at gmail.com Fri May 14 10:07:59 2010 From: matt.fowles at gmail.com (Matt Fowles) Date: Fri, 14 May 2010 13:07:59 -0400 Subject: Growing GC Young Gen Times In-Reply-To: <4BED8121.5000405@oracle.com> References: <4BED8121.5000405@oracle.com> Message-ID: Ramki~ The machine has 4 CPUs, each of which has 4 cores. I will adjust the survivor spaces as you suggest. Previously I had been running with MTT 0, but changed it to 4 at the suggestion of others. Running with the JDK7 version may take a bit of time, but I will pursue that as well. Matt On Fri, May 14, 2010 at 12:58 PM, Y. Srinivas Ramakrishna wrote: > Hi Matt -- I am computing some metrics from your log file > and would like to know how many CPUs you have for the logs below? > > Also, as you noted, almost anything that survives a scavenge > lives for a while.
To reduce the overhead of unnecessary > back-and-forth copying in the survivor spaces, just use > MaxTenuringThreshold=1 (This suggestion was also made by > several others in the thread, and is corroborated by your > PrintTenuringDistribution data). Since you have fairly large survivor > spaces configured now (at least large enough to fit 4 age cohorts, > which will be down to 1 age cohort if you use MTT=1), I'd > suggest making your survivor spaces smaller, maybe down to > about 64 MB from the current 420 MB each, and give the excess > to your Eden space. > > Then use 6u21 when it comes out (or ask your Java support to > send you a 6u21 for a beta test), or drop in a JVM from JDK 7 into > your 6u20 installation, and run with that. If you still see > rising pause times let me know or file a bug, and send us the > log file and JVM options along with full platform information. > > I'll run some metrics from your log file if you send me the info > re platform above, and that may perhaps reveal a few more secrets. > > later. > -- ramki > > On 05/12/10 15:19, Matt Fowles wrote: >> >> All~ >> >> I have a large app that produces ~4g of garbage every 30 seconds and >> am trying to reduce the size of gc outliers. About 99% of this data >> is garbage, but almost anything that survives one collection survives >> for an indeterminately long amount of time. We are currently using >> the following VM and options: >> >> java version "1.6.0_20" >> Java(TM) SE Runtime Environment (build 1.6.0_20-b02) >> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode) >> >> -verbose:gc >> -XX:+PrintGCTimeStamps >> -XX:+PrintGCDetails >> -XX:+PrintGCTaskTimeStamps >> -XX:+PrintTenuringDistribution >> -XX:+PrintCommandLineFlags >> -XX:+PrintReferenceGC >> -Xms32g -Xmx32g -Xmn4g >> -XX:+UseParNewGC >> -XX:ParallelGCThreads=4 >> -XX:+UseConcMarkSweepGC >> -XX:ParallelCMSThreads=4 >> -XX:CMSInitiatingOccupancyFraction=60 >> -XX:+UseCMSInitiatingOccupancyOnly >> -XX:+CMSParallelRemarkEnabled >> -XX:MaxGCPauseMillis=50 >> -Xloggc:gc.log >> >> >> As you can see from the GC log, we never actually reach the point >> where the CMS kicks in (after app startup). But our young gens seem >> to take increasingly long to collect as time goes by. >> >> The steady state of the app is reached around 956.392 into the log >> with a collection that takes 0.106 seconds. Thereafter the survivor >> space remains roughly constantly filled and the amount promoted to >> old gen also remains constant, but the collection times increase to >> 2.855 seconds by the end of the 3.5 hour run. >> >> Has anyone seen this sort of behavior before? Are there more switches >> that I should try running with? >> >> Obviously, I am working to profile the app and reduce the garbage load >> in parallel. But if I still see this sort of problem, it is only a >> question of how long must the app run before I see unacceptable >> latency spikes. >> >> Matt >> >> >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > From y.s.ramakrishna at oracle.com Fri May 14 10:23:38 2010 From: y.s.ramakrishna at oracle.com (Y. Srinivas Ramakrishna) Date: Fri, 14 May 2010 10:23:38 -0700 Subject: Growing GC Young Gen Times In-Reply-To: References: <4BED8121.5000405@oracle.com> Message-ID: <4BED871A.4010306@oracle.com> On 05/14/10 10:07, Matt Fowles wrote: > Ramki~ > > The machine has 4 CPUs, each of which has 4 cores. I will adjust the Great, thanks. I'd suggest making ParallelGCThreads=8. Also compare with -XX:-UseParNewGC.
If it's the kind of fragmentation that we believe may be the cause here, you'd see larger gc times in the latter case but they would not increase as they do now. But that is conjecture at this point. > survivor spaces as you suggest. Previously I had been running with > MTT 0, but changed it to 4 at the suggestion of others. MTT=0 can give very poor performance; as people said, MTT=4 would definitely be better here than MTT=0. You should use MTT=1 here, though. > > Running with the JDK7 version may take a bit of time, but I will > pursue that as well. All you should do is pull the libjvm.so that is in the JDK 7 installation (or bundle) and plonk it down into the appropriate directory of your existing JDK 6u20 installation. We just want to see the results with the latest JVM, which includes a fix for 6631166. I attached a very rough plot of some metrics extracted from your log, and this behaviour is definitely deserving of a bug, especially if it can be shown that it happens in the latest JVM. In the plot: red: scavenge durations; dark blue: promoted data per scavenge; pink: data in survivor space following scavenge; light blue: live data in old gen. As you can see, the scavenge duration clearly correlates with the occupancy of the old gen (as Jon and others indicated). Did you try Jon's suggestion of doing a manual GC at that point via jconsole, and seeing if the upward trend of scavenges continues beyond that? Did you use -XX:+UseLargePages and -XX:+AlwaysPreTouch? Do you have an easily used test case that you can share with us via your support channels? If/when you do so, please copy me and send them a reference to this thread on this mailing list. later, with your new data. -- ramki > > Matt > > > > On Fri, May 14, 2010 at 12:58 PM, Y. Srinivas Ramakrishna > wrote: >> Hi Matt -- I am computing some metrics from your log file >> and would like to know how many CPUs you have for the logs below? >> >> Also, as you noted, almost anything that survives a scavenge >> lives for a while.
To reduce the overhead of unnecessary >> back-and-forth copying in the survivor spaces, just use >> MaxTenuringThreshold=1 (This suggestion was also made by >> several others in the thread, and is corroborated by your >> PrintTenuringDistribution data). Since you have fairly large survivor >> spaces configured now (at least large enough to fit 4 age cohorts, >> which will be down to 1 age cohort if you use MTT=1), I'd >> suggest making your survivor spaces smaller, maybe down to >> about 64 MB from the current 420 MB each, and give the excess >> to your Eden space. >> >> Then use 6u21 when it comes out (or ask your Java support to >> send you a 6u21 for a beta test), or drop in a JVM from JDK 7 into >> your 6u20 installation, and run with that. If you still see >> rising pause times let me know or file a bug, and send us the >> log file and JVM options along with full platform information. >> >> I'll run some metrics from your log file if you send me the info >> re platform above, and that may perhaps reveal a few more secrets. >> >> later. >> -- ramki >> >> On 05/12/10 15:19, Matt Fowles wrote: >>> All~ >>> >>> I have a large app that produces ~4g of garbage every 30 seconds and >>> am trying to reduce the size of gc outliers. About 99% of this data >>> is garbage, but almost anything that survives one collection survives >>> for an indeterminately long amount of time.
We are currently using >>> the following VM and options: >>> >>> java version "1.6.0_20" >>> Java(TM) SE Runtime Environment (build 1.6.0_20-b02) >>> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode) >>> >>> -verbose:gc >>> -XX:+PrintGCTimeStamps >>> -XX:+PrintGCDetails >>> -XX:+PrintGCTaskTimeStamps >>> -XX:+PrintTenuringDistribution >>> -XX:+PrintCommandLineFlags >>> -XX:+PrintReferenceGC >>> -Xms32g -Xmx32g -Xmn4g >>> -XX:+UseParNewGC >>> -XX:ParallelGCThreads=4 >>> -XX:+UseConcMarkSweepGC >>> -XX:ParallelCMSThreads=4 >>> -XX:CMSInitiatingOccupancyFraction=60 >>> -XX:+UseCMSInitiatingOccupancyOnly >>> -XX:+CMSParallelRemarkEnabled >>> -XX:MaxGCPauseMillis=50 >>> -Xloggc:gc.log >>> >>> >>> As you can see from the GC log, we never actually reach the point >>> where the CMS kicks in (after app startup). But our young gens seem >>> to take increasingly long to collect as time goes by. >>> >>> The steady state of the app is reached around 956.392 into the log >>> with a collection that takes 0.106 seconds. Thereafter the survivor >>> space remains roughly constantly as filled and the amount promoted to >>> old gen also remains constant, but the collection times increase to >>> 2.855 seconds by the end of the 3.5 hour run. >>> >>> Has anyone seen this sort of behavior before? Are there more switches >>> that I should try running with? >>> >>> Obviously, I am working to profile the app and reduce the garbage load >>> in parallel. But if I still see this sort of problem, it is only a >>> question of how long must the app run before I see unacceptable >>> latency spikes. >>> >>> Matt >>> >>> >>> ------------------------------------------------------------------------ >>> >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> -------------- next part -------------- A non-text attachment was scrubbed... 
Name: rough_plot.gif Type: image/gif Size: 16547 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100514/acdad9eb/attachment-0001.gif From matt.fowles at gmail.com Fri May 14 10:24:07 2010 From: matt.fowles at gmail.com (Matt Fowles) Date: Fri, 14 May 2010 13:24:07 -0400 Subject: Growing GC Young Gen Times In-Reply-To: <4BEC7D4D.2000905@oracle.com> References: <4BEC2776.8010609@oracle.com> <4BEC7498.6030405@oracle.com> <4BEC7D4D.2000905@oracle.com> Message-ID: Jon~ That makes sense, but the fact is that the old gen *never* gets collected. So all the allocations happen from the giant empty space at the end of the free list. I thought fragmentation only occurred when the free lists are added to after freeing memory... Matt On Thu, May 13, 2010 at 6:29 PM, Jon Masamitsu wrote: > Matt, > > To amplify on Ramki's comment, the allocations out of the > old generation are always from a free list. During a young > generation collection each GC thread will get its own > local free lists from the old generation so that it can > copy objects to the old generation without synchronizing > with the other GC threads (most of the time). Objects from > a GC thread's local free lists are pushed to the global lists > after the collection (as far as I recall). So there is some > churn in the free lists. > > Jon > > On 05/13/10 14:52, Y. Srinivas Ramakrishna wrote: >> >> On 05/13/10 10:50, Matt Fowles wrote: >>> >>> Jon~ >>> >>> This may sound naive, but how can fragmentation be an issue if the old >>> gen has never been collected? I would think we are still in the space >>> where we can just bump the old gen alloc pointer... >> >> Matt, The old gen allocator may fragment the space. Allocation is not >> exactly "bump a pointer". >> >> -- ramki >> >>> >>> Matt >>> >>> On Thu, May 13, 2010 at 12:23 PM, Jon Masamitsu >>> wrote: >>>> >>>> Matt, >>>> >>>> As Ramki indicated fragmentation might be an issue.
As the >>>> fragmentation >>>> in the old generation increases, it takes longer to find space in the >>>> old >>>> generation >>>> into which to promote objects from the young generation. This is >>>> apparently >>>> not >>>> the problem that Wayne is having but you still might be hitting it. If >>>> you >>>> can >>>> connect jconsole to the VM and force a full GC, that would tell us if >>>> it's >>>> fragmentation. >>>> >>>> There might be a scaling issue with UseParNewGC. If you can use >>>> -XX:-UseParNewGC (turning off the parallel young >>>> generation collection) with -XX:+UseConcMarkSweepGC the pauses >>>> will be longer but may be more stable. That's not the solution but just >>>> part >>>> of the investigation. >>>> >>>> You could try just -XX:+UseParNewGC without -XX:+UseConcMarkSweepGC >>>> and if you don't see the growing young generation pause, that would >>>> indicate >>>> something specific about promotion into the CMS generation. >>>> >>>> UseParallelGC is different from UseParNewGC in a number of ways >>>> and if you try UseParallelGC and still see the growing young generation >>>> pauses, I'd suspect something special about your application. >>>> >>>> If you can run these experiments hopefully they will tell >>>> us where to look next. >>>> >>>> Jon >>>> >>>> >>>> On 05/12/10 15:19, Matt Fowles wrote: >>>> >>>> All~ >>>> >>>> I have a large app that produces ~4g of garbage every 30 seconds and >>>> am trying to reduce the size of gc outliers. About 99% of this data >>>> is garbage, but almost anything that survives one collection survives >>>> for an indeterminately long amount of time. We are currently using >>>> the following VM and options: >>>> >>>> java version "1.6.0_20" >>>> Java(TM) SE Runtime Environment (build 1.6.0_20-b02) >>>> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode) >>>> >>>> -verbose:gc >>>> -XX:+PrintGCTimeStamps >>>> -XX:+PrintGCDetails >>>> -XX:+PrintGCTaskTimeStamps >>>> -XX:+PrintTenuringDistribution >>>> -XX:+PrintCommandLineFlags >>>> -XX:+PrintReferenceGC >>>> -Xms32g -Xmx32g -Xmn4g >>>> -XX:+UseParNewGC >>>> -XX:ParallelGCThreads=4 >>>> -XX:+UseConcMarkSweepGC >>>> -XX:ParallelCMSThreads=4 >>>> -XX:CMSInitiatingOccupancyFraction=60 >>>> -XX:+UseCMSInitiatingOccupancyOnly >>>> -XX:+CMSParallelRemarkEnabled >>>> -XX:MaxGCPauseMillis=50 >>>> -Xloggc:gc.log >>>> >>>> >>>> As you can see from the GC log, we never actually reach the point >>>> where the CMS kicks in (after app startup). But our young gens seem >>>> to take increasingly long to collect as time goes by. >>>> >>>> The steady state of the app is reached around 956.392 into the log >>>> with a collection that takes 0.106 seconds. Thereafter the survivor >>>> space remains roughly constantly filled and the amount promoted to >>>> old gen also remains constant, but the collection times increase to >>>> 2.855 seconds by the end of the 3.5 hour run. >>>> >>>> Has anyone seen this sort of behavior before? Are there more switches >>>> that I should try running with? >>>> >>>> Obviously, I am working to profile the app and reduce the garbage load >>>> in parallel. But if I still see this sort of problem, it is only a >>>> question of how long must the app run before I see unacceptable >>>> latency spikes.
>>>> >>>> Matt >>>> >>>> ________________________________ >>>> _______________________________________________ >>>> hotspot-gc-use mailing list >>>> hotspot-gc-use at openjdk.java.net >>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > From y.s.ramakrishna at oracle.com Fri May 14 10:36:23 2010 From: y.s.ramakrishna at oracle.com (Y. Srinivas Ramakrishna) Date: Fri, 14 May 2010 10:36:23 -0700 Subject: Growing GC Young Gen Times In-Reply-To: References: <4BEC2776.8010609@oracle.com> <4BEC7498.6030405@oracle.com> <4BEC7D4D.2000905@oracle.com> Message-ID: <4BED8A17.9090208@oracle.com> On 05/14/10 10:24, Matt Fowles wrote: > Jon~ > > That makes sense, but the fact is that the old gen *never* gets > collected. So all the allocations happen from the giant empty space > at the end of the free list. I thought fragmentation only occurred > when the free lists are added to after freeing memory... As Jon indicated, allocation is done from free lists of blocks that are pre-carved on demand to avoid contention while allocating. The old heuristics for how large to make those lists and the inventory to hold in those lists were not working well as you scaled the number of workers. Following 6631166 we believe it works better and causes both less contention and less fragmentation than it did before, because we do not hold unnecessary excess inventory of free blocks. The fragmentation in turn causes card-scanning to suffer adversely, besides the issues with loss of spatial locality also increasing cache misses and TLB misses. (The large page option might help mitigate the latter a bit, especially since you have such a large heap and our fragmented allocation may be exacerbating the TLB pressure.)
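[Editor's note: the per-thread free-list scheme described in this thread can be sketched roughly as below. This is a toy illustration only, not HotSpot source code; the block sizes, batch count, and all names are invented for the example.]

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch only -- NOT HotSpot code. It mimics the scheme
// described above: each GC worker pre-carves a small batch of blocks
// from a shared global free list into its own local list, allocates
// from the local list without synchronization, and pushes any leftover
// blocks back to the global list after the collection.
public class FreeListSketch {
    // Global pool of free block sizes (all 64 "words" here, arbitrarily).
    private static final Deque<Integer> global = new ArrayDeque<>();
    static {
        for (int i = 0; i < 16; i++) global.push(64);
    }

    public static synchronized void returnBlock(int b) { global.push(b); }
    public static synchronized int globalInventory() { return global.size(); }

    // Per-worker state: a local free list that needs no locking.
    public static final class Worker {
        private final Deque<Integer> local = new ArrayDeque<>();

        // Fast path is uncontended; only refilling touches the shared pool.
        public Integer allocate() {
            if (local.isEmpty()) refill(4);   // batch size is arbitrary here
            return local.poll();              // null once everything is exhausted
        }

        private void refill(int n) {
            synchronized (FreeListSketch.class) {  // the contended slow path
                for (int i = 0; i < n && !global.isEmpty(); i++)
                    local.push(global.poll());
            }
        }

        // "Objects from a GC thread's local free lists are pushed to the
        // global lists after the collection" -- return unused inventory.
        public void flush() {
            while (!local.isEmpty()) returnBlock(local.poll());
        }
    }

    public static void main(String[] args) {
        Worker w = new Worker();
        System.out.println("allocated block of size " + w.allocate()); // 64
        w.flush();  // leftover batch inventory goes back to the global pool
    }
}
```

The "churn" Jon mentions corresponds to blocks cycling between the local and global lists; inventory held back in local lists is one plausible source of the excess fragmentation that the 6631166 work reduced.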
-- ramki > > Matt > > On Thu, May 13, 2010 at 6:29 PM, Jon Masamitsu wrote: >> Matt, >> >> To amplify on Ramki's comment, the allocations out of the >> old generation are always from a free list. During a young >> generation collection each GC thread will get its own >> local free lists from the old generation so that it can >> copy objects to the old generation without synchronizing >> with the other GC thread (most of the time). Objects from >> a GC thread's local free lists are pushed to the globals lists >> after the collection (as far as I recall). So there is some >> churn in the free lists. >> >> Jon >> >> On 05/13/10 14:52, Y. Srinivas Ramakrishna wrote: >>> On 05/13/10 10:50, Matt Fowles wrote: >>>> Jon~ >>>> >>>> This may sound naive, but how can fragmentation be an issue if the old >>>> gen has never been collected? I would think we are still in the space >>>> where we can just bump the old gen alloc pointer... >>> Matt, The old gen allocator may fragment the space. Allocation is not >>> exactly "bump a pointer". >>> >>> -- ramki >>> >>>> Matt >>>> >>>> On Thu, May 13, 2010 at 12:23 PM, Jon Masamitsu >>>> wrote: >>>>> Matt, >>>>> >>>>> As Ramki indicated fragmentation might be an issue. As the >>>>> fragmentation >>>>> in the old generation increases, it takes longer to find space in the >>>>> old >>>>> generation >>>>> into which to promote objects from the young generation. This is >>>>> apparently >>>>> not >>>>> the problem that Wayne is having but you still might be hitting it. If >>>>> you >>>>> can >>>>> connect jconsole to the VM and force a full GC, that would tell us if >>>>> it's >>>>> fragmentation. >>>>> >>>>> There might be a scaling issue with the UseParNewGC. If you can use >>>>> -XX:-UseParNewGC (turning off the parallel young >>>>> generation collection) with -XX:+UseConcMarkSweepGC the pauses >>>>> will be longer but may be more stable. That's not the solution but just >>>>> part >>>>> of the investigation. 
>>>>> >>>>> You could try just -XX:+UseParNewGC without -XX:+UseConcMarkSweepGC >>>>> and if you don't see the growing young generation pause, that would >>>>> indicate >>>>> something specific about promotion into the CMS generation. >>>>> >>>>> UseParallelGC is different from UseParNewGC in a number of ways >>>>> and if you try UseParallelGC and still see the growing young generation >>>>> pauses, I'd suspect something special about your application. >>>>> >>>>> If you can run these experiments hopefully they will tell >>>>> us where to look next. >>>>> >>>>> Jon >>>>> >>>>> >>>>> On 05/12/10 15:19, Matt Fowles wrote: >>>>> >>>>> All~ >>>>> >>>>> I have a large app that produces ~4g of garbage every 30 seconds and >>>>> am trying to reduce the size of gc outliers. About 99% of this data >>>>> is garbage, but almost anything that survives one collection survives >>>>> for an indeterminately long amount of time. We are currently using >>>>> the following VM and options: >>>>> >>>>> java version "1.6.0_20" >>>>> Java(TM) SE Runtime Environment (build 1.6.0_20-b02) >>>>> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode) >>>>> >>>>> -verbose:gc >>>>> -XX:+PrintGCTimeStamps >>>>> -XX:+PrintGCDetails >>>>> -XX:+PrintGCTaskTimeStamps >>>>> -XX:+PrintTenuringDistribution >>>>> -XX:+PrintCommandLineFlags >>>>> -XX:+PrintReferenceGC >>>>> -Xms32g -Xmx32g -Xmn4g >>>>> -XX:+UseParNewGC >>>>> -XX:ParallelGCThreads=4 >>>>> -XX:+UseConcMarkSweepGC >>>>> -XX:ParallelCMSThreads=4 >>>>> -XX:CMSInitiatingOccupancyFraction=60 >>>>> -XX:+UseCMSInitiatingOccupancyOnly >>>>> -XX:+CMSParallelRemarkEnabled >>>>> -XX:MaxGCPauseMillis=50 >>>>> -Xloggc:gc.log >>>>> >>>>> >>>>> As you can see from the GC log, we never actually reach the point >>>>> where the CMS kicks in (after app startup). But our young gens seem >>>>> to take increasingly long to collect as time goes by. 
>>>>> >>>>> The steady state of the app is reached around 956.392 into the log >>>>> with a collection that takes 0.106 seconds. Thereafter the survivor >>>>> space remains roughly constantly as filled and the amount promoted to >>>>> old gen also remains constant, but the collection times increase to >>>>> 2.855 seconds by the end of the 3.5 hour run. >>>>> >>>>> Has anyone seen this sort of behavior before? Are there more switches >>>>> that I should try running with? >>>>> >>>>> Obviously, I am working to profile the app and reduce the garbage load >>>>> in parallel. But if I still see this sort of problem, it is only a >>>>> question of how long must the app run before I see unacceptable >>>>> latency spikes. >>>>> >>>>> Matt >>>>> >>>>> ________________________________ >>>>> _______________________________________________ >>>>> hotspot-gc-use mailing list >>>>> hotspot-gc-use at openjdk.java.net >>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>> _______________________________________________ >>>> hotspot-gc-use mailing list >>>> hotspot-gc-use at openjdk.java.net >>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From jon.masamitsu at oracle.com Fri May 14 10:39:50 2010 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Fri, 14 May 2010 10:39:50 -0700 Subject: Growing GC Young Gen Times In-Reply-To: References: <4BEC2776.8010609@oracle.com> <4BEC7498.6030405@oracle.com> <4BEC7D4D.2000905@oracle.com> Message-ID: <4BED8AE6.1020408@oracle.com> On 5/14/10 10:24 AM, Matt Fowles wrote: > Jon~ > > That makes, sense but the fact is that the old gen *never* get > collected. So all the allocations happen from the giant empty space > at the end of the free list. I thought fragmentation only occurred > when the free lists are added to after freeing memory... > Ok. You may be right. 
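[Editor's note: Jon's three isolation experiments, quoted above, boil down to three flag variations. A sketch of the corresponding command lines, reusing the heap settings Matt posted; `App`, the class name, and the log-file names are placeholders, and any of Matt's other flags would be carried over unchanged.]

```shell
# 1. CMS with the parallel young collector turned off: longer but more
#    stable pauses here would point at a ParNew scaling issue.
java -Xms32g -Xmx32g -Xmn4g -XX:-UseParNewGC -XX:+UseConcMarkSweepGC \
     -verbose:gc -XX:+PrintGCDetails -Xloggc:serial-young.log App

# 2. ParNew without CMS: no pause growth here would point at promotion
#    into the CMS generation specifically.
java -Xms32g -Xmx32g -Xmn4g -XX:+UseParNewGC \
     -verbose:gc -XX:+PrintGCDetails -Xloggc:parnew-only.log App

# 3. The throughput collector: growing pauses even here would suggest
#    something special about the application itself.
java -Xms32g -Xmx32g -Xmn4g -XX:+UseParallelGC \
     -verbose:gc -XX:+PrintGCDetails -Xloggc:parallel.log App
```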
> Matt > > On Thu, May 13, 2010 at 6:29 PM, Jon Masamitsu wrote: > >> Matt, >> >> To amplify on Ramki's comment, the allocations out of the >> old generation are always from a free list. During a young >> generation collection each GC thread will get its own >> local free lists from the old generation so that it can >> copy objects to the old generation without synchronizing >> with the other GC thread (most of the time). Objects from >> a GC thread's local free lists are pushed to the globals lists >> after the collection (as far as I recall). So there is some >> churn in the free lists. >> >> Jon >> >> On 05/13/10 14:52, Y. Srinivas Ramakrishna wrote: >> >>> On 05/13/10 10:50, Matt Fowles wrote: >>> >>>> Jon~ >>>> >>>> This may sound naive, but how can fragmentation be an issue if the old >>>> gen has never been collected? I would think we are still in the space >>>> where we can just bump the old gen alloc pointer... >>>> >>> Matt, The old gen allocator may fragment the space. Allocation is not >>> exactly "bump a pointer". >>> >>> -- ramki >>> >>> >>>> Matt >>>> >>>> On Thu, May 13, 2010 at 12:23 PM, Jon Masamitsu >>>> wrote: >>>> >>>>> Matt, >>>>> >>>>> As Ramki indicated fragmentation might be an issue. As the >>>>> fragmentation >>>>> in the old generation increases, it takes longer to find space in the >>>>> old >>>>> generation >>>>> into which to promote objects from the young generation. This is >>>>> apparently >>>>> not >>>>> the problem that Wayne is having but you still might be hitting it. If >>>>> you >>>>> can >>>>> connect jconsole to the VM and force a full GC, that would tell us if >>>>> it's >>>>> fragmentation. >>>>> >>>>> There might be a scaling issue with the UseParNewGC. If you can use >>>>> -XX:-UseParNewGC (turning off the parallel young >>>>> generation collection) with -XX:+UseConcMarkSweepGC the pauses >>>>> will be longer but may be more stable. That's not the solution but just >>>>> part >>>>> of the investigation. 
>>>>> >>>>> You could try just -XX:+UseParNewGC without -XX:+UseConcMarkSweepGC >>>>> and if you don't see the growing young generation pause, that would >>>>> indicate >>>>> something specific about promotion into the CMS generation. >>>>> >>>>> UseParallelGC is different from UseParNewGC in a number of ways >>>>> and if you try UseParallelGC and still see the growing young generation >>>>> pauses, I'd suspect something special about your application. >>>>> >>>>> If you can run these experiments hopefully they will tell >>>>> us where to look next. >>>>> >>>>> Jon >>>>> >>>>> >>>>> On 05/12/10 15:19, Matt Fowles wrote: >>>>> >>>>> All~ >>>>> >>>>> I have a large app that produces ~4g of garbage every 30 seconds and >>>>> am trying to reduce the size of gc outliers. About 99% of this data >>>>> is garbage, but almost anything that survives one collection survives >>>>> for an indeterminately long amount of time. We are currently using >>>>> the following VM and options: >>>>> >>>>> java version "1.6.0_20" >>>>> Java(TM) SE Runtime Environment (build 1.6.0_20-b02) >>>>> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode) >>>>> >>>>> -verbose:gc >>>>> -XX:+PrintGCTimeStamps >>>>> -XX:+PrintGCDetails >>>>> -XX:+PrintGCTaskTimeStamps >>>>> -XX:+PrintTenuringDistribution >>>>> -XX:+PrintCommandLineFlags >>>>> -XX:+PrintReferenceGC >>>>> -Xms32g -Xmx32g -Xmn4g >>>>> -XX:+UseParNewGC >>>>> -XX:ParallelGCThreads=4 >>>>> -XX:+UseConcMarkSweepGC >>>>> -XX:ParallelCMSThreads=4 >>>>> -XX:CMSInitiatingOccupancyFraction=60 >>>>> -XX:+UseCMSInitiatingOccupancyOnly >>>>> -XX:+CMSParallelRemarkEnabled >>>>> -XX:MaxGCPauseMillis=50 >>>>> -Xloggc:gc.log >>>>> >>>>> >>>>> As you can see from the GC log, we never actually reach the point >>>>> where the CMS kicks in (after app startup). But our young gens seem >>>>> to take increasingly long to collect as time goes by. 
>>>>> >>>>> The steady state of the app is reached around 956.392 into the log >>>>> with a collection that takes 0.106 seconds. Thereafter the survivor >>>>> space remains roughly constantly as filled and the amount promoted to >>>>> old gen also remains constant, but the collection times increase to >>>>> 2.855 seconds by the end of the 3.5 hour run. >>>>> >>>>> Has anyone seen this sort of behavior before? Are there more switches >>>>> that I should try running with? >>>>> >>>>> Obviously, I am working to profile the app and reduce the garbage load >>>>> in parallel. But if I still see this sort of problem, it is only a >>>>> question of how long must the app run before I see unacceptable >>>>> latency spikes. >>>>> >>>>> Matt >>>>> >>>>> ________________________________ >>>>> _______________________________________________ >>>>> hotspot-gc-use mailing list >>>>> hotspot-gc-use at openjdk.java.net >>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>>> >>>> _______________________________________________ >>>> hotspot-gc-use mailing list >>>> hotspot-gc-use at openjdk.java.net >>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>> >>> >> From y.s.ramakrishna at oracle.com Fri May 14 10:44:25 2010 From: y.s.ramakrishna at oracle.com (Y. Srinivas Ramakrishna) Date: Fri, 14 May 2010 10:44:25 -0700 Subject: Growing GC Young Gen Times In-Reply-To: <4BED8A17.9090208@oracle.com> References: <4BEC2776.8010609@oracle.com> <4BEC7498.6030405@oracle.com> <4BEC7D4D.2000905@oracle.com> <4BED8A17.9090208@oracle.com> Message-ID: <4BED8BF9.7000803@oracle.com> On 05/14/10 10:36, Y. Srinivas Ramakrishna wrote: > On 05/14/10 10:24, Matt Fowles wrote: >> Jon~ >> >> That makes, sense but the fact is that the old gen *never* get >> collected. So all the allocations happen from the giant empty space >> at the end of the free list. I thought fragmentation only occurred >> when the free lists are added to after freeing memory... 
> > As Jon indicated, allocation is done from free lists of blocks > that are pre-carved on demand to avoid contention while allocating. > The old heuristics for how large to make those lists and the > inventory to hold in those lists were not working well as you > scaled the number of workers. Following 6631166 we believe it > works better and causes both less contention and less > fragmentation than it did before, because we do not hold > unnecessary excess inventory of free blocks. To see what the fragmentation is, try -XX:PrintFLSStatistics=2. This will slow down your scavenge pauses (perhaps by quite a bit for your 26 GB heap), but you will get a report of the number of blocks on free lists and how fragmented the space is on that account (for some appropriate notion of fragmentation). Don't use that flag in production though :-) -- ramki > > The fragmentation in turn causes card-scanning to suffer > adversely, besides the issues with loss of spatial locality also > increasing cache misses and TLB misses. (The large page > option might help mitigate the latter a bit, especially > since you have such a large heap and our fragmented > allocation may be exacerbating the TLB pressure.) > > -- ramki > >> Matt >> >> On Thu, May 13, 2010 at 6:29 PM, Jon Masamitsu wrote: >>> Matt, >>> >>> To amplify on Ramki's comment, the allocations out of the >>> old generation are always from a free list. During a young >>> generation collection each GC thread will get its own >>> local free lists from the old generation so that it can >>> copy objects to the old generation without synchronizing >>> with the other GC thread (most of the time). Objects from >>> a GC thread's local free lists are pushed to the global lists >>> after the collection (as far as I recall). So there is some >>> churn in the free lists. >>> >>> Jon >>> >>> On 05/13/10 14:52, Y.
Srinivas Ramakrishna wrote: >>>> On 05/13/10 10:50, Matt Fowles wrote: >>>>> Jon~ >>>>> >>>>> This may sound naive, but how can fragmentation be an issue if the old >>>>> gen has never been collected? I would think we are still in the space >>>>> where we can just bump the old gen alloc pointer... >>>> Matt, The old gen allocator may fragment the space. Allocation is not >>>> exactly "bump a pointer". >>>> >>>> -- ramki >>>> >>>>> Matt >>>>> >>>>> On Thu, May 13, 2010 at 12:23 PM, Jon Masamitsu >>>>> wrote: >>>>>> Matt, >>>>>> >>>>>> As Ramki indicated fragmentation might be an issue. As the >>>>>> fragmentation >>>>>> in the old generation increases, it takes longer to find space in the >>>>>> old >>>>>> generation >>>>>> into which to promote objects from the young generation. This is >>>>>> apparently >>>>>> not >>>>>> the problem that Wayne is having but you still might be hitting it. If >>>>>> you >>>>>> can >>>>>> connect jconsole to the VM and force a full GC, that would tell us if >>>>>> it's >>>>>> fragmentation. >>>>>> >>>>>> There might be a scaling issue with the UseParNewGC. If you can use >>>>>> -XX:-UseParNewGC (turning off the parallel young >>>>>> generation collection) with -XX:+UseConcMarkSweepGC the pauses >>>>>> will be longer but may be more stable. That's not the solution but just >>>>>> part >>>>>> of the investigation. >>>>>> >>>>>> You could try just -XX:+UseParNewGC without -XX:+UseConcMarkSweepGC >>>>>> and if you don't see the growing young generation pause, that would >>>>>> indicate >>>>>> something specific about promotion into the CMS generation. >>>>>> >>>>>> UseParallelGC is different from UseParNewGC in a number of ways >>>>>> and if you try UseParallelGC and still see the growing young generation >>>>>> pauses, I'd suspect something special about your application. >>>>>> >>>>>> If you can run these experiments hopefully they will tell >>>>>> us where to look next. 
>>>>>> >>>>>> Jon >>>>>> >>>>>> >>>>>> On 05/12/10 15:19, Matt Fowles wrote: >>>>>> >>>>>> All~ >>>>>> >>>>>> I have a large app that produces ~4g of garbage every 30 seconds and >>>>>> am trying to reduce the size of gc outliers. About 99% of this data >>>>>> is garbage, but almost anything that survives one collection survives >>>>>> for an indeterminately long amount of time. We are currently using >>>>>> the following VM and options: >>>>>> >>>>>> java version "1.6.0_20" >>>>>> Java(TM) SE Runtime Environment (build 1.6.0_20-b02) >>>>>> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode) >>>>>> >>>>>> -verbose:gc >>>>>> -XX:+PrintGCTimeStamps >>>>>> -XX:+PrintGCDetails >>>>>> -XX:+PrintGCTaskTimeStamps >>>>>> -XX:+PrintTenuringDistribution >>>>>> -XX:+PrintCommandLineFlags >>>>>> -XX:+PrintReferenceGC >>>>>> -Xms32g -Xmx32g -Xmn4g >>>>>> -XX:+UseParNewGC >>>>>> -XX:ParallelGCThreads=4 >>>>>> -XX:+UseConcMarkSweepGC >>>>>> -XX:ParallelCMSThreads=4 >>>>>> -XX:CMSInitiatingOccupancyFraction=60 >>>>>> -XX:+UseCMSInitiatingOccupancyOnly >>>>>> -XX:+CMSParallelRemarkEnabled >>>>>> -XX:MaxGCPauseMillis=50 >>>>>> -Xloggc:gc.log >>>>>> >>>>>> >>>>>> As you can see from the GC log, we never actually reach the point >>>>>> where the CMS kicks in (after app startup). But our young gens seem >>>>>> to take increasingly long to collect as time goes by. >>>>>> >>>>>> The steady state of the app is reached around 956.392 into the log >>>>>> with a collection that takes 0.106 seconds. Thereafter the survivor >>>>>> space remains roughly constantly filled and the amount promoted to >>>>>> old gen also remains constant, but the collection times increase to >>>>>> 2.855 seconds by the end of the 3.5 hour run. >>>>>> >>>>>> Has anyone seen this sort of behavior before? Are there more switches >>>>>> that I should try running with? >>>>>> >>>>>> Obviously, I am working to profile the app and reduce the garbage load >>>>>> in parallel.
But if I still see this sort of problem, it is only a >>>>>> question of how long must the app run before I see unacceptable >>>>>> latency spikes. >>>>>> >>>>>> Matt >>>>>> >>>>>> ________________________________ >>>>>> _______________________________________________ >>>>>> hotspot-gc-use mailing list >>>>>> hotspot-gc-use at openjdk.java.net >>>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>>> _______________________________________________ >>>>> hotspot-gc-use mailing list >>>>> hotspot-gc-use at openjdk.java.net >>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From matt.fowles at gmail.com Fri May 14 11:30:14 2010 From: matt.fowles at gmail.com (Matt Fowles) Date: Fri, 14 May 2010 14:30:14 -0400 Subject: Growing GC Young Gen Times In-Reply-To: References: <4BED8121.5000405@oracle.com> <4BED871A.4010306@oracle.com> Message-ID: Ramki~ File 2. Matt On Fri, May 14, 2010 at 2:29 PM, Matt Fowles wrote: > Ramki~ > > Attached are 3 different runs with slightly tweaked VM settings based > on suggestions from this list and others. > > All of them have reduced the MaxTenuringThreshold to 2. > > gc1.log reduces the young gen size to 1g and the old gen size to 7g > initially. ?As you can see from it, the young gen sweep speed does not > improve after the CMS sweep that occurs part way through the run. > gc2.log adds the -XX:+UseLargePages and -XX:+AlwaysPreTouch options to > the settings from gc1.log > gc3.log adds the -XX:+UseLargePages and -XX:+AlwaysPreTouch options to > a 4g young gen with the original 32g total heap. > > Due to the quirks of the infrastructure running these tests, data > volumes (and hence allocation rates) are NOT comparable between runs. 
> Because the tests take ~4 hours to run, we run several tests in > parallel against different sources of data. > > Matt > > PS - Due to attachment size restrictions I am going to send each file > in its own email. > > On Fri, May 14, 2010 at 1:23 PM, Y. Srinivas Ramakrishna > wrote: >> On 05/14/10 10:07, Matt Fowles wrote: >>> >>> Ramki~ >>> >>> The machine has 4 cpus each of which have 4 cores. I will adjust the >> >> Great, thanks. I'd suggest making ParallelGCThreads=8. Also compare with >> -XX:-UseParNewGC. If it's the kind of fragmentation that we >> believe may be the cause here, you'd see larger gc times in the >> latter case but they would not increase as they do now. But that >> is conjecture at this point. >> >>> survivor spaces as you suggest. Previously I had been running with >>> MTT 0, but changed it to 4 at the suggestion of others. >> >> MTT=0 can give very poor performance, as people said MTT=4 >> would definitely be better here than MTT=0. >> You should use MTT=1 here though. >> >>> >>> Running with the JDK7 version may take a bit of time, but I will >>> pursue that as well. >> >> All you should do is pull the libjvm.so that is in the JDK 7 installation >> (or bundle) and plonk it down into the appropriate directory of your >> existing JDK 6u20 installation. We just want to see the results with >> the latest JVM which includes a fix for 6631166. >> >> I attached a very rough plot of some metrics extracted from your log >> and this behaviour is definitely deserving of a bug, especially >> if it can be shown that it happens in the latest JVM. In the plot: >> >> red: scavenge durations >> dark blue: promoted data per scavenge >> pink: data in survivor space following scavenge >> light blue: live data in old gen >> >> As you can see the scavenge clearly correlates with the >> occupancy of the old gen (as Jon and others indicated).
>> Did you try Jon's suggestion of doing a manual GC at that >> point via jconsole, and seeing if the upward trend of >> scavenges continues beyond that? >> >> Did you use -XX:+UseLargePages and -XX:+AlwaysPreTouch? >> >> Do you have an easily used test case that you can share with us via >> your support channels? If/when you do so, please copy me and >> send them a reference to this thread on this mailing list. >> >> later, with your new data. >> -- ramki >> >>> >>> Matt >>> >>> >>> >>> On Fri, May 14, 2010 at 12:58 PM, Y. Srinivas Ramakrishna >>> wrote: >>>> >>>> Hi Matt -- i am computing some metrics from yr log file >>>> and would like to know how many cpu's you have for the logs below? >>>> >>>> Also, as you noted, almost anything that survives a scavenge >>>> lives for a while. To reduce the overhead of unnecessary >>>> back-and-forth copying in the survivor spaces, just use >>>> MaxTenuringThreshold=1 (This suggestion was also made by >>>> several others in the thread, and is corroborated by your >>>> PrintTenuringDistribution data). Since you have fairly large survivor >>>> spaces configured now, (at least large enough to fit 4 age cohorts, >>>> which will be down to 1 age cohort if you use MTT=1), i'd >>>> suggest making your survivor spaces smaller, maybe down to >>>> about 64 MB from the current 420 MB each, and give the excess >>>> to your Eden space. >>>> >>>> Then use 6u21 when it comes out (or ask your Java support to >>>> send you a 6u21 for a beta test), or drop in a JVM from JDK 7 into >>>> your 6u20 installation, and run with that. If you still see >>>> rising pause times let me know or file a bug, and send us the >>>> log file and JVM options along with full platform information. >>>> >>>> I'll run some metrics from yr log file if you send me the info >>>> re platform above, and that may perhaps reveal a few more secrets. >>>> >>>> later.
>>>> -- ramki >>>> >>>> On 05/12/10 15:19, Matt Fowles wrote: >>>>> >>>>> All~ >>>>> >>>>> I have a large app that produces ~4g of garbage every 30 seconds and >>>>> am trying to reduce the size of gc outliers. About 99% of this data >>>>> is garbage, but almost anything that survives one collection survives >>>>> for an indeterminately long amount of time. We are currently using >>>>> the following VM and options: >>>>> >>>>> java version "1.6.0_20" >>>>> Java(TM) SE Runtime Environment (build 1.6.0_20-b02) >>>>> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode) >>>>> >>>>> -verbose:gc >>>>> -XX:+PrintGCTimeStamps >>>>> -XX:+PrintGCDetails >>>>> -XX:+PrintGCTaskTimeStamps >>>>> -XX:+PrintTenuringDistribution >>>>> -XX:+PrintCommandLineFlags >>>>> -XX:+PrintReferenceGC >>>>> -Xms32g -Xmx32g -Xmn4g >>>>> -XX:+UseParNewGC >>>>> -XX:ParallelGCThreads=4 >>>>> -XX:+UseConcMarkSweepGC >>>>> -XX:ParallelCMSThreads=4 >>>>> -XX:CMSInitiatingOccupancyFraction=60 >>>>> -XX:+UseCMSInitiatingOccupancyOnly >>>>> -XX:+CMSParallelRemarkEnabled >>>>> -XX:MaxGCPauseMillis=50 >>>>> -Xloggc:gc.log >>>>> >>>>> >>>>> As you can see from the GC log, we never actually reach the point >>>>> where the CMS kicks in (after app startup). But our young gens seem >>>>> to take increasingly long to collect as time goes by. >>>>> >>>>> The steady state of the app is reached around 956.392 into the log >>>>> with a collection that takes 0.106 seconds. Thereafter the survivor >>>>> space remains roughly constantly filled and the amount promoted to >>>>> old gen also remains constant, but the collection times increase to >>>>> 2.855 seconds by the end of the 3.5 hour run. >>>>> >>>>> Has anyone seen this sort of behavior before?
?Are there more switches >>>>> that I should try running with? >>>>> >>>>> Obviously, I am working to profile the app and reduce the garbage load >>>>> in parallel. ?But if I still see this sort of problem, it is only a >>>>> question of how long must the app run before I see unacceptable >>>>> latency spikes. >>>>> >>>>> Matt >>>>> >>>>> >>>>> ------------------------------------------------------------------------ >>>>> >>>>> _______________________________________________ >>>>> hotspot-gc-use mailing list >>>>> hotspot-gc-use at openjdk.java.net >>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>> >> >> > -------------- next part -------------- A non-text attachment was scrubbed... Name: r229-vm-tweaks-gc2.log.bz2 Type: application/x-bzip2 Size: 30955 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100514/2a476a93/attachment-0001.bin From matt.fowles at gmail.com Fri May 14 11:30:30 2010 From: matt.fowles at gmail.com (Matt Fowles) Date: Fri, 14 May 2010 14:30:30 -0400 Subject: Growing GC Young Gen Times In-Reply-To: References: <4BED8121.5000405@oracle.com> <4BED871A.4010306@oracle.com> Message-ID: Ramki~ File 3. Matt On Fri, May 14, 2010 at 2:30 PM, Matt Fowles wrote: > Ramki~ > > File 2. > > Matt > > On Fri, May 14, 2010 at 2:29 PM, Matt Fowles wrote: >> Ramki~ >> >> Attached are 3 different runs with slightly tweaked VM settings based >> on suggestions from this list and others. >> >> All of them have reduced the MaxTenuringThreshold to 2. >> >> gc1.log reduces the young gen size to 1g and the old gen size to 7g >> initially. ?As you can see from it, the young gen sweep speed does not >> improve after the CMS sweep that occurs part way through the run. >> gc2.log adds the -XX:+UseLargePages and -XX:+AlwaysPreTouch options to >> the settings from gc1.log >> gc3.log adds the -XX:+UseLargePages and -XX:+AlwaysPreTouch options to >> a 4g young gen with the original 32g total heap. 
>> >> Due to the quirks of the infrastructure running these tests, data >> volumes (and hence allocation rates) are NOT comparable between runs. >> Because the tests take ~4 hours to run, we run several tests in >> parallel against different sources of data. >> >> Matt >> >> PS - Due to attachment size restrictions I am going to send each file >> in its own email. >> >> On Fri, May 14, 2010 at 1:23 PM, Y. Srinivas Ramakrishna >> wrote: >>> On 05/14/10 10:07, Matt Fowles wrote: >>>> >>>> Ramki~ >>>> >>>> The machine has 4 cpus each of which have 4 cores. ?I will adjust the >>> >>> Great, thanks. I'd suggest make ParallelGCThreads=8. Also compare with >>> -XX:-UseParNewGC. if it's the kind of fragmentation that we >>> believe may be the cause here, you'd see larger gc times in the >>> latter case but they would not increase as they do now. But that >>> is conjecture at this point. >>> >>>> survivor spaces as you suggest. ?Previously I had been running with >>>> MTT 0, but change it to 4 at the suggestion of others. >>> >>> MTT=0 can give very poor performance, as people said MTT=4 >>> would definitely be better here than MTT=0. >>> You should use MTT=1 here though. >>> >>>> >>>> Running with the JDK7 version may take a bit of time, but I will >>>> pursue that as well. >>> >>> All you should do is pull the libjvm.so that is in the JDK 7 installation >>> (or bundle) and plonk it down into the appropriate directory of your >>> existing JDK 6u20 installation. We just want to see the results with >>> the latest JVM which includes a fix for 6631166. >>> >>> I attached a very rough plot of some metrics extracted from your log >>> and this behaviour is definitely deserving of a bug, especially >>> if it can be shown that it happens in the latest JVM. 
In the plot: >>> >>> ?red: scavenge durations >>> ?dark blue: promoted data per scavenge >>> ?pink: data in survivor space following scavenge >>> ?light blue: live data in old gen >>> >>> As you can see the scavenge clearly correlates with the >>> occupancy of the old gen (as Jon and others indicated). >>> Did you try Jon's suggestion of doing a manual GC at that >>> point via jconsole, and seeing if the upward trend of >>> scavenges continues beyond that? >>> >>> Did you use -XX:+UseLargePages and -XX:+AlwaysPreTouch? >>> >>> Do you have an easily used test case that you can share with us via >>> your support channels? If/when you do so, please copy me and >>> send them a reference to this thread on this mailing list. >>> >>> later, with your new data. >>> -- ramki >>> >>>> >>>> Matt >>>> >>>> >>>> >>>> On Fri, May 14, 2010 at 12:58 PM, Y. Srinivas Ramakrishna >>>> wrote: >>>>> >>>>> Hi Matt -- i am computing some metrics from yr log file >>>>> and would like to know how many cpu's you have for the logs below? >>>>> >>>>> Also, as you noted, almost anything that survives a scavenge >>>>> lives for a while. To reduce the overhead of unnecessary >>>>> back-and-forth copying in the survivor spaces, just use >>>>> MaxTenuringThreshold=1 (This suggestion was also made by >>>>> several others in the thread, and is corroborated by your >>>>> PrintTenuringDistribution data). Since you have farily large survivor >>>>> spaces configured now, (at least large enough to fit 4 age cohorts, >>>>> which will be down to 1 age cohort if you use MTT=1), i'd >>>>> suggest making your surviror spaces smaller, may be down to >>>>> about 64 MB from the current 420 MB each, and give the excess >>>>> to your Eden space. >>>>> >>>>> Then use 6u21 when it comes out (or ask your Java support to >>>>> send you a 6u21 for a beta test), or drop in a JVM from JDK 7 into >>>>> your 6u20 installation, and run with that. 
If you still see >>>>> rising pause times let me know or file a bug, and send us the >>>>> log file and JVM options along with full platform information. >>>>> >>>>> I'll run some metrics from yr log file if you send me the info >>>>> re platform above, and that may perhaps reveal a few more secrets. >>>>> >>>>> later. >>>>> -- ramki >>>>> >>>>> On 05/12/10 15:19, Matt Fowles wrote: >>>>>> >>>>>> All~ >>>>>> >>>>>> I have a large app that produces ~4g of garbage every 30 seconds and >>>>>> am trying to reduce the size of gc outliers. ?About 99% of this data >>>>>> is garbage, but almost anything that survives one collection survives >>>>>> for an indeterminately long amount of time. ?We are currently using >>>>>> the following VM and options: >>>>>> >>>>>> java version "1.6.0_20" >>>>>> Java(TM) SE Runtime Environment (build 1.6.0_20-b02) >>>>>> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode) >>>>>> >>>>>> ? ? ? ? ? ? ?-verbose:gc >>>>>> ? ? ? ? ? ? ?-XX:+PrintGCTimeStamps >>>>>> ? ? ? ? ? ? ?-XX:+PrintGCDetails >>>>>> ? ? ? ? ? ? ?-XX:+PrintGCTaskTimeStamps >>>>>> ? ? ? ? ? ? ?-XX:+PrintTenuringDistribution >>>>>> ? ? ? ? ? ? ?-XX:+PrintCommandLineFlags >>>>>> ? ? ? ? ? ? ?-XX:+PrintReferenceGC >>>>>> ? ? ? ? ? ? ?-Xms32g -Xmx32g -Xmn4g >>>>>> ? ? ? ? ? ? ?-XX:+UseParNewGC >>>>>> ? ? ? ? ? ? ?-XX:ParallelGCThreads=4 >>>>>> ? ? ? ? ? ? ?-XX:+UseConcMarkSweepGC >>>>>> ? ? ? ? ? ? ?-XX:ParallelCMSThreads=4 >>>>>> ? ? ? ? ? ? ?-XX:CMSInitiatingOccupancyFraction=60 >>>>>> ? ? ? ? ? ? ?-XX:+UseCMSInitiatingOccupancyOnly >>>>>> ? ? ? ? ? ? ?-XX:+CMSParallelRemarkEnabled >>>>>> ? ? ? ? ? ? ?-XX:MaxGCPauseMillis=50 >>>>>> ? ? ? ? ? ? ?-Xloggc:gc.log >>>>>> >>>>>> >>>>>> As you can see from the GC log, we never actually reach the point >>>>>> where the CMS kicks in (after app startup). ?But our young gens seem >>>>>> to take increasingly long to collect as time goes by. 
>>>>>> >>>>>> The steady state of the app is reached around 956.392 into the log >>>>>> with a collection that takes 0.106 seconds. ?Thereafter the survivor >>>>>> space remains roughly constantly as filled and the amount promoted to >>>>>> old gen also remains constant, but the collection times increase to >>>>>> 2.855 seconds by the end of the 3.5 hour run. >>>>>> >>>>>> Has anyone seen this sort of behavior before? ?Are there more switches >>>>>> that I should try running with? >>>>>> >>>>>> Obviously, I am working to profile the app and reduce the garbage load >>>>>> in parallel. ?But if I still see this sort of problem, it is only a >>>>>> question of how long must the app run before I see unacceptable >>>>>> latency spikes. >>>>>> >>>>>> Matt >>>>>> >>>>>> >>>>>> ------------------------------------------------------------------------ >>>>>> >>>>>> _______________________________________________ >>>>>> hotspot-gc-use mailing list >>>>>> hotspot-gc-use at openjdk.java.net >>>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>>> >>> >>> >> > -------------- next part -------------- A non-text attachment was scrubbed... Name: r229-vm-tweaks-gc3.log.bz2 Type: application/x-bzip2 Size: 9269 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100514/47a15a46/attachment.bin From matt.fowles at gmail.com Fri May 14 11:29:55 2010 From: matt.fowles at gmail.com (Matt Fowles) Date: Fri, 14 May 2010 14:29:55 -0400 Subject: Growing GC Young Gen Times In-Reply-To: <4BED871A.4010306@oracle.com> References: <4BED8121.5000405@oracle.com> <4BED871A.4010306@oracle.com> Message-ID: Ramki~ Attached are 3 different runs with slightly tweaked VM settings based on suggestions from this list and others. All of them have reduced the MaxTenuringThreshold to 2. gc1.log reduces the young gen size to 1g and the old gen size to 7g initially. 
As you can see from it, the young gen sweep speed does not improve after the CMS sweep that occurs part way through the run. gc2.log adds the -XX:+UseLargePages and -XX:+AlwaysPreTouch options to the settings from gc1.log gc3.log adds the -XX:+UseLargePages and -XX:+AlwaysPreTouch options to a 4g young gen with the original 32g total heap. Due to the quirks of the infrastructure running these tests, data volumes (and hence allocation rates) are NOT comparable between runs. Because the tests take ~4 hours to run, we run several tests in parallel against different sources of data. Matt PS - Due to attachment size restrictions I am going to send each file in its own email. On Fri, May 14, 2010 at 1:23 PM, Y. Srinivas Ramakrishna wrote: > On 05/14/10 10:07, Matt Fowles wrote: >> >> Ramki~ >> >> The machine has 4 cpus each of which have 4 cores. ?I will adjust the > > Great, thanks. I'd suggest make ParallelGCThreads=8. Also compare with > -XX:-UseParNewGC. if it's the kind of fragmentation that we > believe may be the cause here, you'd see larger gc times in the > latter case but they would not increase as they do now. But that > is conjecture at this point. > >> survivor spaces as you suggest. ?Previously I had been running with >> MTT 0, but change it to 4 at the suggestion of others. > > MTT=0 can give very poor performance, as people said MTT=4 > would definitely be better here than MTT=0. > You should use MTT=1 here though. > >> >> Running with the JDK7 version may take a bit of time, but I will >> pursue that as well. > > All you should do is pull the libjvm.so that is in the JDK 7 installation > (or bundle) and plonk it down into the appropriate directory of your > existing JDK 6u20 installation. We just want to see the results with > the latest JVM which includes a fix for 6631166. 
> > I attached a very rough plot of some metrics extracted from your log > and this behaviour is definitely deserving of a bug, especially > if it can be shown that it happens in the latest JVM. In the plot: > > ?red: scavenge durations > ?dark blue: promoted data per scavenge > ?pink: data in survivor space following scavenge > ?light blue: live data in old gen > > As you can see the scavenge clearly correlates with the > occupancy of the old gen (as Jon and others indicated). > Did you try Jon's suggestion of doing a manual GC at that > point via jconsole, and seeing if the upward trend of > scavenges continues beyond that? > > Did you use -XX:+UseLargePages and -XX:+AlwaysPreTouch? > > Do you have an easily used test case that you can share with us via > your support channels? If/when you do so, please copy me and > send them a reference to this thread on this mailing list. > > later, with your new data. > -- ramki > >> >> Matt >> >> >> >> On Fri, May 14, 2010 at 12:58 PM, Y. Srinivas Ramakrishna >> wrote: >>> >>> Hi Matt -- i am computing some metrics from yr log file >>> and would like to know how many cpu's you have for the logs below? >>> >>> Also, as you noted, almost anything that survives a scavenge >>> lives for a while. To reduce the overhead of unnecessary >>> back-and-forth copying in the survivor spaces, just use >>> MaxTenuringThreshold=1 (This suggestion was also made by >>> several others in the thread, and is corroborated by your >>> PrintTenuringDistribution data). Since you have farily large survivor >>> spaces configured now, (at least large enough to fit 4 age cohorts, >>> which will be down to 1 age cohort if you use MTT=1), i'd >>> suggest making your surviror spaces smaller, may be down to >>> about 64 MB from the current 420 MB each, and give the excess >>> to your Eden space. 
>>> >>> Then use 6u21 when it comes out (or ask your Java support to >>> send you a 6u21 for a beta test), or drop in a JVM from JDK 7 into >>> your 6u20 installation, and run with that. If you still see >>> rising pause times let me know or file a bug, and send us the >>> log file and JVM options along with full platform information. >>> >>> I'll run some metrics from yr log file if you send me the info >>> re platform above, and that may perhaps reveal a few more secrets. >>> >>> later. >>> -- ramki >>> >>> On 05/12/10 15:19, Matt Fowles wrote: >>>> >>>> All~ >>>> >>>> I have a large app that produces ~4g of garbage every 30 seconds and >>>> am trying to reduce the size of gc outliers. ?About 99% of this data >>>> is garbage, but almost anything that survives one collection survives >>>> for an indeterminately long amount of time. ?We are currently using >>>> the following VM and options: >>>> >>>> java version "1.6.0_20" >>>> Java(TM) SE Runtime Environment (build 1.6.0_20-b02) >>>> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode) >>>> >>>> ? ? ? ? ? ? ?-verbose:gc >>>> ? ? ? ? ? ? ?-XX:+PrintGCTimeStamps >>>> ? ? ? ? ? ? ?-XX:+PrintGCDetails >>>> ? ? ? ? ? ? ?-XX:+PrintGCTaskTimeStamps >>>> ? ? ? ? ? ? ?-XX:+PrintTenuringDistribution >>>> ? ? ? ? ? ? ?-XX:+PrintCommandLineFlags >>>> ? ? ? ? ? ? ?-XX:+PrintReferenceGC >>>> ? ? ? ? ? ? ?-Xms32g -Xmx32g -Xmn4g >>>> ? ? ? ? ? ? ?-XX:+UseParNewGC >>>> ? ? ? ? ? ? ?-XX:ParallelGCThreads=4 >>>> ? ? ? ? ? ? ?-XX:+UseConcMarkSweepGC >>>> ? ? ? ? ? ? ?-XX:ParallelCMSThreads=4 >>>> ? ? ? ? ? ? ?-XX:CMSInitiatingOccupancyFraction=60 >>>> ? ? ? ? ? ? ?-XX:+UseCMSInitiatingOccupancyOnly >>>> ? ? ? ? ? ? ?-XX:+CMSParallelRemarkEnabled >>>> ? ? ? ? ? ? ?-XX:MaxGCPauseMillis=50 >>>> ? ? ? ? ? ? ?-Xloggc:gc.log >>>> >>>> >>>> As you can see from the GC log, we never actually reach the point >>>> where the CMS kicks in (after app startup). 
?But our young gens seem >>>> to take increasingly long to collect as time goes by. >>>> >>>> The steady state of the app is reached around 956.392 into the log >>>> with a collection that takes 0.106 seconds. ?Thereafter the survivor >>>> space remains roughly constantly as filled and the amount promoted to >>>> old gen also remains constant, but the collection times increase to >>>> 2.855 seconds by the end of the 3.5 hour run. >>>> >>>> Has anyone seen this sort of behavior before? ?Are there more switches >>>> that I should try running with? >>>> >>>> Obviously, I am working to profile the app and reduce the garbage load >>>> in parallel. ?But if I still see this sort of problem, it is only a >>>> question of how long must the app run before I see unacceptable >>>> latency spikes. >>>> >>>> Matt >>>> >>>> >>>> ------------------------------------------------------------------------ >>>> >>>> _______________________________________________ >>>> hotspot-gc-use mailing list >>>> hotspot-gc-use at openjdk.java.net >>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> > > -------------- next part -------------- A non-text attachment was scrubbed... Name: r229-vm-tweaks-gc1.log.bz2 Type: application/x-bzip2 Size: 45158 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100514/34386d25/attachment-0001.bin From matt.fowles at gmail.com Fri May 14 12:24:11 2010 From: matt.fowles at gmail.com (Matt Fowles) Date: Fri, 14 May 2010 15:24:11 -0400 Subject: Growing GC Young Gen Times In-Reply-To: <4BED8BF9.7000803@oracle.com> References: <4BEC2776.8010609@oracle.com> <4BEC7498.6030405@oracle.com> <4BEC7D4D.2000905@oracle.com> <4BED8A17.9090208@oracle.com> <4BED8BF9.7000803@oracle.com> Message-ID: Ramki~ I am preparing the flags for the next 3 runs (which run in parallel) and wanted to check a few things with you. 
I believe that each of these is collecting a useful data point, Server 1 is running with 8 threads, reduced young gen, and MTT 1. Server 2 is running with 8 threads, reduced young gen, and MTT 1, ParNew, but NOT CMS. Server 3 is running with 8 threads, reduced young gen, and MTT 1, and PrintFLSStatistics. I can (additionally) run all of these tests on JDK7 (Java HotSpot(TM) 64-Bit Server VM (build 17.0-b05, mixed mode)). Server 1: -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+PrintGCTaskTimeStamps -XX:+PrintCommandLineFlags -Xms32g -Xmx32g -Xmn1g -XX:+UseParNewGC -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ParallelCMSThreads=8 -XX:MaxTenuringThreshold=1 -XX:SurvivorRatio=14 -XX:+CMSParallelRemarkEnabled -Xloggc:gc1.log -XX:+UseLargePages -XX:+AlwaysPreTouch Server 2: -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+PrintGCTaskTimeStamps -XX:+PrintCommandLineFlags -Xms32g -Xmx32g -Xmn1g -XX:+UseParNewGC -XX:ParallelGCThreads=8 -XX:MaxTenuringThreshold=1 -XX:SurvivorRatio=14 -Xloggc:gc2.log -XX:+UseLargePages -XX:+AlwaysPreTouch Server 3: -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+PrintGCTaskTimeStamps -XX:+PrintCommandLineFlags -Xms32g -Xmx32g -Xmn1g -XX:+UseParNewGC -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ParallelCMSThreads=8 -XX:MaxTenuringThreshold=1 -XX:SurvivorRatio=14 -XX:+CMSParallelRemarkEnabled -Xloggc:gc3.log -XX:PrintFLSStatistics=2 -XX:+UseLargePages -XX:+AlwaysPreTouch Matt On Fri, May 14, 2010 at 1:44 PM, Y. Srinivas Ramakrishna < y.s.ramakrishna at oracle.com> wrote: > On 05/14/10 10:36, Y. Srinivas Ramakrishna wrote: >> >> On 05/14/10 10:24, Matt Fowles wrote: >>> >>> Jon~ >>> >>> That makes, sense but the fact is that the old gen *never* get >>> collected. So all the allocations happen from the giant empty space >>> at the end of the free list. I thought fragmentation only occurred >>> when the free lists are added to after freeing memory... 
>> As Jon indicated allocation is done from free lists of blocks >> that are pre-carved on demand to avoid contention while allocating. >> The old heuristics for how large to make those lists and the >> inventory to hold in those lists was not working well as you >> scaled the number of workers. Following 6631166 we believe it >> works better and causes both less contention and less >> fragmentation than it did before, because we do not hold >> unnecessary excess inventory of free blocks. > > To see what the fragmentation is, try -XX:PrintFLSStatistics=2. > This will slow down your scavenge pauses (perhaps by quite a bit > for your 26 GB heap), but you will get a report of the number of > blocks on free lists and how fragmented the space is on that account > (for some appropriate notion of fragmentation). Don't use that > flag in production though :-) > > -- ramki > >> >> The fragmentation in turn causes card-scanning to suffer >> adversely, besides the issues with loss of spatial locality also >> increasing cache misses and TLB misses. (The large page >> option might help mitigate the latter a bit, especially >> since you have such a large heap and our fragmented >> allocation may be exacerbating the TLB pressure.) >> >> -- ramki >> >>> Matt >>> >>> On Thu, May 13, 2010 at 6:29 PM, Jon Masamitsu >>> wrote: >>>> >>>> Matt, >>>> >>>> To amplify on Ramki's comment, the allocations out of the >>>> old generation are always from a free list. During a young >>>> generation collection each GC thread will get its own >>>> local free lists from the old generation so that it can >>>> copy objects to the old generation without synchronizing >>>> with the other GC threads (most of the time). Objects from >>>> a GC thread's local free lists are pushed to the global lists >>>> after the collection (as far as I recall). So there is some >>>> churn in the free lists. >>>> >>>> Jon >>>> >>>> On 05/13/10 14:52, Y.
Srinivas Ramakrishna wrote: >>>>> >>>>> On 05/13/10 10:50, Matt Fowles wrote: >>>>>> >>>>>> Jon~ >>>>>> >>>>>> This may sound naive, but how can fragmentation be an issue if the old >>>>>> gen has never been collected? I would think we are still in the space >>>>>> where we can just bump the old gen alloc pointer... >>>>> >>>>> Matt, The old gen allocator may fragment the space. Allocation is not >>>>> exactly "bump a pointer". >>>>> >>>>> -- ramki >>>>> >>>>>> Matt >>>>>> >>>>>> On Thu, May 13, 2010 at 12:23 PM, Jon Masamitsu >>>>>> wrote: >>>>>>> >>>>>>> Matt, >>>>>>> >>>>>>> As Ramki indicated fragmentation might be an issue. As the >>>>>>> fragmentation >>>>>>> in the old generation increases, it takes longer to find space in the >>>>>>> old >>>>>>> generation >>>>>>> into which to promote objects from the young generation. This is >>>>>>> apparently >>>>>>> not >>>>>>> the problem that Wayne is having but you still might be hitting it. >>>>>>> If >>>>>>> you >>>>>>> can >>>>>>> connect jconsole to the VM and force a full GC, that would tell us if >>>>>>> it's >>>>>>> fragmentation. >>>>>>> >>>>>>> There might be a scaling issue with the UseParNewGC. If you can use >>>>>>> -XX:-UseParNewGC (turning off the parallel young >>>>>>> generation collection) with -XX:+UseConcMarkSweepGC the pauses >>>>>>> will be longer but may be more stable. That's not the solution but >>>>>>> just >>>>>>> part >>>>>>> of the investigation. >>>>>>> >>>>>>> You could try just -XX:+UseParNewGC without -XX:+UseConcMarkSweepGC >>>>>>> and if you don't see the growing young generation pause, that would >>>>>>> indicate >>>>>>> something specific about promotion into the CMS generation. >>>>>>> >>>>>>> UseParallelGC is different from UseParNewGC in a number of ways >>>>>>> and if you try UseParallelGC and still see the growing young >>>>>>> generation >>>>>>> pauses, I'd suspect something special about your application. 
>>>>>>> >>>>>>> If you can run these experiments hopefully they will tell >>>>>>> us where to look next. >>>>>>> >>>>>>> Jon >>>>>>> >>>>>>> >>>>>>> On 05/12/10 15:19, Matt Fowles wrote: >>>>>>> >>>>>>> All~ >>>>>>> >>>>>>> I have a large app that produces ~4g of garbage every 30 seconds and >>>>>>> am trying to reduce the size of gc outliers. About 99% of this data >>>>>>> is garbage, but almost anything that survives one collection survives >>>>>>> for an indeterminately long amount of time. We are currently using >>>>>>> the following VM and options: >>>>>>> >>>>>>> java version "1.6.0_20" >>>>>>> Java(TM) SE Runtime Environment (build 1.6.0_20-b02) >>>>>>> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode) >>>>>>> >>>>>>> -verbose:gc >>>>>>> -XX:+PrintGCTimeStamps >>>>>>> -XX:+PrintGCDetails >>>>>>> -XX:+PrintGCTaskTimeStamps >>>>>>> -XX:+PrintTenuringDistribution >>>>>>> -XX:+PrintCommandLineFlags >>>>>>> -XX:+PrintReferenceGC >>>>>>> -Xms32g -Xmx32g -Xmn4g >>>>>>> -XX:+UseParNewGC >>>>>>> -XX:ParallelGCThreads=4 >>>>>>> -XX:+UseConcMarkSweepGC >>>>>>> -XX:ParallelCMSThreads=4 >>>>>>> -XX:CMSInitiatingOccupancyFraction=60 >>>>>>> -XX:+UseCMSInitiatingOccupancyOnly >>>>>>> -XX:+CMSParallelRemarkEnabled >>>>>>> -XX:MaxGCPauseMillis=50 >>>>>>> -Xloggc:gc.log >>>>>>> >>>>>>> >>>>>>> As you can see from the GC log, we never actually reach the point >>>>>>> where the CMS kicks in (after app startup). But our young gens seem >>>>>>> to take increasingly long to collect as time goes by. >>>>>>> >>>>>>> The steady state of the app is reached around 956.392 into the log >>>>>>> with a collection that takes 0.106 seconds. Thereafter the survivor >>>>>>> space occupancy remains roughly constant and the amount promoted to >>>>>>> old gen also remains constant, but the collection times increase to >>>>>>> 2.855 seconds by the end of the 3.5 hour run. >>>>>>> >>>>>>> Has anyone seen this sort of behavior before?
Are there more >>>>>>> switches >>>>>>> that I should try running with? >>>>>>> >>>>>>> Obviously, I am working to profile the app and reduce the garbage >>>>>>> load >>>>>>> in parallel. But if I still see this sort of problem, it is only a >>>>>>> question of how long must the app run before I see unacceptable >>>>>>> latency spikes. >>>>>>> >>>>>>> Matt >>>>>>> >>>>>>> ________________________________ >>>>>>> _______________________________________________ >>>>>>> hotspot-gc-use mailing list >>>>>>> hotspot-gc-use at openjdk.java.net >>>>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>>>> >>>>>> _______________________________________________ >>>>>> hotspot-gc-use mailing list >>>>>> hotspot-gc-use at openjdk.java.net >>>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100514/7d593fc0/attachment.html From y.s.ramakrishna at oracle.com Fri May 14 12:32:45 2010 From: y.s.ramakrishna at oracle.com (Y. Srinivas Ramakrishna) Date: Fri, 14 May 2010 12:32:45 -0700 Subject: Growing GC Young Gen Times In-Reply-To: References: <4BEC2776.8010609@oracle.com> <4BEC7498.6030405@oracle.com> <4BEC7D4D.2000905@oracle.com> <4BED8A17.9090208@oracle.com> <4BED8BF9.7000803@oracle.com> Message-ID: <4BEDA55D.5030703@oracle.com> Matt -- Yes, comparative data for all these for 6u20 and jdk 7 would be great. Naturally, server 1 is most immediately useful for determining if 6631166 addresses this at all, but others would be useful too if it turns out it doesn't (i.e. if jdk 7's server 1 turns out to be no better than 6u20's -- at which point we should get this into the right channel -- open a bug, and a support case). 
thanks. -- ramki On 05/14/10 12:24, Matt Fowles wrote: > Ramki~ > > I am preparing the flags for the next 3 runs (which run in parallel) and > wanted to check a few things with you. I believe that each of these is > collecting a useful data point, > > Server 1 is running with 8 threads, reduced young gen, and MTT 1. > Server 2 is running with 8 threads, reduced young gen, and MTT 1, > ParNew, but NOT CMS. > Server 3 is running with 8 threads, reduced young gen, and MTT 1, > and PrintFLSStatistics. > > I can (additionally) run all of these tests on JDK7 (Java HotSpot(TM) > 64-Bit Server VM (build 17.0-b05, mixed mode)). > > Server 1: > -verbose:gc > -XX:+PrintGCTimeStamps > -XX:+PrintGCDetails > -XX:+PrintGCTaskTimeStamps > -XX:+PrintCommandLineFlags > > -Xms32g -Xmx32g -Xmn1g > -XX:+UseParNewGC > -XX:ParallelGCThreads=8 > -XX:+UseConcMarkSweepGC > -XX:ParallelCMSThreads=8 > -XX:MaxTenuringThreshold=1 > -XX:SurvivorRatio=14 > -XX:+CMSParallelRemarkEnabled > -Xloggc:gc1.log > -XX:+UseLargePages > -XX:+AlwaysPreTouch > > Server 2: > -verbose:gc > -XX:+PrintGCTimeStamps > -XX:+PrintGCDetails > -XX:+PrintGCTaskTimeStamps > -XX:+PrintCommandLineFlags > > -Xms32g -Xmx32g -Xmn1g > -XX:+UseParNewGC > -XX:ParallelGCThreads=8 > -XX:MaxTenuringThreshold=1 > -XX:SurvivorRatio=14 > -Xloggc:gc2.log > -XX:+UseLargePages > -XX:+AlwaysPreTouch > > > Server 3: > -verbose:gc > -XX:+PrintGCTimeStamps > -XX:+PrintGCDetails > -XX:+PrintGCTaskTimeStamps > -XX:+PrintCommandLineFlags > > -Xms32g -Xmx32g -Xmn1g > -XX:+UseParNewGC > -XX:ParallelGCThreads=8 > -XX:+UseConcMarkSweepGC > -XX:ParallelCMSThreads=8 > -XX:MaxTenuringThreshold=1 > -XX:SurvivorRatio=14 > -XX:+CMSParallelRemarkEnabled > -Xloggc:gc3.log > > -XX:PrintFLSStatistics=2 > -XX:+UseLargePages > -XX:+AlwaysPreTouch > > Matt > > On Fri, May 14, 2010 at 1:44 PM, Y. Srinivas Ramakrishna > > wrote: > > On 05/14/10 10:36, Y. 
Srinivas Ramakrishna wrote: > >> > >> On 05/14/10 10:24, Matt Fowles wrote: > >>> > >>> Jon~ > >>> > >>> That makes, sense but the fact is that the old gen *never* get > >>> collected. So all the allocations happen from the giant empty space > >>> at the end of the free list. I thought fragmentation only occurred > >>> when the free lists are added to after freeing memory... > >> > >> As Jon indicated allocation is done from free lists of blocks > >> that are pre-carved on demand to avoid contention while allocating. > >> The old heuristics for how large to make those lists and the > >> inventory to hold in those lists was not working well as you > >> scaled the number of workers. Following 6631166 we believe it > >> works better and causes both less contention and less > >> fragmentation than it did before, because we do not hold > >> unnecessary excess inventory of free blocks. > > > > To see what the fragmentation is, try -XX:PrintFLSStatistics=2. > > This will slow down your scavenge pauses (perhaps by quite a bit > > for your 26 GB heap), but you will get a report of the number of > > blocks on free lists and how fragmented the space is on that ccount > > (for some appropriate notion of fragmentation). Don't use that > > flag in production though :-) > > > > -- ramki > > > >> > >> The fragmentation in turn causes card-scanning to suffer > >> adversely, besides the issues with loss of spatial locality also > >> increasing cache misses and TLB misses. (The large page > >> option might help mitigate the latter a bit, especially > >> since you have such a large heap and our fragmented > >> allocation may be exacerbating the TLB pressure.) > >> > >> -- ramki > >> > >>> Matt > >>> > >>> On Thu, May 13, 2010 at 6:29 PM, Jon Masamitsu > > > >>> wrote: > >>>> > >>>> Matt, > >>>> > >>>> To amplify on Ramki's comment, the allocations out of the > >>>> old generation are always from a free list. 
During a young > >>>> generation collection each GC thread will get its own > >>>> local free lists from the old generation so that it can > >>>> copy objects to the old generation without synchronizing > >>>> with the other GC thread (most of the time). Objects from > >>>> a GC thread's local free lists are pushed to the globals lists > >>>> after the collection (as far as I recall). So there is some > >>>> churn in the free lists. > >>>> > >>>> Jon > >>>> > >>>> On 05/13/10 14:52, Y. Srinivas Ramakrishna wrote: > >>>>> > >>>>> On 05/13/10 10:50, Matt Fowles wrote: > >>>>>> > >>>>>> Jon~ > >>>>>> > >>>>>> This may sound naive, but how can fragmentation be an issue if > the old > >>>>>> gen has never been collected? I would think we are still in the > space > >>>>>> where we can just bump the old gen alloc pointer... > >>>>> > >>>>> Matt, The old gen allocator may fragment the space. Allocation is not > >>>>> exactly "bump a pointer". > >>>>> > >>>>> -- ramki > >>>>> > >>>>>> Matt > >>>>>> > >>>>>> On Thu, May 13, 2010 at 12:23 PM, Jon Masamitsu > >>>>>> > wrote: > >>>>>>> > >>>>>>> Matt, > >>>>>>> > >>>>>>> As Ramki indicated fragmentation might be an issue. As the > >>>>>>> fragmentation > >>>>>>> in the old generation increases, it takes longer to find space > in the > >>>>>>> old > >>>>>>> generation > >>>>>>> into which to promote objects from the young generation. This is > >>>>>>> apparently > >>>>>>> not > >>>>>>> the problem that Wayne is having but you still might be hitting it. > >>>>>>> If > >>>>>>> you > >>>>>>> can > >>>>>>> connect jconsole to the VM and force a full GC, that would tell > us if > >>>>>>> it's > >>>>>>> fragmentation. > >>>>>>> > >>>>>>> There might be a scaling issue with the UseParNewGC. If you > can use > >>>>>>> -XX:-UseParNewGC (turning off the parallel young > >>>>>>> generation collection) with -XX:+UseConcMarkSweepGC the pauses > >>>>>>> will be longer but may be more stable. 
That's not the solution but > >>>>>>> just > >>>>>>> part > >>>>>>> of the investigation. > >>>>>>> > >>>>>>> You could try just -XX:+UseParNewGC without -XX:+UseConcMarkSweepGC > >>>>>>> and if you don't see the growing young generation pause, that would > >>>>>>> indicate > >>>>>>> something specific about promotion into the CMS generation. > >>>>>>> > >>>>>>> UseParallelGC is different from UseParNewGC in a number of ways > >>>>>>> and if you try UseParallelGC and still see the growing young > >>>>>>> generation > >>>>>>> pauses, I'd suspect something special about your application. > >>>>>>> > >>>>>>> If you can run these experiments hopefully they will tell > >>>>>>> us where to look next. > >>>>>>> > >>>>>>> Jon > >>>>>>> > >>>>>>> > >>>>>>> On 05/12/10 15:19, Matt Fowles wrote: > >>>>>>> > >>>>>>> All~ > >>>>>>> > >>>>>>> I have a large app that produces ~4g of garbage every 30 > seconds and > >>>>>>> am trying to reduce the size of gc outliers. About 99% of this > data > >>>>>>> is garbage, but almost anything that survives one collection > survives > >>>>>>> for an indeterminately long amount of time. 
We are currently using > >>>>>>> the following VM and options: > >>>>>>> > >>>>>>> java version "1.6.0_20" > >>>>>>> Java(TM) SE Runtime Environment (build 1.6.0_20-b02) > >>>>>>> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode) > >>>>>>> > >>>>>>> -verbose:gc > >>>>>>> -XX:+PrintGCTimeStamps > >>>>>>> -XX:+PrintGCDetails > >>>>>>> -XX:+PrintGCTaskTimeStamps > >>>>>>> -XX:+PrintTenuringDistribution > >>>>>>> -XX:+PrintCommandLineFlags > >>>>>>> -XX:+PrintReferenceGC > >>>>>>> -Xms32g -Xmx32g -Xmn4g > >>>>>>> -XX:+UseParNewGC > >>>>>>> -XX:ParallelGCThreads=4 > >>>>>>> -XX:+UseConcMarkSweepGC > >>>>>>> -XX:ParallelCMSThreads=4 > >>>>>>> -XX:CMSInitiatingOccupancyFraction=60 > >>>>>>> -XX:+UseCMSInitiatingOccupancyOnly > >>>>>>> -XX:+CMSParallelRemarkEnabled > >>>>>>> -XX:MaxGCPauseMillis=50 > >>>>>>> -Xloggc:gc.log > >>>>>>> > >>>>>>> > >>>>>>> As you can see from the GC log, we never actually reach the point > >>>>>>> where the CMS kicks in (after app startup). But our young gens > seem > >>>>>>> to take increasingly long to collect as time goes by. > >>>>>>> > >>>>>>> The steady state of the app is reached around 956.392 into the log > >>>>>>> with a collection that takes 0.106 seconds. Thereafter the > survivor > >>>>>>> space remains roughly constantly as filled and the amount > promoted to > >>>>>>> old gen also remains constant, but the collection times increase to > >>>>>>> 2.855 seconds by the end of the 3.5 hour run. > >>>>>>> > >>>>>>> Has anyone seen this sort of behavior before? Are there more > >>>>>>> switches > >>>>>>> that I should try running with? > >>>>>>> > >>>>>>> Obviously, I am working to profile the app and reduce the garbage > >>>>>>> load > >>>>>>> in parallel. But if I still see this sort of problem, it is only a > >>>>>>> question of how long must the app run before I see unacceptable > >>>>>>> latency spikes. 
> >>>>>>> > >>>>>>> Matt > >>>>>>> > >>>>>>> ________________________________ > >>>>>>> _______________________________________________ > >>>>>>> hotspot-gc-use mailing list > >>>>>>> hotspot-gc-use at openjdk.java.net > > >>>>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > >>>>>> > >>>>>> _______________________________________________ > >>>>>> hotspot-gc-use mailing list > >>>>>> hotspot-gc-use at openjdk.java.net > > >>>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > >> > >> _______________________________________________ > >> hotspot-gc-use mailing list > >> hotspot-gc-use at openjdk.java.net > >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > > From peter.schuller at infidyne.com Sat May 15 10:12:20 2010 From: peter.schuller at infidyne.com (Peter Schuller) Date: Sat, 15 May 2010 19:12:20 +0200 Subject: g1 not doing partial aggressively enough -> fallback to full gc In-Reply-To: References: Message-ID: >    HTTPGCTEST_LOGGC=gc.log HTTPGCTEST_COLLECTOR=g1 ./run.sh I forgot to mention that JAVA_HOME must be set or the script will fail without a user-friendly error. -- / Peter Schuller From peter.schuller at infidyne.com Sat May 15 10:10:52 2010 From: peter.schuller at infidyne.com (Peter Schuller) Date: Sat, 15 May 2010 19:10:52 +0200 Subject: g1 not doing partial aggressively enough -> fallback to full gc Message-ID: Hello, I have another utterly unscientific (but I think interesting) variation of an older test. The behavior that is interesting in this case is that the heap grows until it hits the maximum heap size and falls back to a full GC in spite of there being high-payoff old regions. Based on the heap size after the full GC, the live working set is roughly ~230 MB, and the maximum heap size is 4 GB.
This means that during the GCs right before the full GC, G1 is choosing to do young generation GCs even though the average live ratio in the older regions should be roughly 5% (low-hanging fruit for partial collections). Links to the test, the GC log file and an executable .jar file for convenience follow at the bottom of this e-mail.

A short description of roughly what the test is doing in my particular invocation and use of it: I have a loop running which repeatedly tells the httpgctest server to add 25000 "data items" to its in-memory set, followed by removing 10% (a ratio of 0.1) of all data items pseudo-randomly. The end result is a steady state of roughly 250000 data items that are aged pseudo-randomly (the removal is done by selecting pseudo-randomly from the set). Thus, dead objects will accumulate over time across all regions, though the data structure overhead of the set itself will tend to generate data that is shorter-lived on average (due to the use of clojure's immutable data structures).

Given this behavior, no individual region is likely to become completely empty, so regions need to be evacuated in a partial collection (rather than reclaimed as part of the 'cleanup' phase). The bulk of the data collected in the young generation is expected to be the immutable data structure's internal structure. There should be very little writing to older generations (again due to the use of clojure's immutable data structures).
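The steady-state behavior described above can be sanity-checked with a few lines of simulation. To be clear, this is only an illustrative sketch, not code from httpgctest (which is Clojure); the `churn` function and its parameter names are invented here. Each round adds a batch of items and then drops a 0.1 fraction of the *whole* set (matching the `dropdata?ratio=0.1` call used to drive the test), so the peak size converges on batch/ratio = 25000/0.1 = 250000 items:

```python
import random

def churn(iterations, batch=25000, drop_ratio=0.1, seed=42):
    """Each round adds `batch` new items, then pseudo-randomly drops
    a `drop_ratio` fraction of the whole set.  Returns the set size
    right after the final add (the peak, before that round's drop)."""
    rng = random.Random(seed)
    items = set()
    next_id = 0
    peak = 0
    for _ in range(iterations):
        for _ in range(batch):
            items.add(next_id)
            next_id += 1
        peak = len(items)
        # Drops hit old and new items at the same per-item rate, so
        # survivors are smeared across all regions and no region's
        # live ratio falls to zero on its own.
        doomed = rng.sample(sorted(items), int(len(items) * drop_ratio))
        items.difference_update(doomed)
    return peak

print(churn(50))  # approaches the batch/drop_ratio = 250000 steady state
```

Because deaths are spread uniformly over the whole set, no region empties by itself, which is exactly why the survivors have to be evacuated by partial collections rather than reclaimed in the cleanup phase.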
The JVM version is one built from a recently merged-from-main bsd-port:

changeset:   206:12f0c051d819
tag:         tip
parent:      202:44ad17a6ffea
parent:      205:b7b4797303cb
user:        Greg Lewis
date:        Sat May 08 10:53:55 2010 -0700
summary:     Merge from main OpenJDK repository

The JVM options used in the particular run that produced the log are (cut'n'paste from -XX:+PrintCommandLineFlags, but re-ordered for clarity):

-XX:+UnlockDiagnosticVMOptions
-XX:+UnlockExperimentalVMOptions
-XX:+UseG1GC
-XX:GCPauseIntervalMillis=15
-XX:MaxGCPauseMillis=10
-XX:+G1ParallelRSetScanningEnabled
-XX:+G1ParallelRSetUpdatingEnabled
-XX:InitialHeapSize=52428800
-XX:MaxHeapSize=4294967296
-XX:ThreadStackSize=256
-XX:+PrintCommandLineFlags
-XX:+PrintGC
-XX:+PrintGCTimeStamps
-XX:+TraceClassUnloading
-XX:+CITime

The log file shows lots of young collections, and a very select few partial collections after marking phases. Typically along the lines of (excerpt, because the full log is very long):

46.219: [GC pause (young) 2553M->2550M(4096M), 0.0071520 secs]
46.234: [GC pause (young) 2555M->2551M(4096M), 0.0052560 secs]
46.250: [GC pause (young) 2559M->2554M(4096M), 0.0074780 secs]
46.281: [GC pause (young) 2559M->2555M(4096M), 0.0058670 secs]
46.306: [GC pause (young) 2569M->2555M(4096M), 0.0039900 secs]
46.326: [GC pause (young) 2569M->2556M(4096M), 0.0056980 secs]
46.339: [GC concurrent-count-end, 0.4546580]
46.339: [GC cleanup 2562M->2502M(4096M), 0.0702940 secs]
46.410: [GC concurrent-cleanup-start]
46.414: [GC concurrent-cleanup-end, 0.0041480]
46.431: [GC pause (young) 2515M->2497M(4096M), 0.0069320 secs]
46.444: [GC pause (partial) 2501M->2495M(4096M), 0.0056890 secs]
46.469: [GC pause (partial) 2499M->2493M(4096M), 0.0065570 secs]
46.486: [GC pause (partial) 2497M->2493M(4096M), 0.0058280 secs]
46.497: [GC pause (partial) 2496M->2493M(4096M), 0.0108240 secs]
46.525: [GC pause (young) (initial-mark) 2507M->2494M(4096M)46.529: [GC concurrent-mark-start] , 0.0044130 secs]
46.544: [GC pause (young) 2508M->2494M(4096M), 0.0053780 secs]
46.574: [GC pause (young) 2513M->2495M(4096M), 0.0058290 secs]
46.727: [GC pause (young) 2512M->2499M(4096M), 0.0122760 secs]
46.761: [GC pause (young) 2507M->2501M(4096M), 0.0121180 secs]
46.799: [GC pause (young) 2512M->2505M(4096M), 0.0116450 secs]
46.826: [GC pause (young) 2513M->2507M(4096M), 0.0098290 secs]
46.852: [GC pause (young) 2515M->2510M(4096M), 0.0111450 secs]
46.873: [GC pause (young) 2517M->2512M(4096M), 0.0095310 secs]
46.893: [GC pause (young) 2519M->2514M(4096M), 0.0113990 secs]
46.918: [GC pause (young) 2519M->2516M(4096M), 0.0092540 secs]

At the very end we see the full GC and the resulting heap size:

74.777: [GC pause (young) 4090M->4074M(4096M), 0.0085890 secs]
74.796: [GC pause (young) 4082M->4075M(4096M), 0.0061880 secs]
74.833: [Full GC 4095M->227M(758M), 2.9635480 secs]
77.940: [GC pause (young) 253M->232M(758M), 0.0206140 secs]
77.970: [GC pause (young) 236M->233M(1426M), 0.0168640 secs]

Close to this there are only young collections in sight. The effect is lessened by providing less strict pause time demands on g1 (I tested 250/300 instead of the 10/15 used in this case), but the behavior *does* remain. It just takes longer to kick in (given the same heap size).
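The "roughly 5% live" estimate can be read straight off the full-GC record above: 227M live out of 4095M occupied is 227/4095, about 5.5%. A throwaway sketch of that arithmetic follows; the regex and the `live_ratio` helper are invented here for illustration, and they only handle the simple single-line records, not the interleaved initial-mark/concurrent-mark lines:

```python
import re

# Matches simple single-line records like
#   74.833: [Full GC 4095M->227M(758M), 2.9635480 secs]
#   46.219: [GC pause (young) 2553M->2550M(4096M), 0.0071520 secs]
LOG_RE = re.compile(r"\[(?:GC pause \((?:young|partial)\)|Full GC) "
                    r"(\d+)M->(\d+)M\((\d+)M\), ([\d.]+) secs\]")

def live_ratio(line):
    """Occupancy after the collection divided by occupancy before it."""
    m = LOG_RE.search(line)
    return int(m.group(2)) / int(m.group(1)) if m else None

r = live_ratio("74.833: [Full GC 4095M->227M(758M), 2.9635480 secs]")
print(round(r, 3))  # about 0.055, i.e. ~5.5% of the occupied heap was live
```

With only ~5.5% of the old regions live on average, nearly every old region is a high-payoff evacuation candidate, which is what makes the lack of partial collections before the full GC surprising.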
Test links/reproduction information:

The test is the httpgctest (as of version 751d8374810a497cf26e48211183db5dd0a73185):

http://github.com/scode/httpgctest

A GC log produced with:

   HTTPGCTEST_LOGGC=gc.log HTTPGCTEST_COLLECTOR=g1 ./run.sh

can be found here:

http://distfiles.scode.org/mlref/gctest/httpgctest-g1-fullgc-20100515/gc.log

An executable .jar file (product of 'lein uberjar') is here:

http://distfiles.scode.org/mlref/gctest/httpgctest-g1-fullgc-20100515/httpgctest-standalone.jar

For running the executable .jar with the same options, the direct link to run.sh of the correct version is:

http://github.com/scode/httpgctest/blob/751d8374810a497cf26e48211183db5dd0a73185/run.sh

The input to the test once running is the following little loop running concurrently:

while [ 1 ] ; do curl 'http://localhost:9191/gendata?amount=25000' ; curl 'http://localhost:9191/dropdata?ratio=0.1' ; sleep 0.1 ; done

-- / Peter Schuller From adamh at basis.com Tue May 18 11:19:51 2010 From: adamh at basis.com (Adam Hawthorne) Date: Tue, 18 May 2010 14:19:51 -0400 Subject: PrintGCStats In-Reply-To: References: Message-ID: I hacked this one up a few months ago when I couldn't find one. You might want to review it, I don't guarantee it's accurate, and I don't know awk very well, but it seemed to be working when I last used it. Let me know if it doesn't come through (PrintGCStats.tgz). Adam -- Adam Hawthorne Software Engineer BASIS International Ltd. www.basis.com +1.505.345.5232 Phone On Tue, May 18, 2010 at 14:12, Hiroshi Yamauchi wrote: > Hi, > > Does anyone have a version of the PrintGCStats script that works with the > recent Hotspot builds and that we can share in the community? > > It appears that an old version of it is available here: > > > http://java.sun.com/developer/technicalArticles/Programming/turbo/#PrintGCStats > > But it does not seem to be able to parse the output from a recent > Hotspot correctly (e.g. gc times always show zero.)
> > I think it's very convenient and almost a must to have a version that > we can share and standardize on. > > Thanks, > Hiroshi > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100518/9cb78ee3/attachment.html -------------- next part -------------- A non-text attachment was scrubbed... Name: PrintGCStats.tgz Type: application/x-gzip Size: 13188 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100518/9cb78ee3/attachment.bin From yamauchi at google.com Tue May 18 12:07:15 2010 From: yamauchi at google.com (Hiroshi Yamauchi) Date: Tue, 18 May 2010 12:07:15 -0700 Subject: PrintGCStats In-Reply-To: References: Message-ID: Hi Adam, Thanks for a quick response. I'll try to take a look at it at my next chance. Does anyone, who knows more about the script than I, feel like taking a look at it? Thanks, Hiroshi On Tue, May 18, 2010 at 11:19 AM, Adam Hawthorne wrote: > I hacked this one up a few months ago when I couldn't find one. You might > want to review it, I don't guarantee it's accurate, and I don't know awk > very well, but it seemed to be working when I last used it. > > Let me know if it doesn't come through (PrintGCStats.tgz). > Adam > > -- > Adam Hawthorne > Software Engineer > BASIS International Ltd. > www.basis.com > +1.505.345.5232 Phone > > > On Tue, May 18, 2010 at 14:12, Hiroshi Yamauchi wrote: >> >> Hi, >> >> Does anyone have a version of the PrintGCStats script that works with the >> recent Hotspot builds and that we can share in the community? >> >> It appears that an old version of it is available here: >> >> >> http://java.sun.com/developer/technicalArticles/Programming/turbo/#PrintGCStats >> >> But it does not seem to be able to parse the output from a recent >> Hotspot correctly (e.g. gc times always show zero.)
>> >> I think it's very convenient and almost a must to have a version that >> we can share and standardize on. >> >> Thanks, >> Hiroshi > >
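Until an up-to-date PrintGCStats exists, the core of what it computes for logs like the ones in this digest can be approximated in a few lines. This is only a sketch of the idea, not the real awk script: the `summarize` helper and its regex are invented here, and they only handle the plain `, <n> secs]` records shown earlier in the thread:

```python
import re

# Pull every "<n> secs" pause duration out of a -verbose:gc /
# -XX:+PrintGCTimeStamps log and report count, total and max.
PAUSE_RE = re.compile(r", ([\d.]+) secs\]")

def summarize(log_text):
    pauses = [float(s) for s in PAUSE_RE.findall(log_text)]
    return {"n": len(pauses),
            "total": sum(pauses),
            "max": max(pauses, default=0.0)}

sample = """46.219: [GC pause (young) 2553M->2550M(4096M), 0.0071520 secs]
46.339: [GC concurrent-count-end, 0.4546580]
74.833: [Full GC 4095M->227M(758M), 2.9635480 secs]"""
print(summarize(sample))  # 2 pauses; the concurrent-count-end line has no "secs" and is skipped
```

A shared script would of course need more than this (per-collector breakdowns, rates, alive-after statistics), but anchoring it on the `secs]` suffix is what makes it robust to the pause-type prefixes varying across collectors.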