Troubleshooting a ~40-second minor collection

Mon Dec 2 14:14:36 PST 2013

Sorry - the 4.5M attached .tgz for GC logs may have blocked this message. I
am re-sending without the logs - if anybody is interested in reviewing them
please let me know and I can send them directly.
Also, I should have mentioned - this is 64bit JDK 7u25 running on Centos
6.4.

Thanks again,
      Aaron

On Mon, Dec 2, 2013 at 5:11 PM, Aaron Daubman <daubman at gmail.com> wrote:

> Hi again GC geniuses,
>
> I have been regularly seeing this long (10s of seconds) minor collection
> time now, and this time I have plenty of logs...
>
> Last time, the only response I received was that it was likely related to
> some part of memory that GC tried to inspect having been swapped out.
> That seemed to make sense, and so I set about configuring HugePages on
> these linux servers to allow JVM heap to be locked as active mem (we have
> plenty of ram on these 64G servers).
>
> That seems to be working correctly:
> grep Huge /proc/meminfo
> AnonHugePages:         0 kB
> HugePages_Total:   10240
> HugePages_Free:     2464
> HugePages_Rsvd:     1293
> HugePages_Surp:        0
> Hugepagesize:       2048 kB
>
> These are the JVM flags I am using, LargePages to UseSHM is what has been
> added recently to try and prevent the heap from being swapped out:
> -XX:+UseG1GC
> -XX:MaxGCPauseMillis=50
> -Xms16G
> -Xmx16G
> *-XX:+UseLargePages*
> *-XX:LargePageSizeInBytes=2m*
> *-XX:+UseSHM*
> -XX:+AggressiveOpts
> -XX:+UseFastAccessorMethods
> -XX:+UseStringCache
> -XX:+OptimizeStringConcat
> -XX:+PrintGCDateStamps
> -XX:+PrintGCTimeStamps
> -XX:+PrintGCDetails
> -XX:+PrintAdaptiveSizePolicy
>
> I am attaching a gzipped tar file of GC logs from four of the servers
> seeing these tens-of-seconds minor pauses. Any help in explaining what may
> be causing them would be much appreciated.
>
> The situation in which this occurs is when we copy at high rate data to
> the server using netcat, tar and gzip (I guess it is a known issue that tar
> will pollute pagecache*, but using largepages for the JVM should protect me
> from swapping due to cache pollution, no?)
>
> Thanks as always!
>       Aaron
>
> *
> http://www.mysqlperformanceblog.com/2010/04/02/fadvise-may-be-not-what-you-expect/
>
>
> On Mon, Oct 14, 2013 at 9:36 PM, Aaron Daubman <daubman at gmail.com> wrote:
>
>> Hi All,
>>
>> I have unfortunately lost my GC log file (server restarted shortly after
>> the event) but have AppDynamics stats.
>>
>> I am running Solr 3.6.1 in a jetty 9 container under jdk 1.7u25.
>>
>> Max heap is 16G. <7G were used at the time of the event. Typical heap
>> usage is ~28%.
>>
>> There were around 10 minor collection events (fairly typical) during the
>> minute the event occurred. The event was an almost 40-second max minor
>> collection time.
>>
>> Around that time JVM Heap utilization was only between 27%-31% utilized -
>> I cannot remember the last time we had a major collection, and I have also
>> never seen such a long minor collection time. There is nothing I can see
>> about traffic to the JVM that appeared abnormal.
>>
>> I did see some long external (JDBC) query times about this time as well,
>> but thought they were more likely a symptom of the minor collection pause,
>> rather than a cause.
>>
>> AppDynamics monitors Code Cache,  G1 Eden Space, G1 Old Gen, G1 Perm Gen,
>> and G1 Survivor - the max of which at the time was Perm Gen at only 60%.
>>
>> Is there anything else I can do (without the GC log file) to try and
>> determine the cause of the unexpected 40s minor collection pause time?
>>
>> Thanks,
>>      Aaron
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131202/d5469c21/attachment.html