From taras.tielkes at gmail.com  Sun Jan 13 12:07:13 2013
From: taras.tielkes at gmail.com (Taras Tielkes)
Date: Sun, 13 Jan 2013 21:07:13 +0100
Subject: Monitoring finalization activity
Message-ID: 

Hi,

Are there any (semi-)public counters available to track how much work is
being performed with regard to finalization? I'm mainly interested in
finalized instance counts by class, rather than the current size of the
finalizer queue.

Thanks,
-tt

From Andreas.Loew at oracle.com  Sun Jan 13 14:17:56 2013
From: Andreas.Loew at oracle.com (Andreas Loew)
Date: Sun, 13 Jan 2013 23:17:56 +0100
Subject: Monitoring finalization activity
In-Reply-To: 
References: 
Message-ID: <50F33294.1050308@oracle.com>

Hi Taras,

you should be able to use BTrace (i.e. dynamic bytecode instrumentation)
and register a probe that fires on entry to the finalize() methods of the
classes you want to monitor (the probe can then do the counting):

http://kenai.com/projects/btrace/pages/Home
http://kenai.com/projects/btrace/pages/UserGuide

Hope this helps & best regards,
Andreas

On 13.01.2013 21:07, Taras Tielkes wrote:
> Hi,
>
> Are there any (semi-)public counters available to track how much
> work is being performed with regard to finalization? I'm mainly
> interested in finalized instance counts by class, rather than the
> current size of the finalizer queue.
>
> Thanks,
> -tt

--
Andreas Loew | Senior Java Architect
ACS Principal Service Delivery Engineer
ORACLE Germany
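As a rough illustration of the approach Andreas describes, a minimal BTrace
script might look something like the following. This is an untested sketch:
"com.example.MyResource" is a placeholder for whichever class you want to
monitor, and one @OnMethod probe per class would be needed to get
per-class counts.

import com.sun.btrace.annotations.BTrace;
import com.sun.btrace.annotations.OnMethod;
import com.sun.btrace.annotations.OnTimer;
import static com.sun.btrace.BTraceUtils.println;
import static com.sun.btrace.BTraceUtils.str;
import static com.sun.btrace.BTraceUtils.strcat;

@BTrace
public class FinalizeCounter {
    // running count of finalize() invocations for the probed class
    private static long count;

    // fires whenever com.example.MyResource.finalize() is entered
    @OnMethod(clazz = "com.example.MyResource", method = "finalize")
    public static void onFinalize() {
        count++;
    }

    // print the running total every 5 seconds
    @OnTimer(5000)
    public static void report() {
        println(strcat("MyResource.finalize() calls so far: ", str(count)));
    }
}

It would be attached roughly as "btrace 12345 FinalizeCounter.java", where
12345 is the pid of the target JVM.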
From michal.warecki at gmail.com  Mon Jan 14 09:21:41 2013
From: michal.warecki at gmail.com (Michał Warecki)
Date: Mon, 14 Jan 2013 18:21:41 +0100
Subject: CMS lazy sweeping
Message-ID: 

Hi All,

I'm trying to understand the mark-sweep algorithm. Before I dive into the
implementation of CMS in OpenJDK, I want to ask a few questions.

Does CMS in OpenJDK use lazy sweeping with a block-structured heap? If not,
why? If it does, each block could carry the class information for the
objects allocated in it, so the object headers would not need to store the
class reference themselves. I think this would improve the cache hit rate
and reduce heap size. Is there any information about this?

Thanks,
MW

From bartosz.markocki at gmail.com  Fri Jan 18 05:10:30 2013
From: bartosz.markocki at gmail.com (Bartek Markocki)
Date: Fri, 18 Jan 2013 14:10:30 +0100
Subject: Spikes in the duration of the ParNew collections
Message-ID: 

Hello all,

During tests of a new version of our application we found that some of the
ParNew times spike to 170ms (avg 10ms) - Java6 update 38, 64bit, -server
with CMS.

Of course the first thing that came to our mind was a spike in the
allocation rate resulting in a spike in the amount of surviving objects
and/or a spike in the promotion rate. Unfortunately the collection(s) in
question did not show any abnormality in this respect. To make things even
more interesting, as shown in the attached extract from the GC log, some of
those long-lasting ParNew collections showed a smaller promotion volume
than the average (21k per collection).

Before re-running the test we enabled -XX:+PrintSafepointStatistics and
-XX:+TraceSafepointCleanupTime to get a better understanding of the STW
times. As a result we found that almost all of the time goes into the
collection itself:

28253.076: GenCollectForAllocation [ 382 0 0 ] [ 0 0 0 3 170 ] 0

Additionally we noticed that the user-to-real time ratio for a normal
ParNew collection is between 4 and 8. For the collection in question it
jumps to 12 (we have 16 cores) - so not only did the collection last
longer, but more CPU was used.

For your review, I attached an extract from stdout and the GC log for the
collection in question.

We also reran the test with the collector changed to ParallelOld and did
not notice comparable spikes in the young generation times.
After that we tried Java7 update 10 with CMS and found that the issue is
still there (spikes in ParNew times), although it is less noticeable, i.e.,
the max ParNew time was 113ms.

The question of the day is: why is this happening? What else can we
do/check/test to make our application run CMS on Java 6?

Thanks in advance,
Bartek


$ java -version
java version "1.6.0_38"
Java(TM) SE Runtime Environment (build 1.6.0_38-b05)
Java HotSpot(TM) 64-Bit Server VM (build 20.13-b02, mixed mode)

$ less /etc/redhat-release
Red Hat Enterprise Linux Server release 5.5 (Tikanga)

JVM options
-server -Xms2g -Xmx2g -XX:PermSize=64m -XX:+UseConcMarkSweepGC
-XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=50
-XX:+UseCMSInitiatingOccupancyOnly -XX:NewSize=1700m
-XX:MaxNewSize=1700m -verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCApplicationStoppedTime -XX:+PrintFlagsFinal
-Xloggc:/apps/gcLog.log -XX:+PrintGCDateStamps
-XX:+PrintGCApplicationConcurrentTime -XX:PrintCMSStatistics=3
-XX:+PrintCMSInitiationStatistics -XX:+PrintAdaptiveSizePolicy
-XX:+PrintGCTaskTimeStamps -XX:+PrintSharedSpaces
-XX:+PrintTenuringDistribution -XX:+PrintVMQWaitTime
-XX:+PrintHeapAtGC -XX:+PrintSafepointStatistics
-XX:PrintSafepointStatisticsCount=10 -XX:+TraceSafepointCleanupTime
-XX:PrintFLSStatistics=2 -XX:+PrintReferenceGC

-------------- next part --------------
A non-text attachment was scrubbed...
Name: gcLog.log.gz
Type: application/x-gzip
Size: 1402 bytes
Url: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130118/4c612d80/gcLog.log.gz
-------------- next part --------------
A non-text attachment was scrubbed...
Name: stdout.log.gz
Type: application/x-gzip
Size: 829 bytes
Url: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130118/4c612d80/stdout.log.gz

From aaisinzon at guidewire.com  Wed Jan 23 15:18:28 2013
From: aaisinzon at guidewire.com (Alexandre Aisinzon)
Date: Wed, 23 Jan 2013 23:18:28 +0000
Subject: JE caching
Message-ID: <43E49E6EC0E84F41B98C68AB6D7820C417A4C743@sm-ex-01-vm.guidewire.com>

Hi all,

https://forums.oracle.com/forums/thread.jspa?messageID=10017916 indicates
that one should explicitly add the compressed references parameter because:
"JE cache sizing does not take into account compressed oops unless it is
explicitly specified using XX:+UseCompressedOops".

I am not clear on what JE caching is. Can someone elaborate on what this
capability is?

Best

Alex A
From bernd.eckenfels at googlemail.com  Wed Jan 23 15:43:45 2013
From: bernd.eckenfels at googlemail.com (Bernd Eckenfels)
Date: Thu, 24 Jan 2013 00:43:45 +0100
Subject: JE caching
In-Reply-To: <43E49E6EC0E84F41B98C68AB6D7820C417A4C743@sm-ex-01-vm.guidewire.com>
References: <43E49E6EC0E84F41B98C68AB6D7820C417A4C743@sm-ex-01-vm.guidewire.com>
Message-ID: 

On 24.01.2013 00:18, Alexandre Aisinzon wrote:
> I am not clear on what JE caching is. Can someone elaborate on what this
> capability is?

It sounds like JE = Berkeley DB Java Edition (that is what the forum topic
is about), so this seems to be the BDB JE database cache. It is documented
in the API:

http://docs.oracle.com/cd/E17277_02/html/java/com/sleepycat/je/EnvironmentMutableConfig.html#setCachePercent

Regards,
Bernd

--
https://plus.google.com/u/1/108084227682171831683/about
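For context, the cache Bernd refers to is sized per BDB JE Environment via
EnvironmentMutableConfig. A minimal sketch of setting it explicitly might
look like this; the environment path is a placeholder and 60% is just an
example value, not a recommendation:

import java.io.File;

import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;

public class JeCacheExample {
    public static void main(String[] args) {
        EnvironmentConfig config = new EnvironmentConfig();
        config.setAllowCreate(true);
        // Size the JE cache as a percentage of the JVM max heap (-Xmx);
        // setCacheSize(bytes) can be used instead for an absolute limit.
        config.setCachePercent(60);

        Environment env = new Environment(new File("/path/to/je-env"), config);
        // ... use the environment ...
        env.close();
    }
}

As the forum note Alexandre quotes says, JE's internal accounting of how
much data fits into that budget only assumes compressed references when
-XX:+UseCompressedOops is passed explicitly on the command line.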
From ryebrye at gmail.com  Thu Jan 24 16:29:22 2013
From: ryebrye at gmail.com (Ryan Gardner)
Date: Thu, 24 Jan 2013 19:29:22 -0500
Subject: Any suggestions for number of cores / heap size for g1 or cms?
In-Reply-To: 
References: 
Message-ID: 

We have a non-CPU-intensive app that is licensed based on CPU cores.

We want to use as much heap as is reasonable without having large pauses.

I've only deployed G1 on machines with lots of cores - how well does it
work on fewer cores? If we had a live set of 32GB and a heap of 72GB with a
relatively small object allocation rate, would G1 have low pause times with
4 or 8 physical cores?

It doesn't matter too much how long the concurrent phase is - it's just the
pauses that matter.

Any tips/suggestions?

From jon.masamitsu at oracle.com  Thu Jan 24 21:13:03 2013
From: jon.masamitsu at oracle.com (Jon Masamitsu)
Date: Thu, 24 Jan 2013 21:13:03 -0800
Subject: Spikes in the duration of the ParNew collections
In-Reply-To: 
References: 
Message-ID: <5102145F.8020200@oracle.com>

If you have a test setup where you can run some experiments, try
-XX:-UseParNewGC. There have been instances in the past where flaws in the
partitioning for parallelism have caused some dramatic increases in the
ParNew times. This setting will use the serial young generation collector.
It will be slower but perhaps without the spiking.

If that removes the spiking, it gives us some information about the cause,
but probably not enough to pinpoint the problem. If I were attacking this
I'd try to profile the VM to see which methods are consuming all that time.

Jon

On 1/18/2013 5:10 AM, Bartek Markocki wrote:
> During tests of a new version of our application we found that some of
> the ParNew times spike to 170ms (avg 10ms) - Java6 update 38, 64bit,
> -server with CMS.
>
> Of course the first thing that came to our mind was a spike in the
> allocation rate resulting in a spike in the amount of surviving objects
> and/or a spike in the promotion rate. Unfortunately the collection(s) in
> question did not show any abnormality in this respect.
From dhd at exnet.com  Thu Jan 24 23:48:27 2013
From: dhd at exnet.com (Damon Hart-Davis)
Date: Fri, 25 Jan 2013 07:48:27 +0000
Subject: Any suggestions for number of cores / heap size for g1 or cms?
In-Reply-To: 
References: 
Message-ID: <3563F6F7-5268-4AAE-870F-B8391E7B867A@exnet.com>

Indeed, how well does it do on a single core?

Rgds

Damon

On 25 Jan 2013, at 00:29, Ryan Gardner wrote:
> I've only deployed G1 on machines with lots of cores - how well does it
> work on fewer cores? If we had a live set of 32GB and a heap of 72GB with
> a relatively small object allocation rate, would G1 have low pause times
> with 4 or 8 physical cores?

From bartosz.markocki at gmail.com  Fri Jan 25 08:51:31 2013
From: bartosz.markocki at gmail.com (Bartek Markocki)
Date: Fri, 25 Jan 2013 17:51:31 +0100
Subject: Spikes in the duration of the ParNew collections
In-Reply-To: <5102145F.8020200@oracle.com>
References: <5102145F.8020200@oracle.com>
Message-ID: 

Yes, we do have a test setup; however, it takes about 24h to observe this
behavior. For the moment the team decided to use the ParallelOld collector
and run a 10-day-long test to observe the application's behavior over the
long run. Right now we are 4 days into the test. After we finish we will
try to rerun the test with -XX:-UseParNewGC and let you know.

One way or another, I reviewed the current test GC logs (from 6 identical
instances) and in one of them found the following:

{Heap before GC invocations=42414 (full 12):
 PSYoungGen      total 1738432K, used 1738032K [0x0000000795c00000, 0x0000000800000000, 0x0000000800000000)
  eden space 1736064K, 100% used [0x0000000795c00000,0x00000007ffb60000,0x00000007ffb60000)
  from space 2368K, 83% used [0x00000007ffb60000,0x00000007ffd4c208,0x00000007ffdb0000)
  to   space 2304K, 0% used [0x00000007ffdc0000,0x00000007ffdc0000,0x0000000800000000)
 ParOldGen       total 356352K, used 282406K [0x0000000780000000, 0x0000000795c00000, 0x0000000795c00000)
  object space 356352K, 79% used [0x0000000780000000,0x00000007913c9b58,0x0000000795c00000)
 PSPermGen       total 65792K, used 34884K [0x000000077ae00000, 0x000000077ee40000, 0x0000000780000000)
  object space 65792K, 53% used [0x000000077ae00000,0x000000077d011390,0x000000077ee40000)
2013-01-24T16:58:33.374-0600: 289746.531: [GC
AdaptiveSizePolicy::compute_survivor_space_size_and_thresh:  survived: 1803504  promoted: 106496  overflow: false
AdaptiveSizeStart: 289746.646 collection: 42414
avg_survived_padded_avg: 2347456.750000  avg_promoted_padded_avg: 187448.734375  avg_pretenured_padded_avg: 3911.287598  tenuring_thresh: 1  target_size: 2359296
Desired survivor size 2359296 bytes, new threshold 1 (max 15)
PSAdaptiveSizePolicy::compute_generation_free_space: costs minor_time: 0.009063 major_cost: 0.000009 mutator_cost: 0.990928 throughput_goal: 0.990000 live_space: 319016672 free_space: 2039283712 old_promo_size: 336265216 old_eden_size: 1703018496 desired_promo_size: 336265216 desired_eden_size: 1703018496
AdaptiveSizePolicy::survivor space sizes: collection: 42414 (2359296, 2424832) -> (2359296, 2359296)
AdaptiveSizeStop: collection: 42414
[PSYoungGen: 1738032K->1761K(1738496K)] 2020439K->284272K(2094848K), 0.1150200 secs] [Times: user=1.25 sys=0.00, real=0.11 secs]
Heap after GC invocations=42414 (full 12):
 PSYoungGen      total 1738496K, used 1761K [0x0000000795c00000, 0x0000000800000000, 0x0000000800000000)
  eden space 1736192K, 0% used [0x0000000795c00000,0x0000000795c00000,0x00000007ffb80000)
  from space 2304K, 76% used [0x00000007ffdc0000,0x00000007fff784f0,0x0000000800000000)
  to   space 2304K, 0% used [0x00000007ffb80000,0x00000007ffb80000,0x00000007ffdc0000)
 ParOldGen       total 356352K, used 282510K [0x0000000780000000, 0x0000000795c00000, 0x0000000795c00000)
  object space 356352K, 79% used [0x0000000780000000,0x00000007913e3b58,0x0000000795c00000)
 PSPermGen       total 65792K, used 34884K [0x000000077ae00000, 0x000000077ee40000, 0x0000000780000000)
  object space 65792K, 53% used [0x000000077ae00000,0x000000077d011390,0x000000077ee40000)
}

Again, the allocation rate and the amounts of survived and promoted objects
are comparable to other scavenges; however, this time the collection took
115ms, whereas the average for the others is 11ms.

Unfortunately we were not able to keep -XX:+PrintGCTaskTimeStamps enabled
(for better visibility into what took so long), as this caused the JVM to
crash consistently. I have already reported this as bug 2426776, but it is
still under internal review by Oracle.

There are two additional things:

1. While preparing to send the first email I found this post
http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2012-January/001006.html
where Ramki said 'There was an old issue wrt monitor deflation that was
fixed a few releases ago'. As we are on the latest update (38) I don't
expect this to apply here, but do you know the bug id for it, so I can
eliminate it with full confidence?

2. While monitoring the running application I noticed that we constantly
have about 2.5k objects waiting to be finalized. The objects become
eligible for finalization in batches (min: 600, max: 2000 objects). They
are instances of the java.util.zip.Deflater class and are finalized quite
quickly (below 2 sec - the refresh interval of my monitoring tool). Do you
think this might be related? I made this observation only recently (with
the ParallelOld collector), so for the moment I am not able to correlate it
with the high ParNew times.

Thank you for looking at our problem.
Bartek

On Fri, Jan 25, 2013 at 6:13 AM, Jon Masamitsu wrote:
> If you have a test setup where you can run some experiments, try
> -XX:-UseParNewGC. There have been instances in the past where flaws in
> the partitioning for parallelism have caused some dramatic increases in
> the ParNew times. This setting will use the serial young generation
> collector. It will be slower but perhaps without the spiking.
>
> If that removes the spiking, it gives us some information about the
> cause, but probably not enough to pinpoint the problem. If I were
> attacking this I'd try to profile the VM to see which methods are
> consuming all that time.
>
> Jon
From ysr1729 at gmail.com  Fri Jan 25 12:22:06 2013
From: ysr1729 at gmail.com (Srinivas Ramakrishna)
Date: Fri, 25 Jan 2013 12:22:06 -0800
Subject: Spikes in the duration of the ParNew collections
In-Reply-To: 
References: <5102145F.8020200@oracle.com>
Message-ID: 

Hi Bartek --

On Fri, Jan 25, 2013 at 8:51 AM, Bartek Markocki wrote:
> 1. While preparing to send the first email I found this post
> http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2012-January/001006.html
> where Ramki said 'There was an old issue wrt monitor deflation that was
> fixed a few releases ago'. As we are on the latest update (38) I don't
> expect this to apply here, but do you know the bug id for it, so I can
> eliminate it with full confidence?

That bug involved time outside of the actual collection time, during which
the threads were paused. If I understand your problem correctly, it's that
the collection times themselves are spiky. If that understanding is
correct, then your problem would not be related to the above email.

> 2. While monitoring the running application I noticed that we constantly
> have about 2.5k objects waiting to be finalized. The objects become
> eligible for finalization in batches (min: 600, max: 2000 objects). They
> are instances of the java.util.zip.Deflater class and are finalized quite
> quickly (below 2 sec - the refresh interval of my monitoring tool). Do
> you think this might be related?

Yes, I'd look to see if "PrintReferenceGC" times indicate any diffs, if you
haven't already done so. (I haven't followed the preceding part of the
thread very closely though.) If/when you find that the spikiness is
specific to CMS+ParNew, one can do a few more experiments. Bear in mind
that promotion policies are slightly different between the two, and ParNew
isn't as adaptive as ParallelOld is (in terms of resizing/reshaping the
heap). If the experiments are under controlled conditions and you see
spikes with a steady workload, one might be able to more quickly pinpoint
the culprit(s). (For the case of CMS+ParNew, for example, the dynamic
sizing of the local promotion buffer lists would be one place to shine a
light on.)

-- ramki
From taras.tielkes at gmail.com  Sat Jan 26 06:51:39 2013
From: taras.tielkes at gmail.com (Taras Tielkes)
Date: Sat, 26 Jan 2013 15:51:39 +0100
Subject: Monitoring finalization activity
In-Reply-To: <50F33294.1050308@oracle.com>
References: <50F33294.1050308@oracle.com>
Message-ID: 

Hi,

We've enabled -XX:+PrintReferenceGC, which at least gives totals per
reference type cleared per GC. It seems that the tracing JVM options for
seeing which classes are actually put on the finalizer queue are not
available in product JVM builds.

Is it possible to get the same data somehow through the JMX GC APIs
(com.sun.management.GcInfo etc.)?

Thanks,
-tt

On Sun, Jan 13, 2013 at 11:17 PM, Andreas Loew wrote:
> you should be able to use BTrace (i.e. dynamic bytecode instrumentation)
> and register a probe that fires on entry to the finalize() methods of the
> classes you want to monitor (the probe can then do the counting)
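As far as I know, the standard GC MXBeans do not expose per-class
finalization counts; the closest built-in counter is
MemoryMXBean.getObjectPendingFinalizationCount(), which only reports the
current queue length. The per-collection data behind
com.sun.management.GcInfo can be polled roughly like this (HotSpot-specific
API; a sketch, not a complete tool):

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

import com.sun.management.GcInfo;

public class GcInfoPoller {
    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        while (true) {
            // approximate number of objects currently waiting for finalization
            System.out.println("pending finalization: "
                    + memory.getObjectPendingFinalizationCount());

            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                // the HotSpot beans also implement the com.sun.management subinterface
                if (gc instanceof com.sun.management.GarbageCollectorMXBean) {
                    GcInfo info = ((com.sun.management.GarbageCollectorMXBean) gc).getLastGcInfo();
                    if (info != null) {
                        System.out.println(gc.getName() + ": last GC id=" + info.getId()
                                + " duration=" + info.getDuration());
                    }
                }
            }
            Thread.sleep(5000);
        }
    }
}

Per-class finalized-instance counts would still need instrumentation along
the lines of the BTrace suggestion earlier in this thread.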
From pasthelod at gmail.com  Sun Jan 27 00:34:48 2013
From: pasthelod at gmail.com (Pas)
Date: Sun, 27 Jan 2013 09:34:48 +0100
Subject: Sudden permanent increase in Minor (ParNew) GC time, that only a stop-the-world System.gc() alleviates
Message-ID: 

Hello,

Long story short, Minor GC times jump from ~30ms to more than a second (and
increase to about 5 seconds), and only an explicit parallel Full GC can
whack it out of this madness. Interestingly, it looks like this bug/feature
manifests when a big ~100+ MB byte[] object gets allocated, thus triggering
a CMS cycle. (The CMS cycle itself runs fine, but the young gen collections
take forever.)

http://pastebin.com/RcBkCEEE (of course, if someone's interested, here's
the full 50 MB of the otherwise rather predictable log, 2.9MB compressed:
http://zomg.hu/work/wtf-gc.log.xz )

We're running the stock Oracle 1.6.0_37 64-bit JVM, on a new 8-core Xeon
E3-something with plenty of RAM for the heap, with the following options:

-Xmx5128M
(-Xms5128M, though the linked gclog is without this)
-XX:NewSize=300m
-XX:MaxNewSize=300m
-XX:PermSize=64m
-XX:MaxPermSize=192m

-XX:+UseParNewGC
-XX:ParallelGCThreads=2
-XX:MaxTenuringThreshold=4
-XX:SurvivorRatio=3

-XX:+UseConcMarkSweepGC
-XX:+UnlockDiagnosticVMOptions
-XX:+CMSScavengeBeforeRemark
-XX:CMSInitiatingOccupancyFraction=65

-XX:+PrintGC
-XX:+PrintGCDetails
-XX:+PrintTenuringDistribution
-XX:+PrintGCDateStamps
-XX:PrintFLSStatistics=1
-Xloggc:/logs/gc.log
-verbose:gc

Has anyone experienced similar issues?

Thanks for your time,
Pas

From ysr1729 at gmail.com  Sun Jan 27 12:00:24 2013
From: ysr1729 at gmail.com (Srinivas Ramakrishna)
Date: Sun, 27 Jan 2013 12:00:24 -0800
Subject: Sudden permanent increase in Minor (ParNew) GC time, that only a stop-the-world System.gc() alleviates
In-Reply-To: 
References: 
Message-ID: 

Check to see if the pause time increase correlates to a jump in the
promotion volume per scavenge. Should be easy to get from your GC logs
(which I haven't looked at).

-- ramki

On Sun, Jan 27, 2013 at 12:34 AM, Pas wrote:
> Long story short, Minor GC times jump from ~30ms to more than a second
> (and increase to about 5 seconds), and only an explicit parallel Full GC
> can whack it out of this madness.
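For reference, the promotion volume per scavenge can be read from the
-XX:+PrintGCDetails output itself: for a line of the shape
"[ParNew: Yb->Ya(...), ...] Hb->Ha(...)", old-gen occupancy before the
scavenge is Hb - Yb, occupancy after is Ha - Ya, and the difference between
the two is approximately what that scavenge promoted. A rough parsing
sketch under that assumption (with -XX:+PrintTenuringDistribution the
ParNew entry can be split across lines and may need joining first; this is
an untested illustration, not a polished tool):

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PromotionPerScavenge {
    // Captures young-gen and whole-heap before->after occupancies of a ParNew entry,
    // i.e. the Yb, Ya, Hb, Ha values described above (all in K).
    private static final Pattern PARNEW = Pattern.compile(
            "\\[ParNew: (\\d+)K->(\\d+)K\\(\\d+K\\)[^\\]]*\\] (\\d+)K->(\\d+)K\\(\\d+K\\)");

    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        for (String line; (line = in.readLine()) != null; ) {
            Matcher m = PARNEW.matcher(line);
            if (!m.find()) {
                continue;
            }
            long youngBefore = Long.parseLong(m.group(1));
            long youngAfter = Long.parseLong(m.group(2));
            long heapBefore = Long.parseLong(m.group(3));
            long heapAfter = Long.parseLong(m.group(4));
            // growth of the non-young part of the heap across the scavenge, in KB
            long promotedKb = (heapAfter - youngAfter) - (heapBefore - youngBefore);
            System.out.println(promotedKb + "K promoted");
        }
        in.close();
    }
}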
From bartosz.markocki at gmail.com  Mon Jan 28 08:27:03 2013
From: bartosz.markocki at gmail.com (Bartek Markocki)
Date: Mon, 28 Jan 2013 17:27:03 +0100
Subject: Spikes in the duration of the ParNew collections
In-Reply-To: 
References: <5102145F.8020200@oracle.com>
Message-ID: 

Hi Ramki,

See my comments inline:

On Fri, Jan 25, 2013 at 9:22 PM, Srinivas Ramakrishna wrote:
> That bug involved time outside of the actual collection time, during
> which the threads were paused. If I understand your problem correctly,
> it's that the collection times themselves are spiky. If that
> understanding is correct, then your problem would not be related to the
> above email.

You got it correctly; we are talking about time spent inside the collection
here.

> Yes, I'd look to see if "PrintReferenceGC" times indicate any diffs, if
> you haven't already done so.

Unfortunately there are no significant diffs - in terms of time as well as
the amount of refs :(

The only peculiar thing (at least to me) that I noticed around the
collection in question comes from FLSStatistics. The binary tree before and
after looks exactly the same:

Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 30637977
Max Chunk Size: 30637977
Number of Blocks: 1
Av. Block Size: 30637977
Tree Height: 1

However, the indexed free list before and after shows a change:

Before:
Statistics for IndexedFreeLists:
--------------------------------
Total Free Space: 60314
Max Chunk Size: 211
Number of Blocks: 1498
Av. Block Size: 40
free=30698291 frag=0.0039

After:
--------------------------------
Total Free Space: 60296
Max Chunk Size: 211
Number of Blocks: 1501
Av. Block Size: 40
free=30698273 frag=0.0039

So the free space decreased by 18 (heap words, if I am correct), while the
number of blocks increased by 3. So I assume one bigger block got split
into a couple of smaller ones - non-intuitive behavior, at least at first
glance.

The other odd thing about this collection is the size of the promoted
object(s): 144 bytes, where normally we promote around 21k.
> If/when you find that the spikiness is specific to CMS+ParNew, one can
> do a few more experiments. Bear in mind that promotion policies are
> slightly different between the two, and ParNew isn't as adaptive as
> ParallelOld is (in terms of resizing/reshaping the heap). If the
> experiments are under controlled conditions and you see spikes with a
> steady workload, one might be able to more quickly pinpoint the
> culprit(s). (For the case of CMS+ParNew, for example, the dynamic sizing
> of the local promotion buffer lists would be one place to shine a light
> on.)

Understood. As I wrote previously, we are in the middle of a 10-day-long
test. Once the test is done, I will rerun it with (at least two instances
that have):
-XX:-UseParNewGC
-XX:+PrintOldPLAB
-XX:+PrintPLAB
(and the old settings).

Thanks for your help!
Bartek

From taras.tielkes at gmail.com  Mon Jan 28 13:11:35 2013
From: taras.tielkes at gmail.com (Taras Tielkes)
Date: Mon, 28 Jan 2013 22:11:35 +0100
Subject: java 1.7.0u4 GarbageCollectionNotificationInfo API
Message-ID: 

Hi,

I'm playing around with the new(ish) GarbageCollectionNotificationInfo API.
We're using ParNew+CMS in all our systems, and my first goal is a
comparison between -XX:+PrintGCDetails -verbose:gc output and the actual
data coming through the notification API. I'm using Java 1.7.0u6 for the
experiments.

So far, I have a number of questions:

1) Duration times
The javadoc for gcInfo.getDuration() describes the returned value as
expressed in milliseconds. However, the values differ from the GC logs by
several orders of magnitude. How are they calculated? On a 1-core Linux x64
VM, the values actually look like microseconds, but on Win32 machines I
still can't find any resemblance to the GC log timings. Apart from the
unit, what should the value represent? Real time or user time?

2) CMS events with cause "No GC"
How exactly do the phases of CMS map to the notifications emitted for the
CMS collector? I sometimes get events with cause "No GC". Does this
indicate a background CMS cycle being initiated by hitting the occupancy
fraction threshold?

3) Eden/Survivor
It seems that the MemoryUsage API treats Eden and Survivor separately, i.e.
survivor is not a subset of eden. This is different from the GC log
presentation. Is my understanding correct?

In general, I think it would be useful to have a code sample for the GC
notification API that generates output as close as possible to
-XX:+PrintGCDetails -verbose:gc, as far as the data required to do so is
available. The API looks quite promising; it seems it could really benefit
from a bit of documentation love :)

Thanks,
-tt
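In the absence of an official sample, a minimal listener for this API might
look like the following (it relies on the Oracle/HotSpot-specific
com.sun.management classes; a sketch that just prints the raw fields rather
than reproducing the -XX:+PrintGCDetails format):

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

import javax.management.Notification;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;

import com.sun.management.GarbageCollectionNotificationInfo;
import com.sun.management.GcInfo;

public class GcNotificationLogger {
    public static void install() {
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            // the platform GC MXBeans are NotificationEmitters in the Oracle/HotSpot JDK
            NotificationEmitter emitter = (NotificationEmitter) bean;
            emitter.addNotificationListener(new NotificationListener() {
                public void handleNotification(Notification n, Object handback) {
                    if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                            .equals(n.getType())) {
                        return;
                    }
                    GarbageCollectionNotificationInfo info =
                            GarbageCollectionNotificationInfo.from((CompositeData) n.getUserData());
                    GcInfo gc = info.getGcInfo();
                    System.out.println(info.getGcName() + " (" + info.getGcAction()
                            + ", cause=" + info.getGcCause() + "): duration=" + gc.getDuration()
                            + " before=" + gc.getMemoryUsageBeforeGc()
                            + " after=" + gc.getMemoryUsageAfterGc());
                }
            }, null, null);
        }
    }
}

Comparing getDuration() and the per-pool MemoryUsage maps printed here
against the corresponding -XX:+PrintGCDetails lines should at least make
the unit question easier to pin down.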
Should be easy to get from yr gc logs (which > i haven't looked at). > > -- ramki > > > > On Sun, Jan 27, 2013 at 12:34 AM, Pas wrote: > >> Hello, >> >> Long story short, Minor GC times jump from ~30ms to more than a second >> (and increase to about 5 seconds), and only an explicit paralell Full GC >> can whack it out of this madness. Interestingly it looks like this >> bug/feature manifests when a big ~100+ MB byte[] object gets allocated, >> thus triggering a CMS initial-sweep. (The CMS runs fine though, but the >> young gen collections take forever.) >> >> http://pastebin.com/RcBkCEEE (of course, if someone's interested I >> here's the full 50 MBs of the otherwise rather predictable log, 2.9MB >> compressed http://zomg.hu/work/wtf-gc.log.xz ) >> >> We're running the stock Oracle 1.6.0_37 64bit JVM, on a 8 core new Xeon >> E3-something with plenty of RAM for the heap, with the following options: >> >> -Xmx5128M >> (-Xms5128M, though the linked gclog is without this) >> -XX:NewSize=300m >> -XX:MaxNewSize=300m >> -XX:PermSize=64m >> -XX:MaxPermSize=192m >> >> -XX:+UseParNewGC >> -XX:ParallelGCThreads=2 >> -XX:MaxTenuringThreshold=4 >> -XX:SurvivorRatio=3 >> >> -XX:+UseConcMarkSweepGC >> -XX:+UnlockDiagnosticVMOptions >> -XX:+CMSScavengeBeforeRemark >> -XX:CMSInitiatingOccupancyFraction=65 >> >> -XX:+PrintGC >> -XX:+PrintGCDetails >> -XX:+PrintTenuringDistribution >> -XX:+PrintGCDateStamps >> -XX:PrintFLSStatistics=1 >> -Xloggc:/logs/gc.log >> -verbose:gc >> >> Has anyone experienced similar issues? >> >> Thanks for your time, >> Pas >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130129/20e639c4/attachment.html