Spikes in the duration of the ParNew collections
Bartek Markocki
bartosz.markocki at gmail.com
Mon Jan 28 08:27:03 PST 2013
Hi Ramki,
See my comments in-line:
On Fri, Jan 25, 2013 at 9:22 PM, Srinivas Ramakrishna <ysr1729 at gmail.com> wrote:
>
> Hi Bartek --
>
> On Fri, Jan 25, 2013 at 8:51 AM, Bartek Markocki
> <bartosz.markocki at gmail.com> wrote:
>>
>>
>> 1. While preparing to send the first email I found this post
>>
>> http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2012-January/001006.html
>> where Ramki said 'There was
>> an old issue wrt monitor deflation that was fixed a few releases
>> ago'. As we are on the latest update (38) I do not expect this to
>> apply here, but do you know the bug id for it, so that I can rule it
>> out with full confidence?
>
>
> That bug involved time outside of the actual collection time, during which
> the threads were paused. If I understand your problem correctly, it's that the
> collection times themselves are spiky. If that understanding is correct,
> then your problem would not be related to the above email.
You got it right; we are talking about time 'inside' the collection here.
>
>>
>> 2. While monitoring the running application I noticed that we
>> constantly have about 2.5k objects pending finalization. The objects
>> become eligible for finalization in batches (min: 600, max: 2000
>> objects). They are instances of the java.util.zip.Deflater class and
>> are finalized quite quickly (below 2 sec - the refresh interval of my
>> monitoring tool). Do you think this might be related? I only made
>> this observation recently (i.e., with the ParallelOld collector), so
>> for the moment I am not able to correlate it with the high ParNew
>> times.
>
>
>
> Yes, I'd look to see if "PrintReferenceGC" times indicate any diffs,
Unfortunately no significant diffs - neither in the times nor in the
number of refs :(
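On the Deflater side, we are also looking at whether we can stop
relying on finalization and call end() explicitly. A minimal sketch of
that direction (the class and method names below are made up for
illustration, not taken from our application):

    import java.io.ByteArrayOutputStream;
    import java.util.zip.Deflater;

    public class DeflateExample {
        // Compress a buffer with an explicitly managed Deflater so its
        // native resources are released by end() rather than by the
        // finalizer.
        static byte[] compress(byte[] input) {
            Deflater deflater = new Deflater(Deflater.BEST_SPEED);
            try {
                deflater.setInput(input);
                deflater.finish();
                ByteArrayOutputStream out = new ByteArrayOutputStream(input.length);
                byte[] chunk = new byte[4096];
                while (!deflater.finished()) {
                    int n = deflater.deflate(chunk);
                    out.write(chunk, 0, n);
                }
                return out.toByteArray();
            } finally {
                // Free the native zlib memory now; the eventual
                // finalize() call then has nothing left to do.
                deflater.end();
            }
        }
    }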
The only peculiar thing (at least to me) that I noticed around the
collection in question comes from FLSStatistics.
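For completeness, this output comes from -XX:PrintFLSStatistics=1.
Roughly, assuming the rest of our CMS command line stays untouched,
the relevant flags are:

    -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
    -XX:+PrintGCDetails -XX:PrintFLSStatistics=1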
The binary tree dictionary looks exactly the same before and after the collection:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 30637977
Max Chunk Size: 30637977
Number of Blocks: 1
Av. Block Size: 30637977
Tree Height: 1
However, the indexed free lists do show a change:
Before:
Statistics for IndexedFreeLists:
--------------------------------
Total Free Space: 60314
Max Chunk Size: 211
Number of Blocks: 1498
Av. Block Size: 40
free=30698291 frag=0.0039
After:
--------------------------------
Total Free Space: 60296
Max Chunk Size: 211
Number of Blocks: 1501
Av. Block Size: 40
free=30698273 frag=0.0039
So the total free space decreased by 18 (heap words, if I am correct),
while the number of blocks increased by 3. I assume one bigger block
got split into a couple of smaller ones - non-intuitive behavior, at
least at first glance.
The other odd thing about this collection is the size of the promoted
object(s): 144 bytes, where we normally promote around 21k.
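As a side note - assuming we are on a 64-bit JVM, where a heap word is
8 bytes - the two numbers seem to line up:

    18 heap words * 8 bytes/word = 144 bytes

i.e., exactly the promoted size, so the change in the indexed free
lists would account precisely for that single promotion.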
> if you
> haven't already done so. (Haven't followed the preceding part of the thread
> very closely though.) If/when you find that the spikiness is specific to
> CMS+ParNew, one can do a few more experiments. Bear in mind that promotion
> policies are slightly different between the two, and ParNew isn't as
> adaptive as is ParallelOld (in terms of resizing/reshaping the heap). If the
> experiments are under controlled conditions and you see spikes with a steady
> workload, one might be able to more quickly pinpoint the culprit(s). (For
> the case of CMS+ParNew, for example, the dynamic sizing of local promotion
> buffer lists would be one place to shine a light on.)
Understood. As I wrote previously, we are in the middle of a
10-day-long test. Once the test is done, I will rerun it with (on at
least two instances):
-XX:-UseParNewGC
-XX:+PrintOldPLAB -XX:+PrintPLAB (and the old settings).
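In other words, something along these lines for the two variants -
only a sketch of how I plan to combine the flags; heap sizing and the
rest of our existing options stay exactly as they are today:

    # variant 1: serial young collector, to see whether the spikes
    # follow ParNew itself
    java -XX:+UseConcMarkSweepGC -XX:-UseParNewGC \
         -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
         <existing options unchanged>

    # variant 2: keep ParNew, but log the sizing of the promotion
    # buffers (PLABs) on every collection
    java -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
         -XX:+PrintPLAB -XX:+PrintOldPLAB \
         -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
         <existing options unchanged>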
Thanks for your help!
Bartek
>
> -- ramki