From simone.bordet at gmail.com Tue Dec 9 13:40:47 2014
From: simone.bordet at gmail.com (Simone Bordet)
Date: Tue, 9 Dec 2014 14:40:47 +0100
Subject: G1 eden resizing behaviour ?
Message-ID:

Hi,

I observed the following behaviour in G1 and I would like some
feedback with respect to whether it is right/expected/known or not.

I have an application that allocates about 500-700 MiB/s, with a
promotion rate of about 5-7 MiB/s on a 32 GiB heap.
From time to time, G1 performs a marking and then mixed GCs.

During the young GC (not mixed) that follows the end of the marking
phase, the eden is reduced to a very small size, for example:

2014-12-05T04:47:39.528-0800: 5054.135: [GC concurrent-cleanup-end, 0.0009276 secs]
2014-12-05T04:47:53.949-0800: 5068.556: [GC pause (G1 Evacuation Pause) (young)
[Eden: 12.6G(12.6G)->0.0B(880.0M) Survivors: 768.0M->752.0M Heap: 26.8G(32.0G)->14.2G(32.0G)]
[Times: user=3.79 sys=0.05, real=0.22 secs]

In the example above the Eden is shrunk from 12.6G to 880M.
G1 then keeps the eden small for the mixed GCs that follow.
After the mixed GCs have finished, young GCs are performed again, which
eventually re-grow the eden to a size similar to what it was before being shrunk.

In my case, in the normal young GC regime, the young GCs happen more or
less every 15-20s, while during the mixed GCs and for the few young GCs
that follow they happen about every 2s.

Now, this behaviour can be explained by the fact that, in order to make
room for the old regions to be evacuated, and yet still stay within the
pause goal, fewer eden regions need to be taken into account for the
evacuation.
Since the allocation rate does not change, fewer eden regions cause more
frequent GCs.

I am wondering whether this behaviour is right/expected/known, and what
effects it has on the pause time prediction logic (e.g. mixed GC times
are not taken into account) as well as on the early promotion of objects
from eden.

I can provide GC logs and command line options if required.

Thanks !

--
Simone Bordet
http://bordet.blogspot.com
---
Finally, no matter how good the architecture and design are, to deliver
bug-free software with optimal performance and reliability, the
implementation technique must be flawless. Victoria Livschitz

From thomas.schatzl at oracle.com Tue Dec 9 13:43:43 2014
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Tue, 09 Dec 2014 14:43:43 +0100
Subject: G1 eden resizing behaviour ?
In-Reply-To:
References:
Message-ID: <1418132623.3361.23.camel@oracle.com>

Hi,

On Tue, 2014-12-09 at 14:40 +0100, Simone Bordet wrote:
> Hi,
>
> I observed the following behaviour in G1 and I would like some
> feedback with respect to whether it is right/expected/known or not.
>
> I have an application that allocates about 500-700 MiB/s, with a
> promotion rate of about 5-7 MiB/s on a 32 GiB heap.
> From time to time, G1 performs a marking and then mixed GCs.
>
> During the young GC (not mixed) that follows the end of the marking
> phase, the eden is reduced to a very small size, for example:

sounds like JDK-8035557.

Thanks,
Thomas

From simone.bordet at gmail.com Tue Dec 9 13:58:19 2014
From: simone.bordet at gmail.com (Simone Bordet)
Date: Tue, 9 Dec 2014 14:58:19 +0100
Subject: G1 eden resizing behaviour ?
In-Reply-To: <1418132623.3361.23.camel@oracle.com>
References: <1418132623.3361.23.camel@oracle.com>
Message-ID:

Hi,

On Tue, Dec 9, 2014 at 2:43 PM, Thomas Schatzl wrote:
> sounds like JDK-8035557.

Thanks for the fast response !
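As a side note on the numbers in the original message: the change in young GC
frequency is roughly what simple arithmetic predicts once the eden is resized.
A minimal sketch, assuming a steady allocation rate of about 600 MiB/s as the
midpoint of the 500-700 MiB/s figure above; the class is made up purely for
illustration:

public class EdenIntervalEstimate {
    public static void main(String[] args) {
        double allocRateMiBPerSec = 600.0;      // midpoint of the reported 500-700 MiB/s
        double normalEdenMiB = 12.6 * 1024;     // Eden before the shrink: 12.6G
        double shrunkEdenMiB = 880.0;           // Eden after the shrink: 880.0M

        // A young GC is triggered roughly when eden fills up, so
        // interval ~= eden size / allocation rate.
        System.out.printf("normal eden: ~%.0f s between young GCs%n",
                normalEdenMiB / allocRateMiBPerSec);    // ~22 s, close to the observed 15-20 s
        System.out.printf("shrunk eden: ~%.1f s between young GCs%n",
                shrunkEdenMiB / allocRateMiBPerSec);    // ~1.5 s, close to the observed ~2 s
    }
}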
With respect to https://bugs.openjdk.java.net/browse/JDK-8035557 I observe that the Eden shrinks in the young GC *before* the mixed GCs. Is that just an alternate behaviour of the same bug ? I remember a while back G1 performing mixed GC after the end of marking. More recently, it always perform a young GC after the end of marking, before starting the mixed GCs. Was this young GC added exactly to cope with mispredictions due to evacuations of old regions ? Thanks ! -- Simone Bordet http://bordet.blogspot.com --- Finally, no matter how good the architecture and design are, to deliver bug-free software with optimal performance and reliability, the implementation technique must be flawless. Victoria Livschitz From thomas.schatzl at oracle.com Tue Dec 9 14:06:50 2014 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 09 Dec 2014 15:06:50 +0100 Subject: G1 eden resizing behaviour ? In-Reply-To: References: <1418132623.3361.23.camel@oracle.com> Message-ID: <1418134010.3361.26.camel@oracle.com> Hi, On Tue, 2014-12-09 at 14:58 +0100, Simone Bordet wrote: > Hi, > > On Tue, Dec 9, 2014 at 2:43 PM, Thomas Schatzl > wrote: > > sounds like JDK-8035557. > > Thanks for the fast response ! > > With respect to https://bugs.openjdk.java.net/browse/JDK-8035557 I > observe that the Eden shrinks in the young GC *before* the mixed GCs. > Is that just an alternate behaviour of the same bug ? This may be because the heap is already very full (around 100-G1ReservePercent) so it cuts down on the young gen size. G1 tries to keep G1ReservePercent of heap empty to avoid evacuation failure. Another reason could be that G1 thinks that the pauses caused by marking cut too much into the available time budget (depending on your settings), so it decreases the heap. > I remember a while back G1 performing mixed GC after the end of marking. > More recently, it always perform a young GC after the end of marking, > before starting the mixed GCs. > Was this young GC added exactly to cope with mispredictions due to > evacuations of old regions ? that is JDK-8057781. I do not think it has ever been different though, but I may be wrong. Thomas From simone.bordet at gmail.com Tue Dec 9 14:26:03 2014 From: simone.bordet at gmail.com (Simone Bordet) Date: Tue, 9 Dec 2014 15:26:03 +0100 Subject: G1 eden resizing behaviour ? In-Reply-To: <1418134010.3361.26.camel@oracle.com> References: <1418132623.3361.23.camel@oracle.com> <1418134010.3361.26.camel@oracle.com> Message-ID: Hi, On Tue, Dec 9, 2014 at 3:06 PM, Thomas Schatzl wrote: > This may be because the heap is already very full (around > 100-G1ReservePercent) so it cuts down on the young gen size. > > G1 tries to keep G1ReservePercent of heap empty to avoid evacuation > failure. It's not my case: I have a 32 GiB heap with a permanent live set of about 5-6 GiB, so I should have plenty of space even with a 15 GiB or so eden, which is what I typically see as max eden size. > Another reason could be that G1 thinks that the pauses caused by marking > cut too much into the available time budget (depending on your > settings), so it decreases the heap. You mean by the remark, which is STW ? Indeed I have very long remark pauses (up to 2.5s), apparently caused by weak reference processing. I need to investigate this issue, as the application itself does not use them (perhaps some library ?). There is a big difference between the references processed in the remark phase (3 millions) and those processed during a young GC (less than a thousand). 
Am I correct assuming that those processed during remark only belong to tenured ? Thanks ! -- Simone Bordet http://bordet.blogspot.com --- Finally, no matter how good the architecture and design are, to deliver bug-free software with optimal performance and reliability, the implementation technique must be flawless. Victoria Livschitz From yu.zhang at oracle.com Wed Dec 10 06:18:56 2014 From: yu.zhang at oracle.com (Yu Zhang) Date: Tue, 09 Dec 2014 22:18:56 -0800 Subject: G1 eden resizing behaviour ? In-Reply-To: References: <1418132623.3361.23.camel@oracle.com> <1418134010.3361.26.camel@oracle.com> Message-ID: <5487E5D0.9050207@oracle.com> Simone, Please see my comments in line. Thanks, Jenny On 12/9/2014 6:26 AM, Simone Bordet wrote: > Hi, > > On Tue, Dec 9, 2014 at 3:06 PM, Thomas Schatzl > wrote: >> This may be because the heap is already very full (around >> 100-G1ReservePercent) so it cuts down on the young gen size. >> >> G1 tries to keep G1ReservePercent of heap empty to avoid evacuation >> failure. > It's not my case: I have a 32 GiB heap with a permanent live set of > about 5-6 GiB, so I should have plenty of space even with a 15 GiB or > so eden, which is what I typically see as max eden size. Can you add -XX:+PrintAdaptiveSizePolicy and share your logs? I was guessing the same, that the heap is full. The mixed gc might not clean all the garbage. Even though the live data set is 5-6 GB, the heap can still be close to full. > >> Another reason could be that G1 thinks that the pauses caused by marking >> cut too much into the available time budget (depending on your >> settings), so it decreases the heap. > You mean by the remark, which is STW ? > Indeed I have very long remark pauses (up to 2.5s), apparently caused > by weak reference processing. > > I need to investigate this issue, as the application itself does not > use them (perhaps some library ?). > There is a big difference between the references processed in the > remark phase (3 millions) and those processed during a young GC (less > than a thousand). > > Am I correct assuming that those processed during remark only belong > to tenured ? I agree. Do you have -XX:+ParallelRefProcEnabled? This will help reducing the refproc time, but 3 millions is a lot. > > Thanks ! > From charlie.hunt at oracle.com Wed Dec 10 16:04:41 2014 From: charlie.hunt at oracle.com (charlie hunt) Date: Wed, 10 Dec 2014 08:04:41 -0800 Subject: G1 eden resizing behaviour ? In-Reply-To: <5487E5D0.9050207@oracle.com> References: <1418132623.3361.23.camel@oracle.com> <1418134010.3361.26.camel@oracle.com> <5487E5D0.9050207@oracle.com> Message-ID: <00F2254A-E48C-4969-B7B7-FCA1983F2E64@oracle.com> Adding -XX:+PrintReferenceGC may also help identify which type of Reference objects are the culprit. If it is SoftReferences (my favorite :-] ), there is some additional tweaking you can do. Adding -XX:+ParallelRefProcEnabled, as Jenny suggested, should help shorten the length of time. hths, charlie > On Dec 9, 2014, at 10:18 PM, Yu Zhang wrote: > > Simone, > > Please see my comments in line. > > Thanks, > Jenny > > On 12/9/2014 6:26 AM, Simone Bordet wrote: >> Hi, >> >> On Tue, Dec 9, 2014 at 3:06 PM, Thomas Schatzl >> wrote: >>> This may be because the heap is already very full (around >>> 100-G1ReservePercent) so it cuts down on the young gen size. >>> >>> G1 tries to keep G1ReservePercent of heap empty to avoid evacuation >>> failure. 
>> It's not my case: I have a 32 GiB heap with a permanent live set of >> about 5-6 GiB, so I should have plenty of space even with a 15 GiB or >> so eden, which is what I typically see as max eden size. > Can you add -XX:+PrintAdaptiveSizePolicy and share your logs? I was guessing the same, that the heap is full. The mixed gc might not clean all the garbage. Even though the live data set is 5-6 GB, the heap can still be close to full. >> >>> Another reason could be that G1 thinks that the pauses caused by marking >>> cut too much into the available time budget (depending on your >>> settings), so it decreases the heap. >> You mean by the remark, which is STW ? >> Indeed I have very long remark pauses (up to 2.5s), apparently caused >> by weak reference processing. >> >> I need to investigate this issue, as the application itself does not >> use them (perhaps some library ?). >> There is a big difference between the references processed in the >> remark phase (3 millions) and those processed during a young GC (less >> than a thousand). >> >> Am I correct assuming that those processed during remark only belong >> to tenured ? > I agree. Do you have -XX:+ParallelRefProcEnabled? This will help reducing the refproc time, but 3 millions is a lot. >> >> Thanks ! >> > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From simone.bordet at gmail.com Wed Dec 10 17:56:03 2014 From: simone.bordet at gmail.com (Simone Bordet) Date: Wed, 10 Dec 2014 18:56:03 +0100 Subject: G1 eden resizing behaviour ? In-Reply-To: <00F2254A-E48C-4969-B7B7-FCA1983F2E64@oracle.com> References: <1418132623.3361.23.camel@oracle.com> <1418134010.3361.26.camel@oracle.com> <5487E5D0.9050207@oracle.com> <00F2254A-E48C-4969-B7B7-FCA1983F2E64@oracle.com> Message-ID: Hi, On Wed, Dec 10, 2014 at 5:04 PM, charlie hunt wrote: > Adding -XX:+PrintReferenceGC may also help identify which type of Reference objects are the culprit. If it is SoftReferences (my favorite :-] ), there is some additional tweaking you can do. > > Adding -XX:+ParallelRefProcEnabled, as Jenny suggested, should help shorten the length of time. I have both enabled in the logs, that are unfortunately too big to be attached to a message of this mailing list. For those interested, I can send them privately. There are no SoftReferences, only a large number of WeakReferences. -- Simone Bordet http://bordet.blogspot.com --- Finally, no matter how good the architecture and design are, to deliver bug-free software with optimal performance and reliability, the implementation technique must be flawless. Victoria Livschitz From yu.zhang at oracle.com Wed Dec 10 20:41:04 2014 From: yu.zhang at oracle.com (Yu Zhang) Date: Wed, 10 Dec 2014 12:41:04 -0800 Subject: G1 eden resizing behaviour ? In-Reply-To: References: <1418132623.3361.23.camel@oracle.com> <1418134010.3361.26.camel@oracle.com> <5487E5D0.9050207@oracle.com> <00F2254A-E48C-4969-B7B7-FCA1983F2E64@oracle.com> Message-ID: <5488AFE0.2060004@oracle.com> Simone, thanks for the log. checking the Eden size for young gc, I think the Eden size decrease is more related to mixed gc. The Eden size for several young gcs after mixed gc are small. This is a known issue. Thanks, Jenny On 12/10/2014 9:56 AM, Simone Bordet wrote: > Hi, > > On Wed, Dec 10, 2014 at 5:04 PM, charlie hunt wrote: >> Adding -XX:+PrintReferenceGC may also help identify which type of Reference objects are the culprit. 
If it is SoftReferences (my favorite :-] ), there is some additional tweaking you can do. >> >> Adding -XX:+ParallelRefProcEnabled, as Jenny suggested, should help shorten the length of time. > I have both enabled in the logs, that are unfortunately too big to be > attached to a message of this mailing list. > For those interested, I can send them privately. > > There are no SoftReferences, only a large number of WeakReferences. > From simone.bordet at gmail.com Wed Dec 10 21:14:16 2014 From: simone.bordet at gmail.com (Simone Bordet) Date: Wed, 10 Dec 2014 22:14:16 +0100 Subject: G1 eden resizing behaviour ? In-Reply-To: <5488AFE0.2060004@oracle.com> References: <1418132623.3361.23.camel@oracle.com> <1418134010.3361.26.camel@oracle.com> <5487E5D0.9050207@oracle.com> <00F2254A-E48C-4969-B7B7-FCA1983F2E64@oracle.com> <5488AFE0.2060004@oracle.com> Message-ID: Hi, On Wed, Dec 10, 2014 at 9:41 PM, Yu Zhang wrote: > Simone, > > thanks for the log. checking the Eden size for young gc, I think the Eden > size decrease is more related to mixed gc. The Eden size for several young > gcs after mixed gc are small. This is a known issue. Thanks for confirming this. -- Simone Bordet http://bordet.blogspot.com --- Finally, no matter how good the architecture and design are, to deliver bug-free software with optimal performance and reliability, the implementation technique must be flawless. Victoria Livschitz From simone.bordet at gmail.com Thu Dec 11 15:38:46 2014 From: simone.bordet at gmail.com (Simone Bordet) Date: Thu, 11 Dec 2014 16:38:46 +0100 Subject: G1: "Other" time too long ? Message-ID: Hi, G1 with a 32 GiB heap (16 MiB region size), I was seeing high "Update RS" and "Scan RS" times during mixed GCs. I am aware of -XX:G1RSetRegionEntries, but I wanted to try another path: whether reducing manually the region size caused less inter region references and therefore reduced the probable coarsening that was the cause of the long RS times. So I set the region size to 2 MiB and re-run. Now I get very high "Other" times, for example: [Other: 464.1 ms] [Choose CSet: 0.1 ms] [Ref Proc: 52.4 ms] [Ref Enq: 1.8 ms] [Redirty Cards: 19.4 ms] [Free CSet: 22.7 ms] The sum of the subtask times is not close to the "Other" time so I was wondering what else it's done in the "Other" processing, or whether perhaps it is not reporting what I think (e.g. a sequential time vs a parallel time). I'd probably revert to a 16 MiB region size and setting G1RSetRegionEntries, but I was wondering if someone can shed some light on this. Logs are too big for this mailing list, but I can provide them to interested people. Thanks ! -- Simone Bordet http://bordet.blogspot.com --- Finally, no matter how good the architecture and design are, to deliver bug-free software with optimal performance and reliability, the implementation technique must be flawless. Victoria Livschitz From yu.zhang at oracle.com Thu Dec 11 16:58:09 2014 From: yu.zhang at oracle.com (Yu Zhang) Date: Thu, 11 Dec 2014 08:58:09 -0800 Subject: G1: "Other" time too long ? In-Reply-To: References: Message-ID: <5489CD21.1050207@oracle.com> Simone, Do you have -XX:+G1SummarizeRSetStats? I had seen Other does not add up with this statistics, especially when RSet is big. Currently the default G1RSetRegionEntries is set according to region size. So with bigger region size, you have more G1RSetRegionEntries. You might get more references between regions with smaller region size. 
You might already tried this, you can use -XX:+G1SummarizeRSetStats -XX:G1SummarizeRSetStatsPeriod= to see if you have coarsening. If coarsening is high, the RS operations are more expensive. Thanks, Jenny On 12/11/2014 7:38 AM, Simone Bordet wrote: > Hi, > > G1 with a 32 GiB heap (16 MiB region size), I was seeing high "Update > RS" and "Scan RS" times during mixed GCs. > > I am aware of -XX:G1RSetRegionEntries, but I wanted to try another > path: whether reducing manually the region size caused less inter > region references and therefore reduced the probable coarsening that > was the cause of the long RS times. > > So I set the region size to 2 MiB and re-run. > > Now I get very high "Other" times, for example: > > [Other: 464.1 ms] > [Choose CSet: 0.1 ms] > [Ref Proc: 52.4 ms] > [Ref Enq: 1.8 ms] > [Redirty Cards: 19.4 ms] > [Free CSet: 22.7 ms] > > The sum of the subtask times is not close to the "Other" time so I was > wondering what else it's done in the "Other" processing, or whether > perhaps it is not reporting what I think (e.g. a sequential time vs a > parallel time). > > I'd probably revert to a 16 MiB region size and setting > G1RSetRegionEntries, but I was wondering if someone can shed some > light on this. > > Logs are too big for this mailing list, but I can provide them to > interested people. > > Thanks ! > From simone.bordet at gmail.com Thu Dec 11 18:16:39 2014 From: simone.bordet at gmail.com (Simone Bordet) Date: Thu, 11 Dec 2014 19:16:39 +0100 Subject: G1: "Other" time too long ? In-Reply-To: <5489CD21.1050207@oracle.com> References: <5489CD21.1050207@oracle.com> Message-ID: Hi, thanks for the quick reply. On Thu, Dec 11, 2014 at 5:58 PM, Yu Zhang wrote: > Simone, > > Do you have -XX:+G1SummarizeRSetStats? Yes. > I had seen Other does not add up > with this statistics, especially when RSet is big. Ok, thanks for confirming this. Have you seen differences as big as mine ? > Currently the default G1RSetRegionEntries is set according to region size. > So with bigger region size, you have more G1RSetRegionEntries. You might > get more references between regions with smaller region size. You might > already tried this, you can use -XX:+G1SummarizeRSetStats > -XX:G1SummarizeRSetStatsPeriod= to see if you have coarsening. If > coarsening is high, the RS operations are more expensive. I did not have G1SummarizeRSetStats for the case with region_size = 16 MiB, but I do have it for the case region_size = 2 MiB. I can see coarsenings in the latter case, so I guess it will be more so for the former case, but I'll verify it. The coarsenings I see are in the order 400-500 with peaks in the thousands and one big at 289509. I see that the formula to calculate the region entries is: table_size = base * (log(region_size / 1M) + 1) so for a 16 MiB region size the region entries are 564. I'll retry with a higher value to see if I get any benefit. Thanks ! -- Simone Bordet http://bordet.blogspot.com --- Finally, no matter how good the architecture and design are, to deliver bug-free software with optimal performance and reliability, the implementation technique must be flawless. Victoria Livschitz From thomas.schatzl at oracle.com Thu Dec 11 18:22:17 2014 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 11 Dec 2014 19:22:17 +0100 Subject: G1: "Other" time too long ? In-Reply-To: References: <5489CD21.1050207@oracle.com> Message-ID: <1418322137.3214.3.camel@oracle.com> Hi, On Thu, 2014-12-11 at 19:16 +0100, Simone Bordet wrote: > Hi, > > thanks for the quick reply. 
> > On Thu, Dec 11, 2014 at 5:58 PM, Yu Zhang wrote: > > Simone, > > > > Do you have -XX:+G1SummarizeRSetStats? > > Yes. > > > I had seen Other does not add up > > with this statistics, especially when RSet is big. > > Ok, thanks for confirming this. > Have you seen differences as big as mine ? Yes, we can also see pauses caused by G1SummarizeRSetStats in that range. G1SummarizeRSetStats is not meant to be always on, just to diagnose potential remembered set problems. Thanks, Thomas From yu.zhang at oracle.com Thu Dec 11 23:14:36 2014 From: yu.zhang at oracle.com (Yu Zhang) Date: Thu, 11 Dec 2014 15:14:36 -0800 Subject: G1: "Other" time too long ? In-Reply-To: References: <5489CD21.1050207@oracle.com> Message-ID: <548A255C.30704@oracle.com> Simone, The formula is correct. But with 16M region size, you get 256(base)*5 entries. Also with bigger region size, you might get less Remember set references. So bigger chance not seeing coarsening. Thanks, Jenny On 12/11/2014 10:16 AM, Simone Bordet wrote: > I did not have G1SummarizeRSetStats for the case with region_size = 16 > MiB, but I do have it for the case region_size = 2 MiB. > I can see coarsenings in the latter case, so I guess it will be more > so for the former case, but I'll verify it. > The coarsenings I see are in the order 400-500 with peaks in the > thousands and one big at 289509. > > I see that the formula to calculate the region entries is: > > table_size = base * (log(region_size / 1M) + 1) > > so for a 16 MiB region size the region entries are 564. > > I'll retry with a higher value to see if I get any benefit. From java at elyograg.org Tue Dec 16 17:30:28 2014 From: java at elyograg.org (Shawn Heisey) Date: Tue, 16 Dec 2014 10:30:28 -0700 Subject: G1 with Solr - thread from dev@lucene.apache.org Message-ID: <54906C34.8080408@elyograg.org> Here's a message I sent to dev at lucene.apache.org, with enough quoted history to know what's happening. My testing is with Solr 4.7.2. http://lucene.apache.org/solr/ Rory O'Donnell at Oracle suggested that I start a thread here. Thanks, Shawn On 12/6/2014 3:00 PM, Shawn Heisey wrote: > On 12/5/2014 2:42 PM, Erick Erickson wrote: >> Saw this on the Cloudera website: >> >> http://blog.cloudera.com/blog/2014/12/tuning-java-garbage-collection-for-hbase/ >> >> Original post here: >> https://software.intel.com/en-us/blogs/2014/06/18/part-1-tuning-java-garbage-collection-for-hbase >> >> Although it's for hbase, I thought the presentation went into enough >> detail about what improvements they'd seen that I can see it being >> useful for Solr folks. And we have some people on this list who are >> interested in this sort of thing.... > Very interesting. My own experiences with G1 and Solr (which I haven't > repeated since early Java 7 releases, something like 7u10 or 7u13) would > show even worse spikes compared to the blue lines on those graphs ... > and my heap isn't anywhere even CLOSE to 100GB. Solr probably has > different garbage creation characteristics than hbase. Followup with graphs. I've cc'd Rory at Oracle too, with hopes that this info will ultimately reach those who work on G1. I can provide the actual GC logs as well. Here's a graph of a GC log lasting over two weeks with a tuned CMS collector and Oracle Java 7u25 and a 6GB heap. https://www.dropbox.com/s/mygjeviyybqqnqd/cms-7u25.png?dl=0 CMS was tuned using these settings: http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning This graph shows that virtually all collection pauses were a little under half a second. 
There were exactly three full garbage collections, and each one took around six seconds. While that is a significant pause, having only three such collections over a period of 16 days sounds pretty good to me. Here's about half as much runtime (8 days) on the same server running G1 with Oracle 7u72 and the same 6GB heap. G1 is untuned, because I do not know how: https://www.dropbox.com/s/2kgx60gj988rflj/g1-7u72.png?dl=0 Most of these collections were around a tenth of a second ... which is certainly better than nearly half a second ... but there are a LOT of collections that take longer than a second, and a fair number of them that took between 3 and 5 seconds. It's difficult to say which of these graphs is actually better. The CMS graph is certainly more consistent, and does a LOT fewer full GCs ... but is the 4 to 1 improvement in a typical GC enough to reveal significantly better performance? My instinct says that it would NOT be enough for that, especially with so many collections taking 1-3 seconds. If the server was really busy (mine isn't), I wonder whether the GC graph would look similar, or whether it would be really different. A busy server would need to collect a lot more garbage, so I fear that the yellow and black parts of the G1 graph would dominate more than they do in my graph, which would be overall a bad thing. Only real testing on busy servers can tell us that. I can tell you for sure that the G1 graph looks a lot better than it did in early Java 7 releases, but additional work by Oracle (and perhaps some G1 tuning options) might significantly improve it. Thanks, Shawn -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashwin.jayaprakash at gmail.com Wed Dec 17 04:47:36 2014 From: ashwin.jayaprakash at gmail.com (Ashwin Jayaprakash) Date: Tue, 16 Dec 2014 20:47:36 -0800 Subject: Multi-second ParNew collections but stable CMS Message-ID: Hi, we have a cluster of ElasticSearch servers running with 31G heap and OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode). While our old gen seems to be very stable with about 40% usage and no Full GCs so far, our young gen keeps growing from ~50MB to 850MB every few seconds. These ParNew collections are taking anywhere between 1-7 seconds and is causing some of our requests to time out. The eden space keeps filling up and then cleared every 30-60 seconds. There is definitely work being done by our JVM in terms of caching/buffering objects for a few seconds, writing to disk and then clearing the objects (typical Lucene/ElasticSearch indexing and querying workload) These long pauses are not good for our server throughput and I was doing some reading. I got some conflicting reports on how Cassandra is configured compared to Hadoop. There are also references to this old ParNew+CMS bug which I thought would've been addressed in the JRE version we are using. Cassandra recommends a larger NewSize with just 1 for max tenuring, whereas Hadoop recommends a small NewSize. Since most of our allocations seem to be quite short lived, is there a way to avoid these "long" young gen pauses? Thanks in advance. Here are some details. 
*Heap settings:* java -Xmx31000m -Xms31000m -Xss512k -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=64m -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC -XX:+PrintPromotionFailure -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintTenuringDistribution -XX:GCLogFileSize=512m -XX:NumberOfGCLogFiles=2 -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:+DisableExplicitGC -XX:PrintFLSStatistics=1 -XX:+PrintGCDetails *Last few lines of "kill -3 pid" output:* Heap par new generation total 996800K, used 865818K [0x00007fa18e800000, 0x00007fa1d2190000, 0x00007fa1d2190000) eden space 886080K, 94% used [0x00007fa18e800000, 0x00007fa1c1a659e0, 0x00007fa1c4950000) from space 110720K, 25% used [0x00007fa1cb570000, 0x00007fa1cd091078, 0x00007fa1d2190000) to space 110720K, 0% used [0x00007fa1c4950000, 0x00007fa1c4950000, 0x00007fa1cb570000) concurrent mark-sweep generation total 30636480K, used 12036523K [0x00007fa1d2190000, 0x00007fa920000000, 0x00007fa920000000) concurrent-mark-sweep perm gen total 128856K, used 77779K [0x00007fa920000000, 0x00007fa927dd6000, 0x00007fa940000000) *Sample gc log:* 2014-12-11T23:32:16.121+0000: 710.618: [ParNew Desired survivor size 56688640 bytes, new threshold 6 (max 6) - age 1: 2956312 bytes, 2956312 total - age 2: 591800 bytes, 3548112 total - age 3: 66216 bytes, 3614328 total - age 4: 270752 bytes, 3885080 total - age 5: 615472 bytes, 4500552 total - age 6: 358440 bytes, 4858992 total : 900635K->8173K(996800K), 0.0317340 secs] 1352217K->463460K(31633280K)After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: -433641480 Max Chunk Size: -433641480 Number of Blocks: 1 Av. Block Size: -433641480 Tree Height: 1 After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 1227 Max Chunk Size: 631 Number of Blocks: 3 Av. Block Size: 409 Tree Height: 3 , 0.0318920 secs] [Times: user=0.38 sys=0.01, real=0.03 secs] Ashwin Jayaprakash. -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.schatzl at oracle.com Wed Dec 17 15:50:56 2014 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 17 Dec 2014 16:50:56 +0100 Subject: G1 with Solr - thread from dev@lucene.apache.org In-Reply-To: <54906C34.8080408@elyograg.org> References: <54906C34.8080408@elyograg.org> Message-ID: <1418831456.3255.22.camel@oracle.com> Hi Shawn, > On 12/6/2014 3:00 PM, Shawn Heisey wrote: > > On 12/5/2014 2:42 PM, Erick Erickson wrote: > > > Saw this on the Cloudera website: > > > > > > http://blog.cloudera.com/blog/2014/12/tuning-java-garbage-collection-for-hbase/ > > > > > > O[...] > Here's a graph of a GC log lasting over two weeks with a tuned CMS > collector and Oracle Java 7u25 and a 6GB heap. > > https://www.dropbox.com/s/mygjeviyybqqnqd/cms-7u25.png?dl=0 > > CMS was tuned using these settings: > > http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning > > This graph shows that virtually all collection pauses were a little > under half a second. There were exactly three full garbage collections, > and each one took around six seconds. While that is a significant > pause, having only three such collections over a period of 16 days > sounds pretty good to me. > > Here's about half as much runtime (8 days) on the same server running G1 > with Oracle 7u72 and the same 6GB heap. 
G1 is untuned, because I do not > know how: > > https://www.dropbox.com/s/2kgx60gj988rflj/g1-7u72.png?dl=0 > > Most of these collections were around a tenth of a second ... which is > certainly better than nearly half a second ... but there are a LOT of > collections that take longer than a second, and a fair number of them > that took between 3 and 5 seconds. > > It's difficult to say which of these graphs is actually better. The CMS > graph is certainly more consistent, and does a LOT fewer full GCs ... > but is the 4 to 1 improvement in a typical GC enough to reveal > significantly better performance? My instinct says that it would NOT be > enough for that, especially with so many collections taking 1-3 seconds. > > If the server was really busy (mine isn't), I wonder whether the GC > graph would look similar, or whether it would be really different. A > busy server would need to collect a lot more garbage, so I fear that the > yellow and black parts of the G1 graph would dominate more than they do > in my graph, which would be overall a bad thing. Only real testing on > busy servers can tell us that. > > I can tell you for sure that the G1 graph looks a lot better than it did > in early Java 7 releases, but additional work by Oracle (and perhaps > some G1 tuning options) might significantly improve it. could you provide some logs to look at? It is impossible to give good recommendations without having at least some more detail about what's going on. Preferably logs with at least the mentioned options they used to tune the workload, i.e. -XX:+PrintGCDetails -XX:+PrintGCTimeStamps and -XX: +PrintAdaptiveSizePolicy It might also be a good idea to start with the options given in the cloudera blog entry: -XX:MaxGCPauseMillis=100 // the max pause time you want -XX:+ParallelRefProcEnabled // not sure, only if Solr uses lots of soft or weak references. -XX:-ResizePLAB // that's minor -XX:G1NewSizePercent=1 // that may help in achieving the pause time goal -XmsM -XmxM I do not think there is need to set the ParallelGCThreads according to that formula. This has been the default formula for calculating the number of threads for all collectors for a long time (but then again it might have changed sometime in jdk7). You may also want to use a JDK 8 build, preferably (for me :) some 8u40 EA build (e.g. from https://jdk8.java.net/download.html); there have been a lot of improvements to G1 in JDK8, and in particular 8u40. Thanks, Thomas From jon.masamitsu at oracle.com Wed Dec 17 16:31:57 2014 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Wed, 17 Dec 2014 08:31:57 -0800 Subject: Multi-second ParNew collections but stable CMS In-Reply-To: References: Message-ID: <5491AFFD.2080907@oracle.com> Ashwin, You sent a sample GC log with a fast ParNew (about 31ms) right? Can you send examples of the slow ParNew's? I'd like to see what I can see in the logs that is changing from the fast GC's to the slower GC's. If I can download a complete log, that would be useful (there is a size limit on what you can mail to the list so mailing might not work). Jon On 12/16/2014 8:47 PM, Ashwin Jayaprakash wrote: > Hi, we have a cluster of ElasticSearch servers running with 31G heap > and OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode). > > While our old gen seems to be very stable with about 40% usage and no > Full GCs so far, our young gen keeps growing from ~50MB to 850MB every > few seconds. 
These ParNew collections are taking anywhere between 1-7 > seconds and is causing some of our requests to time out. The eden > space keeps filling up and then cleared every 30-60 seconds. There is > definitely work being done by our JVM in terms of caching/buffering > objects for a few seconds, writing to disk and then clearing the > objects (typical Lucene/ElasticSearch indexing and querying workload) > > These long pauses are not good for our server throughput and I was > doing some reading. I got some conflicting reports on how Cassandra is > configured compared to Hadoop. There are also references > > to this old ParNew+CMS bug > which I thought > would've been addressed in the JRE version we are using. Cassandra > recommends > a larger > NewSize with just 1 for max tenuring, whereas Hadoop recommends > a small NewSize. > > Since most of our allocations seem to be quite short lived, is there a > way to avoid these "long" young gen pauses? > > Thanks in advance. Here are some details. > * > Heap settings:* > java -Xmx31000m -Xms31000m > -Xss512k -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=64m > -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly > -XX:CMSInitiatingOccupancyFraction=70 > -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC -XX:+PrintPromotionFailure > -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintTenuringDistribution > -XX:GCLogFileSize=512m -XX:NumberOfGCLogFiles=2 -XX:+PrintGCDateStamps > -XX:+UseGCLogFileRotation -XX:+DisableExplicitGC > -XX:PrintFLSStatistics=1 -XX:+PrintGCDetails > > * > Last few lines of "kill -3 pid" output:* > Heap > par new generation total 996800K, used 865818K [0x00007fa18e800000, > 0x00007fa1d2190000, 0x00007fa1d2190000) > eden space 886080K, 94% used [0x00007fa18e800000, > 0x00007fa1c1a659e0, 0x00007fa1c4950000) > from space 110720K, 25% used [0x00007fa1cb570000, > 0x00007fa1cd091078, 0x00007fa1d2190000) > to space 110720K, 0% used [0x00007fa1c4950000, > 0x00007fa1c4950000, 0x00007fa1cb570000) > concurrent mark-sweep generation total 30636480K, used 12036523K > [0x00007fa1d2190000, 0x00007fa920000000, 0x00007fa920000000) > concurrent-mark-sweep perm gen total 128856K, used 77779K > [0x00007fa920000000, 0x00007fa927dd6000, 0x00007fa940000000) > * > * > *Sample gc log:* > 2014-12-11T23:32:16.121+0000: 710.618: [ParNew > Desired survivor size 56688640 bytes, new threshold 6 (max 6) > - age 1: 2956312 bytes, 2956312 total > - age 2: 591800 bytes, 3548112 total > - age 3: 66216 bytes, 3614328 total > - age 4: 270752 bytes, 3885080 total > - age 5: 615472 bytes, 4500552 total > - age 6: 358440 bytes, 4858992 total > : 900635K->8173K(996800K), 0.0317340 secs] > 1352217K->463460K(31633280K)After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: -433641480 > Max Chunk Size: -433641480 > Number of Blocks: 1 > Av. Block Size: -433641480 > Tree Height: 1 > After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 1227 > Max Chunk Size: 631 > Number of Blocks: 3 > Av. Block Size: 409 > Tree Height: 3 > , 0.0318920 secs] [Times: user=0.38 sys=0.01, real=0.03 secs] > > > Ashwin Jayaprakash. > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From java at elyograg.org Wed Dec 17 19:15:37 2014 From: java at elyograg.org (Shawn Heisey) Date: Wed, 17 Dec 2014 12:15:37 -0700 Subject: G1 with Solr - thread from dev@lucene.apache.org In-Reply-To: <1418831456.3255.22.camel@oracle.com> References: <54906C34.8080408@elyograg.org> <1418831456.3255.22.camel@oracle.com> Message-ID: <5491D659.1090703@elyograg.org> On 12/17/2014 8:50 AM, Thomas Schatzl wrote: > could you provide some logs to look at? It is impossible to give good > recommendations without having at least some more detail about what's > going on. > > Preferably logs with at least the mentioned options they used to tune > the workload, i.e. -XX:+PrintGCDetails -XX:+PrintGCTimeStamps and -XX: > +PrintAdaptiveSizePolicy > > It might also be a good idea to start with the options given in the > cloudera blog entry: > > -XX:MaxGCPauseMillis=100 // the max pause time you want > -XX:+ParallelRefProcEnabled // not sure, only if Solr uses lots of > soft or weak references. > -XX:-ResizePLAB // that's minor > -XX:G1NewSizePercent=1 // that may help in achieving the > pause time goal > -XmsM > -XmxM > > I do not think there is need to set the ParallelGCThreads according to > that formula. This has been the default formula for calculating the > number of threads for all collectors for a long time (but then again it > might have changed sometime in jdk7). > > You may also want to use a JDK 8 build, preferably (for me :) some 8u40 > EA build (e.g. from https://jdk8.java.net/download.html); there have > been a lot of improvements to G1 in JDK8, and in particular 8u40. Strange, I seem to have only received the copy of this message sent directly to me, I never got the list copy. Here's the options I'm using for G1 on 7u72: JVM_OPTS=" \ -XX:+UseG1GC \ -XX:+UseLargePages \ -XX:+AggressiveOpts \ " Here's the options I used for CMS on 7u25: JVM_OPTS=" \ -XX:NewRatio=3 \ -XX:SurvivorRatio=4 \ -XX:TargetSurvivorRatio=90 \ -XX:MaxTenuringThreshold=8 \ -XX:+UseConcMarkSweepGC \ -XX:+CMSScavengeBeforeRemark \ -XX:PretenureSizeThreshold=64m \ -XX:CMSFullGCsBeforeCompaction=1 \ -XX:+UseCMSInitiatingOccupancyOnly \ -XX:CMSInitiatingOccupancyFraction=70 \ -XX:CMSTriggerPermRatio=80 \ -XX:CMSMaxAbortablePrecleanTime=6000 \ -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -XX:+UseLargePages \ -XX:+AggressiveOpts \ " In both cases, I used -Xms4096M and -Xmx6144M. These are the GC logging options: GCLOG_OPTS="-verbose:gc -Xloggc:logs/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCDetails" Here's the GC logs that I already have: https://www.dropbox.com/s/4uy95g9zmc28xkn/gc-idxa1-cms-7u25.log?dl=0 https://www.dropbox.com/s/loyo6u0tqcba6sh/gc-idxa1-g1-7u72.log?dl=0 I believe that Lucene does use a lot of references. I am more familiar with Solr code than Lucene, but even on Solr, I am not well-versed in the lower-level details. I will get PrintAdaptiveSizePolicy added to my GC logging options. Unless the performance improvement in Java 8 is significant, I don't think I can make a compelling case to switch from Java 7 yet. Although I have UseLargePages, I do not have any huge pages allocated in the CentOS 6 operating system, so this is not actually doing anything. 
Thanks, Shawn From thomas.schatzl at oracle.com Wed Dec 17 20:51:53 2014 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 17 Dec 2014 21:51:53 +0100 Subject: G1 with Solr - thread from dev@lucene.apache.org In-Reply-To: <5491D659.1090703@elyograg.org> References: <54906C34.8080408@elyograg.org> <1418831456.3255.22.camel@oracle.com> <5491D659.1090703@elyograg.org> Message-ID: <1418849513.3293.3.camel@oracle.com> Hi Shawn, Shawn Heisey wrote: > On 12/17/2014 8:50 AM, Thomas Schatzl wrote: > > could you provide some logs to look at? It is impossible to give good > > recommendations without having at least some more detail about what's > > going on. > > > > Preferably logs with at least the mentioned options they used to tune > > the workload, i.e. -XX:+PrintGCDetails -XX:+PrintGCTimeStamps and -XX: > > +PrintAdaptiveSizePolicy > > > > It might also be a good idea to start with the options given in the > > cloudera blog entry: > > > > -XX:MaxGCPauseMillis=100 // the max pause time you want > > -XX:+ParallelRefProcEnabled // not sure, only if Solr uses lots of > > soft or weak references. > > -XX:-ResizePLAB // that's minor > > -XX:G1NewSizePercent=1 // that may help in achieving the > > pause time goal > > -XmsM > > -XmxM > > > > I do not think there is need to set the ParallelGCThreads according to > > that formula. This has been the default formula for calculating the > > number of threads for all collectors for a long time (but then again it > > might have changed sometime in jdk7). > > > > You may also want to use a JDK 8 build, preferably (for me :) some 8u40 > > EA build (e.g. from https://jdk8.java.net/download.html); there have > > been a lot of improvements to G1 in JDK8, and in particular 8u40. > > Strange, I seem to have only received the copy of this message sent > directly to me, I never got the list copy. Not sure why. One copy has been archived in the mailing list archives though... > Here's the options I'm using for G1 on 7u72: > > JVM_OPTS=" \ > -XX:+UseG1GC \ > -XX:+UseLargePages \ > -XX:+AggressiveOpts \ > " > > Here's the options I used for CMS on 7u25: > > JVM_OPTS=" \ > -XX:NewRatio=3 \ > -XX:SurvivorRatio=4 \ > -XX:TargetSurvivorRatio=90 \ > -XX:MaxTenuringThreshold=8 \ > -XX:+UseConcMarkSweepGC \ > -XX:+CMSScavengeBeforeRemark \ > -XX:PretenureSizeThreshold=64m \ > -XX:CMSFullGCsBeforeCompaction=1 \ > -XX:+UseCMSInitiatingOccupancyOnly \ > -XX:CMSInitiatingOccupancyFraction=70 \ > -XX:CMSTriggerPermRatio=80 \ > -XX:CMSMaxAbortablePrecleanTime=6000 \ > -XX:+CMSParallelRemarkEnabled > -XX:+ParallelRefProcEnabled > -XX:+UseLargePages \ > -XX:+AggressiveOpts \ > " > > In both cases, I used -Xms4096M and -Xmx6144M. These are the GC logging > options: > > GCLOG_OPTS="-verbose:gc -Xloggc:logs/gc.log -XX:+PrintGCDateStamps > -XX:+PrintGCDetails" > > Here's the GC logs that I already have: > > https://www.dropbox.com/s/4uy95g9zmc28xkn/gc-idxa1-cms-7u25.log?dl=0 > https://www.dropbox.com/s/loyo6u0tqcba6sh/gc-idxa1-g1-7u72.log?dl=0 > please also add -XX:+PrintReferenceGC, and definitely use -XX: +ParallelRefProcEnabled. GC is spending a significant amount of the time in soft/weak reference processing. -XX:+ParallelRefProcEnabled will help, but there will be spikes still. I saw that GC sometimes spends 1000ms just processing those references; using 8 threads this should get better. That alone will likely make it hard reaching a 100ms pause time goal (1000ms/8 = 125ms...). 
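To see that effect in isolation, here is a minimal stand-alone sketch (it has
nothing to do with Solr or Lucene internals; the flood of WeakReferences is just
a stand-in for whatever the library creates). Run it with something like
-Xmx2g -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintReferenceGC, once with and
once without -XX:+ParallelRefProcEnabled, and compare the reference processing
lines in the two logs:

import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

public class WeakRefLoad {
    public static void main(String[] args) {
        // Keep ~3 million WeakReference objects strongly reachable, mirroring the
        // ~3 million references reported at remark earlier in this thread.
        List<WeakReference<byte[]>> refs = new ArrayList<WeakReference<byte[]>>(3_000_000);
        for (int i = 0; i < 3_000_000; i++) {
            // The referents are only weakly reachable, so the collector has to
            // discover and clear these references; that is the work the
            // "[Ref Proc: ...]" times account for.
            refs.add(new WeakReference<byte[]>(new byte[16]));
        }
        System.gc();    // trigger one more collection so the final counts show up in the log
        System.out.println("weak references created: " + refs.size());
    }
}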
CMS has the same problems, and while on average it has ~215ms pauses, there seem to be a lot that are a lot longer too. Reference processing also takes very long, even with -XX:+ParallelRefProcEnabled. I am not sure about the cause for the full gc's: either the pause time prediction in G1 in that version is too bad and it tries to use a way too large young gen, or there are a few very large objects around. Depending on the log output and the impact of the other options we might want to cap the maximum young gen size. > I believe that Lucene does use a lot of references. I saw that. Must be millions. -XX:+PrintReferenceGC should show that (also in CMS). > I am more familiar > with Solr code than Lucene, but even on Solr, I am not well-versed in > the lower-level details. > > I will get PrintAdaptiveSizePolicy added to my GC logging options. > > Unless the performance improvement in Java 8 is significant, I don't > think I can make a compelling case to switch from Java 7 yet. >From the top of my head: - logging is better - parallelized a few more GC phases - class unloading after concurrent mark (not only during full gc) - but that does not seem to be a problem - prediction fixes - much improved handling of large objects - does not seem to be a problem here - slew of bugfixes I am mostly missing the improved logging for analysis, and the improvements in pause times. > Although I have UseLargePages, I do not have any huge pages allocated in > the CentOS 6 operating system, so this is not actually doing anything. Thanks, Thomas From ashwin.jayaprakash at gmail.com Wed Dec 17 23:12:14 2014 From: ashwin.jayaprakash at gmail.com (Ashwin Jayaprakash) Date: Wed, 17 Dec 2014 15:12:14 -0800 Subject: Multi-second ParNew collections but stable CMS In-Reply-To: References: Message-ID: I've uploaded our latest GC log files to - https://drive.google.com/file/d/0Bw3dCdVLk-NvV3ozdkNacU5SU2M/view?usp=sharing I've also summarized the top pause times by running "grep -oE "Times: user=.*" gc.log.0 | sort -nr | head -25" Times: user=7.89 sys=0.55, real=0.65 secs] Times: user=7.71 sys=4.59, real=1.10 secs] Times: user=7.46 sys=0.32, real=0.67 secs] Times: user=6.55 sys=0.96, real=0.68 secs] Times: user=6.40 sys=0.27, real=0.57 secs] Times: user=6.27 sys=0.65, real=0.55 secs] Times: user=6.24 sys=0.29, real=0.52 secs] Times: user=5.25 sys=0.26, real=0.45 secs] Times: user=4.95 sys=0.49, real=0.53 secs] Times: user=4.90 sys=0.54, real=0.45 secs] Times: user=4.55 sys=1.46, real=0.61 secs] Times: user=4.39 sys=0.26, real=0.40 secs] Times: user=3.61 sys=0.39, real=0.50 secs] Times: user=3.59 sys=0.18, real=0.35 secs] Times: user=3.16 sys=0.00, real=3.17 secs] Times: user=3.06 sys=0.14, real=0.25 secs] Times: user=3.05 sys=0.24, real=0.33 secs] Times: user=3.03 sys=0.14, real=0.25 secs] Times: user=2.97 sys=0.38, real=0.33 secs] Times: user=2.77 sys=0.14, real=0.25 secs] Times: user=2.51 sys=0.08, real=0.22 secs] Times: user=2.49 sys=0.13, real=0.21 secs] Times: user=2.25 sys=0.32, real=0.26 secs] Times: user=2.06 sys=0.12, real=0.19 secs] Times: user=2.06 sys=0.11, real=0.17 secs] I wonder if we should enable "UseCondCardMark"? Thanks. On Tue, Dec 16, 2014 at 8:47 PM, Ashwin Jayaprakash < ashwin.jayaprakash at gmail.com> wrote: > > Hi, we have a cluster of ElasticSearch servers running with 31G heap and > OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode). 
> > While our old gen seems to be very stable with about 40% usage and no Full > GCs so far, our young gen keeps growing from ~50MB to 850MB every few > seconds. These ParNew collections are taking anywhere between 1-7 seconds > and is causing some of our requests to time out. The eden space keeps > filling up and then cleared every 30-60 seconds. There is definitely work > being done by our JVM in terms of caching/buffering objects for a few > seconds, writing to disk and then clearing the objects (typical > Lucene/ElasticSearch indexing and querying workload) > > These long pauses are not good for our server throughput and I was doing > some reading. I got some conflicting reports on how Cassandra is configured > compared to Hadoop. There are also references > > to this old ParNew+CMS bug > which I thought > would've been addressed in the JRE version we are using. Cassandra > recommends a > larger NewSize with just 1 for max tenuring, whereas Hadoop recommends > a small NewSize. > > Since most of our allocations seem to be quite short lived, is there a way > to avoid these "long" young gen pauses? > > Thanks in advance. Here are some details. > > *Heap settings:* > java -Xmx31000m -Xms31000m > -Xss512k -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=64m > -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly > -XX:CMSInitiatingOccupancyFraction=70 > -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC -XX:+PrintPromotionFailure > -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintTenuringDistribution > -XX:GCLogFileSize=512m -XX:NumberOfGCLogFiles=2 -XX:+PrintGCDateStamps > -XX:+UseGCLogFileRotation -XX:+DisableExplicitGC -XX:PrintFLSStatistics=1 > -XX:+PrintGCDetails > > > *Last few lines of "kill -3 pid" output:* > Heap > par new generation total 996800K, used 865818K [0x00007fa18e800000, > 0x00007fa1d2190000, 0x00007fa1d2190000) > eden space 886080K, 94% used [0x00007fa18e800000, 0x00007fa1c1a659e0, > 0x00007fa1c4950000) > from space 110720K, 25% used [0x00007fa1cb570000, 0x00007fa1cd091078, > 0x00007fa1d2190000) > to space 110720K, 0% used [0x00007fa1c4950000, 0x00007fa1c4950000, > 0x00007fa1cb570000) > concurrent mark-sweep generation total 30636480K, used 12036523K > [0x00007fa1d2190000, 0x00007fa920000000, 0x00007fa920000000) > concurrent-mark-sweep perm gen total 128856K, used 77779K > [0x00007fa920000000, 0x00007fa927dd6000, 0x00007fa940000000) > > *Sample gc log:* > 2014-12-11T23:32:16.121+0000: 710.618: [ParNew > Desired survivor size 56688640 bytes, new threshold 6 (max 6) > - age 1: 2956312 bytes, 2956312 total > - age 2: 591800 bytes, 3548112 total > - age 3: 66216 bytes, 3614328 total > - age 4: 270752 bytes, 3885080 total > - age 5: 615472 bytes, 4500552 total > - age 6: 358440 bytes, 4858992 total > : 900635K->8173K(996800K), 0.0317340 secs] > 1352217K->463460K(31633280K)After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: -433641480 > Max Chunk Size: -433641480 > Number of Blocks: 1 > Av. Block Size: -433641480 > Tree Height: 1 > After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 1227 > Max Chunk Size: 631 > Number of Blocks: 3 > Av. Block Size: 409 > Tree Height: 3 > , 0.0318920 secs] [Times: user=0.38 sys=0.01, real=0.03 secs] > > > Ashwin Jayaprakash. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gustav.r.akesson at gmail.com Thu Dec 18 08:05:38 2014 From: gustav.r.akesson at gmail.com (=?UTF-8?Q?Gustav_=C3=85kesson?=) Date: Thu, 18 Dec 2014 09:05:38 +0100 Subject: Multi-second ParNew collections but stable CMS In-Reply-To: References: Message-ID: Hi, I see a significant increase in systime which (to my experience) usually is because of either page swaps (some parts of the heap has to be paged in in STW phase), or long latency for GC writing the logs to disc (which is a synchronous operation as part of GC cycle). When these multi-second YGCs occur, have you noticed an increase of page swaps? What is the resident size of this particular Java process? Could you try to write the GC logs to a RAM disc and see if the problem goes away? Best Regards, Gustav ?kesson On Thu, Dec 18, 2014 at 12:12 AM, Ashwin Jayaprakash < ashwin.jayaprakash at gmail.com> wrote: > > I've uploaded our latest GC log files to - > https://drive.google.com/file/d/0Bw3dCdVLk-NvV3ozdkNacU5SU2M/view?usp=sharing > > I've also summarized the top pause times by running "grep -oE "Times: > user=.*" gc.log.0 | sort -nr | head -25" > > Times: user=7.89 sys=0.55, real=0.65 secs] > Times: user=7.71 sys=4.59, real=1.10 secs] > Times: user=7.46 sys=0.32, real=0.67 secs] > Times: user=6.55 sys=0.96, real=0.68 secs] > Times: user=6.40 sys=0.27, real=0.57 secs] > Times: user=6.27 sys=0.65, real=0.55 secs] > Times: user=6.24 sys=0.29, real=0.52 secs] > Times: user=5.25 sys=0.26, real=0.45 secs] > Times: user=4.95 sys=0.49, real=0.53 secs] > Times: user=4.90 sys=0.54, real=0.45 secs] > Times: user=4.55 sys=1.46, real=0.61 secs] > Times: user=4.39 sys=0.26, real=0.40 secs] > Times: user=3.61 sys=0.39, real=0.50 secs] > Times: user=3.59 sys=0.18, real=0.35 secs] > Times: user=3.16 sys=0.00, real=3.17 secs] > Times: user=3.06 sys=0.14, real=0.25 secs] > Times: user=3.05 sys=0.24, real=0.33 secs] > Times: user=3.03 sys=0.14, real=0.25 secs] > Times: user=2.97 sys=0.38, real=0.33 secs] > Times: user=2.77 sys=0.14, real=0.25 secs] > Times: user=2.51 sys=0.08, real=0.22 secs] > Times: user=2.49 sys=0.13, real=0.21 secs] > Times: user=2.25 sys=0.32, real=0.26 secs] > Times: user=2.06 sys=0.12, real=0.19 secs] > Times: user=2.06 sys=0.11, real=0.17 secs] > > I wonder if we should enable "UseCondCardMark"? > > Thanks. > > > > > > > > On Tue, Dec 16, 2014 at 8:47 PM, Ashwin Jayaprakash < > ashwin.jayaprakash at gmail.com> wrote: >> >> Hi, we have a cluster of ElasticSearch servers running with 31G heap and >> OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode). >> >> While our old gen seems to be very stable with about 40% usage and no >> Full GCs so far, our young gen keeps growing from ~50MB to 850MB every few >> seconds. These ParNew collections are taking anywhere between 1-7 seconds >> and is causing some of our requests to time out. The eden space keeps >> filling up and then cleared every 30-60 seconds. There is definitely work >> being done by our JVM in terms of caching/buffering objects for a few >> seconds, writing to disk and then clearing the objects (typical >> Lucene/ElasticSearch indexing and querying workload) >> >> These long pauses are not good for our server throughput and I was doing >> some reading. I got some conflicting reports on how Cassandra is configured >> compared to Hadoop. There are also references >> >> to this old ParNew+CMS bug >> which I thought >> would've been addressed in the JRE version we are using. 
Cassandra >> recommends >> a larger NewSize with just 1 for max tenuring, whereas Hadoop recommends >> a small NewSize. >> >> Since most of our allocations seem to be quite short lived, is there a >> way to avoid these "long" young gen pauses? >> >> Thanks in advance. Here are some details. >> >> *Heap settings:* >> java -Xmx31000m -Xms31000m >> -Xss512k -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=64m >> -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly >> -XX:CMSInitiatingOccupancyFraction=70 >> -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC -XX:+PrintPromotionFailure >> -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintTenuringDistribution >> -XX:GCLogFileSize=512m -XX:NumberOfGCLogFiles=2 -XX:+PrintGCDateStamps >> -XX:+UseGCLogFileRotation -XX:+DisableExplicitGC -XX:PrintFLSStatistics=1 >> -XX:+PrintGCDetails >> >> >> *Last few lines of "kill -3 pid" output:* >> Heap >> par new generation total 996800K, used 865818K [0x00007fa18e800000, >> 0x00007fa1d2190000, 0x00007fa1d2190000) >> eden space 886080K, 94% used [0x00007fa18e800000, 0x00007fa1c1a659e0, >> 0x00007fa1c4950000) >> from space 110720K, 25% used [0x00007fa1cb570000, 0x00007fa1cd091078, >> 0x00007fa1d2190000) >> to space 110720K, 0% used [0x00007fa1c4950000, 0x00007fa1c4950000, >> 0x00007fa1cb570000) >> concurrent mark-sweep generation total 30636480K, used 12036523K >> [0x00007fa1d2190000, 0x00007fa920000000, 0x00007fa920000000) >> concurrent-mark-sweep perm gen total 128856K, used 77779K >> [0x00007fa920000000, 0x00007fa927dd6000, 0x00007fa940000000) >> >> *Sample gc log:* >> 2014-12-11T23:32:16.121+0000: 710.618: [ParNew >> Desired survivor size 56688640 bytes, new threshold 6 (max 6) >> - age 1: 2956312 bytes, 2956312 total >> - age 2: 591800 bytes, 3548112 total >> - age 3: 66216 bytes, 3614328 total >> - age 4: 270752 bytes, 3885080 total >> - age 5: 615472 bytes, 4500552 total >> - age 6: 358440 bytes, 4858992 total >> : 900635K->8173K(996800K), 0.0317340 secs] >> 1352217K->463460K(31633280K)After GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: -433641480 >> Max Chunk Size: -433641480 >> Number of Blocks: 1 >> Av. Block Size: -433641480 >> Tree Height: 1 >> After GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: 1227 >> Max Chunk Size: 631 >> Number of Blocks: 3 >> Av. Block Size: 409 >> Tree Height: 3 >> , 0.0318920 secs] [Times: user=0.38 sys=0.01, real=0.03 secs] >> >> >> Ashwin Jayaprakash. >> > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlie.hunt at oracle.com Thu Dec 18 12:45:52 2014 From: charlie.hunt at oracle.com (charlie hunt) Date: Thu, 18 Dec 2014 06:45:52 -0600 Subject: Multi-second ParNew collections but stable CMS In-Reply-To: References: Message-ID: <5492CC80.5010200@oracle.com> An HTML attachment was scrubbed... URL: From ashwin.jayaprakash at gmail.com Thu Dec 18 20:00:03 2014 From: ashwin.jayaprakash at gmail.com (Ashwin Jayaprakash) Date: Thu, 18 Dec 2014 12:00:03 -0800 Subject: Multi-second ParNew collections but stable CMS In-Reply-To: <5492272A.2040304@oracle.com> References: <5492272A.2040304@oracle.com> Message-ID: *@Jon*, thanks for clearing that up. Yes, that was my source of confusion. I was misinterpreting the user time with the real time. 
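The very first line of the summary posted earlier (Times: user=7.89 sys=0.55,
real=0.65 secs) already shows that relationship. A tiny illustration; the
worker-thread figure below is only inferred from that one log line, since the
actual ParallelGCThreads setting is not stated anywhere in this thread:

public class UserVsRealTime {
    public static void main(String[] args) {
        // Figures copied from one log line: "Times: user=7.89 sys=0.55, real=0.65 secs"
        double user = 7.89, sys = 0.55, real = 0.65;
        // user + sys is CPU time summed across all parallel GC workers;
        // real is the wall-clock pause the application actually observes.
        System.out.printf("average GC parallelism during the pause: ~%.0f threads%n",
                (user + sys) / real);   // prints ~13
    }
}

So a user figure near 8 seconds is consistent with a real pause well under one
second on a machine with that many GC workers.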
*Jon's reply from an offline conversation:* > Are these the 7 second collections you refer to in the paragraph above? > If yes, the "user" time is the sum of the time spent by multiple GC > threads. > The real time is the GC pause time that your application experiences. > In the above case the GC pauses are .65s, 1.10s and .67s. > Something that added to my confusion was the tools we are using in-house. In addition to the GC logs we have 1 tool that uses the GarbageCollectorMXBean's getCollectionTime() method. This does not seem to match the values I see in the GC logs ( http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/management/GarbageCollectorMXBean.html#getCollectionTime%28%29 ). The other tool is the ElasticSearch JVM stats logger which uses GarbageCollectorMXBean's LastGCInfo ( https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmStats.java#L194 and https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmMonitorService.java#L187 ). Do these methods expose the total time spent by all the parallel GC threads for the ParNew pool or the "real" time? They do not seem to match the GC log times. *@Gustav* We do not have any swapping on the machines. It could be the disk IO experienced by the GC log writer itself, as you've suspected. The machine has 128G of RAM *"top" sample from a similar machine:* PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 106856 xxx 20 0 68.9g 25g 9.9g S 72.4 21.1 2408:05 java *"free -g":* total used free shared buffers cached Mem: 120 119 0 0 0 95 -/+ buffers/cache: 23 96 Swap: 0 0 0 *@Charlie* Hugepages has already been disabled *sudo sysctl -a | grep hugepage* vm.nr_hugepages = 0 vm.nr_hugepages_mempolicy = 0 vm.hugepages_treat_as_movable = 0 vm.nr_overcommit_hugepages = 0 *cat /sys/kernel/mm/transparent_hugepage/enabled* [always] madvise never Thanks all! On Wed, Dec 17, 2014 at 5:00 PM, Jon Masamitsu wrote: > > Ashwin, > > On 12/16/2014 8:47 PM, Ashwin Jayaprakash wrote: > > Hi, we have a cluster of ElasticSearch servers running with 31G heap and > OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode). > > While our old gen seems to be very stable with about 40% usage and no Full > GCs so far, our young gen keeps growing from ~50MB to 850MB every few > seconds. These ParNew collections are taking anywhere between 1-7 seconds > and is causing some of our requests to time out. The eden space keeps > filling up and then cleared every 30-60 seconds. There is definitely work > being done by our JVM in terms of caching/buffering objects for a few > seconds, writing to disk and then clearing the objects (typical > Lucene/ElasticSearch indexing and querying workload) > > > From you recent mail > > Times: user=7.89 sys=0.55, real=0.65 secs] > Times: user=7.71 sys=4.59, real=1.10 secs] > Times: user=7.46 sys=0.32, real=0.67 secs] > > Are these the 7 second collections you refer to in the paragraph above? > If yes, the "user" time is the sum of the time spent by multiple GC > threads. > The real time is the GC pause time that your application experiences. > In the above case the GC pauses are .65s, 1.10s and .67s. > > Comment below regarding "eden space keeps filling up". > > > These long pauses are not good for our server throughput and I was doing > some reading. I got some conflicting reports on how Cassandra is configured > compared to Hadoop. 
There are also references > > to this old ParNew+CMS bug > which I thought > would've been addressed in the JRE version we are using. Cassandra > recommends a > larger NewSize with just 1 for max tenuring, whereas Hadoop recommends > a small NewSize. > > Since most of our allocations seem to be quite short lived, is there a > way to avoid these "long" young gen pauses? > > Thanks in advance. Here are some details. > > * Heap settings:* > java -Xmx31000m -Xms31000m > -Xss512k -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=64m > -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly > -XX:CMSInitiatingOccupancyFraction=70 > -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC -XX:+PrintPromotionFailure > -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintTenuringDistribution > -XX:GCLogFileSize=512m -XX:NumberOfGCLogFiles=2 -XX:+PrintGCDateStamps > -XX:+UseGCLogFileRotation -XX:+DisableExplicitGC -XX:PrintFLSStatistics=1 > -XX:+PrintGCDetails > > > * Last few lines of "kill -3 pid" output:* > Heap > par new generation total 996800K, used 865818K [0x00007fa18e800000, > 0x00007fa1d2190000, 0x00007fa1d2190000) > eden space 886080K, 94% used [0x00007fa18e800000, 0x00007fa1c1a659e0, > 0x00007fa1c4950000) > from space 110720K, 25% used [0x00007fa1cb570000, 0x00007fa1cd091078, > 0x00007fa1d2190000) > to space 110720K, 0% used [0x00007fa1c4950000, 0x00007fa1c4950000, > 0x00007fa1cb570000) > concurrent mark-sweep generation total 30636480K, used 12036523K > [0x00007fa1d2190000, 0x00007fa920000000, 0x00007fa920000000) > concurrent-mark-sweep perm gen total 128856K, used 77779K > [0x00007fa920000000, 0x00007fa927dd6000, 0x00007fa940000000) > > *Sample gc log:* > 2014-12-11T23:32:16.121+0000: 710.618: [ParNew > Desired survivor size 56688640 bytes, new threshold 6 (max 6) > - age 1: 2956312 bytes, 2956312 total > - age 2: 591800 bytes, 3548112 total > - age 3: 66216 bytes, 3614328 total > - age 4: 270752 bytes, 3885080 total > - age 5: 615472 bytes, 4500552 total > - age 6: 358440 bytes, 4858992 total > : 900635K->8173K(996800K), 0.0317340 secs] > 1352217K->463460K(31633280K)After GC: > > > In this GC eden is at 900635k before the GC and is a 8173k after. That GC > fills up is > the expected behavior. Is that what you were asking about above? If not > can you > send me an example of the "fills up" behavior. > > Jon > > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: -433641480 > Max Chunk Size: -433641480 > Number of Blocks: 1 > Av. Block Size: -433641480 > Tree Height: 1 > After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 1227 > Max Chunk Size: 631 > Number of Blocks: 3 > Av. Block Size: 409 > Tree Height: 3 > , 0.0318920 secs] [Times: user=0.38 sys=0.01, real=0.03 secs] > > > Ashwin Jayaprakash. > > > _______________________________________________ > hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From holger.hoffstaette at googlemail.com Thu Dec 18 20:17:27 2014 From: holger.hoffstaette at googlemail.com (=?UTF-8?B?SG9sZ2VyIEhvZmZzdMOkdHRl?=) Date: Thu, 18 Dec 2014 21:17:27 +0100 Subject: Multi-second ParNew collections but stable CMS In-Reply-To: References: <5492272A.2040304@oracle.com> Message-ID: <54933657.3010505@googlemail.com> On 12/18/14 21:00, Ashwin Jayaprakash wrote: > *cat /sys/kernel/mm/transparent_hugepage/enabled* > [always] madvise never That means THP aka khugepaged is still *enabled* and will still interfere. Whether it actually does can be seen in e.g. cat /sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed -h From jon.masamitsu at oracle.com Thu Dec 18 20:23:56 2014 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Thu, 18 Dec 2014 12:23:56 -0800 Subject: Multi-second ParNew collections but stable CMS In-Reply-To: References: <5492272A.2040304@oracle.com> Message-ID: <549337DC.2070300@oracle.com> On 12/18/2014 12:00 PM, Ashwin Jayaprakash wrote: > *@Jon*, thanks for clearing that up. Yes, that was my source of > confusion. I was misinterpreting the user time with the real time. > > *Jon's reply from an offline conversation:* > > Are these the 7 second collections you refer to in the paragraph > above? > If yes, the "user" time is the sum of the time spent by multiple > GC threads. > The real time is the GC pause time that your application experiences. > In the above case the GC pauses are .65s, 1.10s and .67s. > > > Something that added to my confusion was the tools we are using > in-house. In addition to the GC logs we have 1 tool that uses the > GarbageCollectorMXBean's getCollectionTime() method. This does not > seem to match the values I see in the GC logs > (http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/management/GarbageCollectorMXBean.html#getCollectionTime%28%29). > > The other tool is the ElasticSearch JVM stats logger which uses > GarbageCollectorMXBean's LastGCInfo > (https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmStats.java#L194 > and > https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmMonitorService.java#L187). > > Do these methods expose the total time spent by all the parallel GC > threads for the ParNew pool or the "real" time? They do not seem to > match the GC log times. I haven't found the JVM code that provides information for getCollectionTime() but I would expect to match the pause times in the GC logs. From your earlier mail > *Sample gc log:* > 2014-12-11T23:32:16.121+0000: 710.618: [ParNew > Desired survivor size 56688640 bytes, new threshold 6 (max 6) > - age 1: 2956312 bytes, 2956312 total > - age 2: 591800 bytes, 3548112 total > - age 3: 66216 bytes, 3614328 total > - age 4: 270752 bytes, 3885080 total > - age 5: 615472 bytes, 4500552 total > - age 6: 358440 bytes, 4858992 total > : 900635K->8173K(996800K), 0.0317340 secs] > 1352217K->463460K(31633280K)After GC: The time 0.0317340 is what the GC measures and is what I would expect to be available through the MXBeans. You're saying that does not match? Jon > *@Gustav* We do not have any swapping on the machines. It could be the > disk IO experienced by the GC log writer itself, as you've suspected. 
> The machine has 128G of RAM > > *"top" sample from a similar machine:* > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > 106856 xxx 20 0 68.9g 25g 9.9g S 72.4 21.1 > 2408:05 java > > *"free -g": > * > total used free shared buffers cached > Mem: 120 119 0 0 0 95 > -/+ buffers/cache: 23 96 > Swap: 0 0 0 > > *@Charlie* Hugepages has already been disabled > > *sudo sysctl -a | grep hugepage* > vm.nr_hugepages = 0 > vm.nr_hugepages_mempolicy = 0 > vm.hugepages_treat_as_movable = 0 > vm.nr_overcommit_hugepages = 0 > > *cat /sys/kernel/mm/transparent_hugepage/enabled* > [always] madvise never > > > Thanks all! > > > > On Wed, Dec 17, 2014 at 5:00 PM, Jon Masamitsu > > wrote: > > Ashwin, > > On 12/16/2014 8:47 PM, Ashwin Jayaprakash wrote: >> Hi, we have a cluster of ElasticSearch servers running with 31G >> heap and OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode). >> >> While our old gen seems to be very stable with about 40% usage >> and no Full GCs so far, our young gen keeps growing from ~50MB to >> 850MB every few seconds. These ParNew collections are taking >> anywhere between 1-7 seconds and is causing some of our requests >> to time out. The eden space keeps filling up and then cleared >> every 30-60 seconds. There is definitely work being done by our >> JVM in terms of caching/buffering objects for a few seconds, >> writing to disk and then clearing the objects (typical >> Lucene/ElasticSearch indexing and querying workload) > > From you recent mail > > Times: user=7.89 sys=0.55, real=0.65 secs] > Times: user=7.71 sys=4.59, real=1.10 secs] > Times: user=7.46 sys=0.32, real=0.67 secs] > > Are these the 7 second collections you refer to in the paragraph > above? > If yes, the "user" time is the sum of the time spent by multiple > GC threads. > The real time is the GC pause time that your application experiences. > In the above case the GC pauses are .65s, 1.10s and .67s. > > Comment below regarding "eden space keeps filling up". > >> >> These long pauses are not good for our server throughput and I >> was doing some reading. I got some conflicting reports on how >> Cassandra is configured compared to Hadoop. There are also >> references >> >> to this old ParNew+CMS bug >> which I thought >> would've been addressed in the JRE version we are using. >> Cassandra recommends >> a >> larger NewSize with just 1 for max tenuring, whereas Hadoop >> recommends a >> small NewSize. >> >> Since most of our allocations seem to be quite short lived, is >> there a way to avoid these "long" young gen pauses? >> >> Thanks in advance. Here are some details. 
>> * >> Heap settings:* >> java -Xmx31000m -Xms31000m >> -Xss512k -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=64m >> -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly >> -XX:CMSInitiatingOccupancyFraction=70 >> -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC >> -XX:+PrintPromotionFailure >> -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintTenuringDistribution >> -XX:GCLogFileSize=512m -XX:NumberOfGCLogFiles=2 >> -XX:+PrintGCDateStamps >> -XX:+UseGCLogFileRotation -XX:+DisableExplicitGC >> -XX:PrintFLSStatistics=1 -XX:+PrintGCDetails >> >> * >> Last few lines of "kill -3 pid" output:* >> Heap >> par new generation total 996800K, used 865818K >> [0x00007fa18e800000, 0x00007fa1d2190000, 0x00007fa1d2190000) >> eden space 886080K, 94% used [0x00007fa18e800000, >> 0x00007fa1c1a659e0, 0x00007fa1c4950000) >> from space 110720K, 25% used [0x00007fa1cb570000, >> 0x00007fa1cd091078, 0x00007fa1d2190000) >> to space 110720K, 0% used [0x00007fa1c4950000, >> 0x00007fa1c4950000, 0x00007fa1cb570000) >> concurrent mark-sweep generation total 30636480K, used 12036523K >> [0x00007fa1d2190000, 0x00007fa920000000, 0x00007fa920000000) >> concurrent-mark-sweep perm gen total 128856K, used 77779K >> [0x00007fa920000000, 0x00007fa927dd6000, 0x00007fa940000000) >> * >> * >> *Sample gc log:* >> 2014-12-11T23:32:16.121+0000: 710.618: [ParNew >> Desired survivor size 56688640 bytes, new threshold 6 (max 6) >> - age 1: 2956312 bytes, 2956312 total >> - age 2: 591800 bytes, 3548112 total >> - age 3: 66216 bytes, 3614328 total >> - age 4: 270752 bytes, 3885080 total >> - age 5: 615472 bytes, 4500552 total >> - age 6: 358440 bytes, 4858992 total >> : 900635K->8173K(996800K), 0.0317340 secs] >> 1352217K->463460K(31633280K)After GC: > > In this GC eden is at 900635k before the GC and is a 8173k after. > That GC fills up is > the expected behavior. Is that what you were asking about above? > If not can you > send me an example of the "fills up" behavior. > > Jon > >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: -433641480 >> Max Chunk Size: -433641480 >> Number of Blocks: 1 >> Av. Block Size: -433641480 >> Tree Height: 1 >> After GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: 1227 >> Max Chunk Size: 631 >> Number of Blocks: 3 >> Av. Block Size: 409 >> Tree Height: 3 >> , 0.0318920 secs] [Times: user=0.38 sys=0.01, real=0.03 secs] >> >> >> Ashwin Jayaprakash. >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlie.hunt at oracle.com Thu Dec 18 21:10:41 2014 From: charlie.hunt at oracle.com (charlie hunt) Date: Thu, 18 Dec 2014 15:10:41 -0600 Subject: Multi-second ParNew collections but stable CMS In-Reply-To: References: <5492272A.2040304@oracle.com> Message-ID: <613BB518-BB80-42A5-8E24-FC5683E33577@oracle.com> The output: > cat /sys/kernel/mm/transparent_hugepage/enabled > [always] madvise never Tells me that transparent huge pages are enabled ?always?. I think I would change this to ?never?, even though sysctl -a may be reporting no huge pages are currently in use. 
The system may trying to coalesce pages occasionally in attempt to make huge pages available, even though you are not currently using any. charlie > On Dec 18, 2014, at 2:00 PM, Ashwin Jayaprakash wrote: > > @Jon, thanks for clearing that up. Yes, that was my source of confusion. I was misinterpreting the user time with the real time. > > Jon's reply from an offline conversation: > Are these the 7 second collections you refer to in the paragraph above? > If yes, the "user" time is the sum of the time spent by multiple GC threads. > The real time is the GC pause time that your application experiences. > In the above case the GC pauses are .65s, 1.10s and .67s. > > Something that added to my confusion was the tools we are using in-house. In addition to the GC logs we have 1 tool that uses the GarbageCollectorMXBean's getCollectionTime() method. This does not seem to match the values I see in the GC logs (http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/management/GarbageCollectorMXBean.html#getCollectionTime%28%29 ). > > The other tool is the ElasticSearch JVM stats logger which uses GarbageCollectorMXBean's LastGCInfo (https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmStats.java#L194 and https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmMonitorService.java#L187 ). > > Do these methods expose the total time spent by all the parallel GC threads for the ParNew pool or the "real" time? They do not seem to match the GC log times. > > @Gustav We do not have any swapping on the machines. It could be the disk IO experienced by the GC log writer itself, as you've suspected. The machine has 128G of RAM > > "top" sample from a similar machine: > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > 106856 xxx 20 0 68.9g 25g 9.9g S 72.4 21.1 2408:05 java > > "free -g": > total used free shared buffers cached > Mem: 120 119 0 0 0 95 > -/+ buffers/cache: 23 96 > Swap: 0 0 0 > > @Charlie Hugepages has already been disabled > > sudo sysctl -a | grep hugepage > vm.nr_hugepages = 0 > vm.nr_hugepages_mempolicy = 0 > vm.hugepages_treat_as_movable = 0 > vm.nr_overcommit_hugepages = 0 > > cat /sys/kernel/mm/transparent_hugepage/enabled > [always] madvise never > > > Thanks all! > > > > On Wed, Dec 17, 2014 at 5:00 PM, Jon Masamitsu > wrote: > Ashwin, > > On 12/16/2014 8:47 PM, Ashwin Jayaprakash wrote: >> Hi, we have a cluster of ElasticSearch servers running with 31G heap and OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode). >> >> While our old gen seems to be very stable with about 40% usage and no Full GCs so far, our young gen keeps growing from ~50MB to 850MB every few seconds. These ParNew collections are taking anywhere between 1-7 seconds and is causing some of our requests to time out. The eden space keeps filling up and then cleared every 30-60 seconds. There is definitely work being done by our JVM in terms of caching/buffering objects for a few seconds, writing to disk and then clearing the objects (typical Lucene/ElasticSearch indexing and querying workload) > > From you recent mail > > Times: user=7.89 sys=0.55, real=0.65 secs] > Times: user=7.71 sys=4.59, real=1.10 secs] > Times: user=7.46 sys=0.32, real=0.67 secs] > > Are these the 7 second collections you refer to in the paragraph above? > If yes, the "user" time is the sum of the time spent by multiple GC threads. > The real time is the GC pause time that your application experiences. 
> In the above case the GC pauses are .65s, 1.10s and .67s. > > Comment below regarding "eden space keeps filling up". > >> >> These long pauses are not good for our server throughput and I was doing some reading. I got some conflicting reports on how Cassandra is configured compared to Hadoop. There are also references to this old ParNew+CMS bug which I thought would've been addressed in the JRE version we are using. Cassandra recommends a larger NewSize with just 1 for max tenuring, whereas Hadoop recommends a small NewSize. >> >> Since most of our allocations seem to be quite short lived, is there a way to avoid these "long" young gen pauses? >> >> Thanks in advance. Here are some details. >> >> Heap settings: >> java -Xmx31000m -Xms31000m >> -Xss512k -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=64m >> -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 >> -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC -XX:+PrintPromotionFailure >> -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintTenuringDistribution >> -XX:GCLogFileSize=512m -XX:NumberOfGCLogFiles=2 -XX:+PrintGCDateStamps >> -XX:+UseGCLogFileRotation -XX:+DisableExplicitGC -XX:PrintFLSStatistics=1 -XX:+PrintGCDetails >> >> >> Last few lines of "kill -3 pid" output: >> Heap >> par new generation total 996800K, used 865818K [0x00007fa18e800000, 0x00007fa1d2190000, 0x00007fa1d2190000) >> eden space 886080K, 94% used [0x00007fa18e800000, 0x00007fa1c1a659e0, 0x00007fa1c4950000) >> from space 110720K, 25% used [0x00007fa1cb570000, 0x00007fa1cd091078, 0x00007fa1d2190000) >> to space 110720K, 0% used [0x00007fa1c4950000, 0x00007fa1c4950000, 0x00007fa1cb570000) >> concurrent mark-sweep generation total 30636480K, used 12036523K [0x00007fa1d2190000, 0x00007fa920000000, 0x00007fa920000000) >> concurrent-mark-sweep perm gen total 128856K, used 77779K [0x00007fa920000000, 0x00007fa927dd6000, 0x00007fa940000000) >> >> Sample gc log: >> 2014-12-11T23:32:16.121+0000: 710.618: [ParNew >> Desired survivor size 56688640 bytes, new threshold 6 (max 6) >> - age 1: 2956312 bytes, 2956312 total >> - age 2: 591800 bytes, 3548112 total >> - age 3: 66216 bytes, 3614328 total >> - age 4: 270752 bytes, 3885080 total >> - age 5: 615472 bytes, 4500552 total >> - age 6: 358440 bytes, 4858992 total >> : 900635K->8173K(996800K), 0.0317340 secs] 1352217K->463460K(31633280K)After GC: > > In this GC eden is at 900635k before the GC and is a 8173k after. That GC fills up is > the expected behavior. Is that what you were asking about above? If not can you > send me an example of the "fills up" behavior. > > Jon > >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: -433641480 >> Max Chunk Size: -433641480 >> Number of Blocks: 1 >> Av. Block Size: -433641480 >> Tree Height: 1 >> After GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: 1227 >> Max Chunk Size: 631 >> Number of Blocks: 3 >> Av. Block Size: 409 >> Tree Height: 3 >> , 0.0318920 secs] [Times: user=0.38 sys=0.01, real=0.03 secs] >> >> >> Ashwin Jayaprakash. 
>> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashwin.jayaprakash at gmail.com Thu Dec 18 22:41:02 2014 From: ashwin.jayaprakash at gmail.com (Ashwin Jayaprakash) Date: Thu, 18 Dec 2014 14:41:02 -0800 Subject: Multi-second ParNew collections but stable CMS In-Reply-To: <613BB518-BB80-42A5-8E24-FC5683E33577@oracle.com> References: <5492272A.2040304@oracle.com> <613BB518-BB80-42A5-8E24-FC5683E33577@oracle.com> Message-ID: *@Charlie/@Holger* My apologies, THP is indeed enabled. I misread the "never" and thought it was already done. In fact "cat /sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed" showed 904 and after an hour now, it shows 6845 on one of our machines. *@Jon* I dug through some of our ElasticSearch/application logs again and tried to correlate them with the GC logs. The collection time does seem to match the GC log's "real" time. However the collected sizes don't seem to match, which is what threw me off. *Item 1:* 2014-12-18T21:34:55.837+0000: 163793.979: [ParNew Desired survivor size 56688640 bytes, new threshold 6 (max 6) - age 1: 31568024 bytes, 31568024 total - age 2: 1188576 bytes, 32756600 total - age 3: 1830920 bytes, 34587520 total - age 4: 282536 bytes, 34870056 total - age 5: 316640 bytes, 35186696 total - age 6: 249856 bytes, 35436552 total : 931773K->49827K(996800K), 1.3622770 secs] 22132844K->21256042K(31633280K)After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 1206815932 Max Chunk Size: 1206815932 Number of Blocks: 1 Av. Block Size: 1206815932 Tree Height: 1 After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 6189459 Max Chunk Size: 6188544 Number of Blocks: 3 Av. Block Size: 2063153 Tree Height: 2 , 1.3625110 secs] [Times: user=15.92 sys=1.36, real=1.36 secs] 2014-12-18T21:34:57.200+0000: 163795.342: Total time for which application threads were stopped: 1.3638970 seconds [2014-12-18T21:34:57,203Z] [WARN ] [elasticsearch[server00001][scheduler][T#1]] [org.elasticsearch.monitor.jvm] [server00001] [gc][young][163563][20423] duration [1.3s], collections [1]/[2.1s], total [1.3s]/[17.9m], memory [20.7gb]->[20.2gb]/[30.1gb], all_pools {[young] [543.2mb]->[1mb]/[865.3mb]}{[survivor] [44.6mb]->[48.6mb]/[108.1mb]}{[old] [20.2gb]->[20.2gb]/[29.2gb]} *Item 2:* 2014-12-18T20:53:35.011+0000: 161313.153: [ParNew Desired survivor size 56688640 bytes, new threshold 6 (max 6) - age 1: 32445776 bytes, 32445776 total - age 3: 6068000 bytes, 38513776 total - age 4: 1031528 bytes, 39545304 total - age 5: 255896 bytes, 39801200 total : 939702K->53536K(996800K), 2.9352940 secs] 21501296K->20625925K(31633280K)After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 1287922158 Max Chunk Size: 1287922158 Number of Blocks: 1 Av. Block Size: 1287922158 Tree Height: 1 After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 6205476 Max Chunk Size: 6204928 Number of Blocks: 2 Av. 
Block Size: 3102738 Tree Height: 2 , 2.9355580 secs] [Times: user=33.82 sys=3.11, real=2.94 secs] 2014-12-18T20:53:37.947+0000: 161316.089: Total time for which application threads were stopped: 2.9367640 seconds [2014-12-18T20:53:37,950Z] [WARN ] [elasticsearch[server00001][scheduler][T#1]] [org.elasticsearch.monitor.jvm] [server00001] [gc][young][161091][19838] duration [2.9s], collections [1]/[3s], total [2.9s]/[17.3m], memory [20.4gb]->[19.6gb]/[30.1gb], all_pools {[young] [801.7mb]->[2.4mb]/[865.3mb]}{[survivor] [52.3mb]->[52.2mb]/[108.1mb]}{[old] [19.6gb]->[19.6gb]/[29.2gb]} *Item 3:* 2014-12-17T14:42:10.590+0000: 52628.731: [GCBefore GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: -966776244 Max Chunk Size: -966776244 Number of Blocks: 1 Av. Block Size: -966776244 Tree Height: 1 Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 530 Max Chunk Size: 268 Number of Blocks: 2 Av. Block Size: 265 Tree Height: 2 2014-12-17T14:42:10.590+0000: 52628.731: [ParNew Desired survivor size 56688640 bytes, new threshold 1 (max 6) - age 1: 113315624 bytes, 113315624 total : 996800K->110720K(996800K), 7.3511710 secs] 5609422K->5065102K(31633280K)After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: -1009955715 Max Chunk Size: -1009955715 Number of Blocks: 1 Av. Block Size: -1009955715 Tree Height: 1 After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 530 Max Chunk Size: 268 Number of Blocks: 2 Av. Block Size: 265 Tree Height: 2 , 7.3514180 secs] [Times: user=36.50 sys=17.99, real=7.35 secs] 2014-12-17T14:42:17.941+0000: 52636.083: Total time for which application threads were stopped: 7.3525250 seconds [2014-12-17T14:42:17,944Z] [WARN ] [elasticsearch[prdaes04data03][scheduler][T#1]] [org.elasticsearch.monitor.jvm] [prdaes04data03] [gc][young][52582][5110] duration [7.3s], collections [1]/[7.6s], total [7.3s]/[4.2m], memory [5.1gb]->[4.8gb]/[30.1gb], all_pools {[young] [695.5mb]->[14.4mb]/[865.3mb]}{[survivor] [108.1mb]->[108.1mb]/[108.1mb]}{[old] [4.3gb]->[4.7gb]/[29.2gb]} On Thu, Dec 18, 2014 at 1:10 PM, charlie hunt wrote: > > The output: > > *cat /sys/kernel/mm/transparent_hugepage/enabled* > [always] madvise never > > > Tells me that transparent huge pages are enabled ?always?. > > I think I would change this to ?never?, even though sysctl -a may be > reporting no huge pages are currently in use. The system may trying to > coalesce pages occasionally in attempt to make huge pages available, even > though you are not currently using any. > > charlie > > > On Dec 18, 2014, at 2:00 PM, Ashwin Jayaprakash < > ashwin.jayaprakash at gmail.com> wrote: > > *@Jon*, thanks for clearing that up. Yes, that was my source of > confusion. I was misinterpreting the user time with the real time. > > *Jon's reply from an offline conversation:* > >> Are these the 7 second collections you refer to in the paragraph above? >> If yes, the "user" time is the sum of the time spent by multiple GC >> threads. >> The real time is the GC pause time that your application experiences. >> In the above case the GC pauses are .65s, 1.10s and .67s. >> > > Something that added to my confusion was the tools we are using in-house. > In addition to the GC logs we have 1 tool that uses the > GarbageCollectorMXBean's getCollectionTime() method. 
This does not seem to > match the values I see in the GC logs ( > http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/management/GarbageCollectorMXBean.html#getCollectionTime%28%29 > ). > > The other tool is the ElasticSearch JVM stats logger which uses GarbageCollectorMXBean's > LastGCInfo ( > https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmStats.java#L194 > and > https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmMonitorService.java#L187 > ). > > Do these methods expose the total time spent by all the parallel GC > threads for the ParNew pool or the "real" time? They do not seem to match > the GC log times. > > *@Gustav* We do not have any swapping on the machines. It could be the > disk IO experienced by the GC log writer itself, as you've suspected. The > machine has 128G of RAM > > *"top" sample from a similar machine:* > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > 106856 xxx 20 0 68.9g 25g 9.9g S 72.4 21.1 > 2408:05 java > > > *"free -g":* > total used free shared buffers cached > Mem: 120 119 0 0 0 95 > -/+ buffers/cache: 23 96 > Swap: 0 0 0 > > *@Charlie* Hugepages has already been disabled > > *sudo sysctl -a | grep hugepage* > vm.nr_hugepages = 0 > vm.nr_hugepages_mempolicy = 0 > vm.hugepages_treat_as_movable = 0 > vm.nr_overcommit_hugepages = 0 > > *cat /sys/kernel/mm/transparent_hugepage/enabled* > [always] madvise never > > > Thanks all! > > > > On Wed, Dec 17, 2014 at 5:00 PM, Jon Masamitsu > wrote: >> >> Ashwin, >> >> On 12/16/2014 8:47 PM, Ashwin Jayaprakash wrote: >> >> Hi, we have a cluster of ElasticSearch servers running with 31G heap >> and OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode). >> >> While our old gen seems to be very stable with about 40% usage and no >> Full GCs so far, our young gen keeps growing from ~50MB to 850MB every few >> seconds. These ParNew collections are taking anywhere between 1-7 seconds >> and is causing some of our requests to time out. The eden space keeps >> filling up and then cleared every 30-60 seconds. There is definitely work >> being done by our JVM in terms of caching/buffering objects for a few >> seconds, writing to disk and then clearing the objects (typical >> Lucene/ElasticSearch indexing and querying workload) >> >> >> From you recent mail >> >> Times: user=7.89 sys=0.55, real=0.65 secs] >> Times: user=7.71 sys=4.59, real=1.10 secs] >> Times: user=7.46 sys=0.32, real=0.67 secs] >> >> Are these the 7 second collections you refer to in the paragraph above? >> If yes, the "user" time is the sum of the time spent by multiple GC >> threads. >> The real time is the GC pause time that your application experiences. >> In the above case the GC pauses are .65s, 1.10s and .67s. >> >> Comment below regarding "eden space keeps filling up". >> >> >> These long pauses are not good for our server throughput and I was doing >> some reading. I got some conflicting reports on how Cassandra is configured >> compared to Hadoop. There are also references >> >> to this old ParNew+CMS bug >> which I thought >> would've been addressed in the JRE version we are using. Cassandra >> recommends >> a larger NewSize with just 1 for max tenuring, whereas Hadoop recommends >> a small NewSize. >> >> Since most of our allocations seem to be quite short lived, is there a >> way to avoid these "long" young gen pauses? >> >> Thanks in advance. Here are some details. 
>> >> * Heap settings:* >> java -Xmx31000m -Xms31000m >> -Xss512k -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=64m >> -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly >> -XX:CMSInitiatingOccupancyFraction=70 >> -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC -XX:+PrintPromotionFailure >> -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintTenuringDistribution >> -XX:GCLogFileSize=512m -XX:NumberOfGCLogFiles=2 -XX:+PrintGCDateStamps >> -XX:+UseGCLogFileRotation -XX:+DisableExplicitGC -XX:PrintFLSStatistics=1 >> -XX:+PrintGCDetails >> >> >> * Last few lines of "kill -3 pid" output:* >> Heap >> par new generation total 996800K, used 865818K [0x00007fa18e800000, >> 0x00007fa1d2190000, 0x00007fa1d2190000) >> eden space 886080K, 94% used [0x00007fa18e800000, 0x00007fa1c1a659e0, >> 0x00007fa1c4950000) >> from space 110720K, 25% used [0x00007fa1cb570000, 0x00007fa1cd091078, >> 0x00007fa1d2190000) >> to space 110720K, 0% used [0x00007fa1c4950000, 0x00007fa1c4950000, >> 0x00007fa1cb570000) >> concurrent mark-sweep generation total 30636480K, used 12036523K >> [0x00007fa1d2190000, 0x00007fa920000000, 0x00007fa920000000) >> concurrent-mark-sweep perm gen total 128856K, used 77779K >> [0x00007fa920000000, 0x00007fa927dd6000, 0x00007fa940000000) >> >> *Sample gc log:* >> 2014-12-11T23:32:16.121+0000: 710.618: [ParNew >> Desired survivor size 56688640 bytes, new threshold 6 (max 6) >> - age 1: 2956312 bytes, 2956312 total >> - age 2: 591800 bytes, 3548112 total >> - age 3: 66216 bytes, 3614328 total >> - age 4: 270752 bytes, 3885080 total >> - age 5: 615472 bytes, 4500552 total >> - age 6: 358440 bytes, 4858992 total >> : 900635K->8173K(996800K), 0.0317340 secs] >> 1352217K->463460K(31633280K)After GC: >> >> >> In this GC eden is at 900635k before the GC and is a 8173k after. That >> GC fills up is >> the expected behavior. Is that what you were asking about above? If not >> can you >> send me an example of the "fills up" behavior. >> >> Jon >> >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: -433641480 >> Max Chunk Size: -433641480 >> Number of Blocks: 1 >> Av. Block Size: -433641480 >> Tree Height: 1 >> After GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: 1227 >> Max Chunk Size: 631 >> Number of Blocks: 3 >> Av. Block Size: 409 >> Tree Height: 3 >> , 0.0318920 secs] [Times: user=0.38 sys=0.01, real=0.03 secs] >> >> >> Ashwin Jayaprakash. >> >> >> _______________________________________________ >> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlie.hunt at oracle.com Fri Dec 19 22:10:24 2014 From: charlie.hunt at oracle.com (charlie hunt) Date: Fri, 19 Dec 2014 16:10:24 -0600 Subject: Multi-second ParNew collections but stable CMS In-Reply-To: References: <5492272A.2040304@oracle.com> <613BB518-BB80-42A5-8E24-FC5683E33577@oracle.com> Message-ID: <28DAC751-9A6F-4C18-B7ED-4169C8CDC4F7@oracle.com> Disabling transparent huge pages should help those GC pauses where you are seeing high sys time reported, which should also shorten their pause times. Thanks for also sharing your observation of khugepaged/pages_collapsed. 
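If you want to keep an eye on it over time, a tiny stand-alone sketch that just prints the two files mentioned in this thread is below (Linux only; the class name is made up and the paths are exactly the ones quoted earlier):

import java.nio.file.Files;
import java.nio.file.Paths;

// Dumps the current transparent huge page mode and the khugepaged collapse
// counter so they can be logged next to the GC logs (Linux only).
public class ThpCheck {
    public static void main(String[] args) {
        dump("/sys/kernel/mm/transparent_hugepage/enabled");
        dump("/sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed");
    }

    static void dump(String file) {
        try {
            System.out.println(file + ": "
                    + new String(Files.readAllBytes(Paths.get(file))).trim());
        } catch (Exception e) {
            System.out.println(file + ": not readable (" + e + ")");
        }
    }
}

If pages_collapsed keeps climbing while the GC log keeps showing high sys times, khugepaged is most likely still interfering.
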
charlie > On Dec 18, 2014, at 4:41 PM, Ashwin Jayaprakash wrote: > > @Charlie/@Holger My apologies, THP is indeed enabled. I misread the "never" and thought it was already done. In fact "cat /sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed" showed 904 and after an hour now, it shows 6845 on one of our machines. > > @Jon I dug through some of our ElasticSearch/application logs again and tried to correlate them with the GC logs. The collection time does seem to match the GC log's "real" time. However the collected sizes don't seem to match, which is what threw me off. > > Item 1: > > 2014-12-18T21:34:55.837+0000: 163793.979: [ParNew > Desired survivor size 56688640 bytes, new threshold 6 (max 6) > - age 1: 31568024 bytes, 31568024 total > - age 2: 1188576 bytes, 32756600 total > - age 3: 1830920 bytes, 34587520 total > - age 4: 282536 bytes, 34870056 total > - age 5: 316640 bytes, 35186696 total > - age 6: 249856 bytes, 35436552 total > : 931773K->49827K(996800K), 1.3622770 secs] 22132844K->21256042K(31633280K)After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 1206815932 > Max Chunk Size: 1206815932 > Number of Blocks: 1 > Av. Block Size: 1206815932 > Tree Height: 1 > After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 6189459 > Max Chunk Size: 6188544 > Number of Blocks: 3 > Av. Block Size: 2063153 > Tree Height: 2 > , 1.3625110 secs] [Times: user=15.92 sys=1.36, real=1.36 secs] > 2014-12-18T21:34:57.200+0000: 163795.342: Total time for which application threads were stopped: 1.3638970 seconds > > > [2014-12-18T21:34:57,203Z] [WARN ] [elasticsearch[server00001][scheduler][T#1]] [org.elasticsearch.monitor.jvm] [server00001] [gc][young][163563][20423] duration [1.3s], collections [1]/[2.1s], total [1.3s]/[17.9m], memory [20.7gb]->[20.2gb]/[30.1gb], all_pools {[young] [543.2mb]->[1mb]/[865.3mb]}{[survivor] [44.6mb]->[48.6mb]/[108.1mb]}{[old] [20.2gb]->[20.2gb]/[29.2gb]} > > > Item 2: > > 2014-12-18T20:53:35.011+0000: 161313.153: [ParNew > Desired survivor size 56688640 bytes, new threshold 6 (max 6) > - age 1: 32445776 bytes, 32445776 total > - age 3: 6068000 bytes, 38513776 total > - age 4: 1031528 bytes, 39545304 total > - age 5: 255896 bytes, 39801200 total > : 939702K->53536K(996800K), 2.9352940 secs] 21501296K->20625925K(31633280K)After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 1287922158 > Max Chunk Size: 1287922158 > Number of Blocks: 1 > Av. Block Size: 1287922158 > Tree Height: 1 > After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 6205476 > Max Chunk Size: 6204928 > Number of Blocks: 2 > Av. 
Block Size: 3102738 > Tree Height: 2 > , 2.9355580 secs] [Times: user=33.82 sys=3.11, real=2.94 secs] > 2014-12-18T20:53:37.947+0000: 161316.089: Total time for which application threads were stopped: 2.9367640 seconds > > > [2014-12-18T20:53:37,950Z] [WARN ] [elasticsearch[server00001][scheduler][T#1]] [org.elasticsearch.monitor.jvm] [server00001] [gc][young][161091][19838] duration [2.9s], collections [1]/[3s], total [2.9s]/[17.3m], memory [20.4gb]->[19.6gb]/[30.1gb], all_pools {[young] [801.7mb]->[2.4mb]/[865.3mb]}{[survivor] [52.3mb]->[52.2mb]/[108.1mb]}{[old] [19.6gb]->[19.6gb]/[29.2gb]} > > > Item 3: > > 2014-12-17T14:42:10.590+0000: 52628.731: [GCBefore GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: -966776244 > Max Chunk Size: -966776244 > Number of Blocks: 1 > Av. Block Size: -966776244 > Tree Height: 1 > Before GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 530 > Max Chunk Size: 268 > Number of Blocks: 2 > Av. Block Size: 265 > Tree Height: 2 > 2014-12-17T14:42:10.590+0000: 52628.731: [ParNew > Desired survivor size 56688640 bytes, new threshold 1 (max 6) > - age 1: 113315624 bytes, 113315624 total > : 996800K->110720K(996800K), 7.3511710 secs] 5609422K->5065102K(31633280K)After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: -1009955715 > Max Chunk Size: -1009955715 > Number of Blocks: 1 > Av. Block Size: -1009955715 > Tree Height: 1 > After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 530 > Max Chunk Size: 268 > Number of Blocks: 2 > Av. Block Size: 265 > Tree Height: 2 > , 7.3514180 secs] [Times: user=36.50 sys=17.99, real=7.35 secs] > 2014-12-17T14:42:17.941+0000: 52636.083: Total time for which application threads were stopped: 7.3525250 seconds > > > [2014-12-17T14:42:17,944Z] [WARN ] [elasticsearch[prdaes04data03][scheduler][T#1]] [org.elasticsearch.monitor.jvm] [prdaes04data03] [gc][young][52582][5110] duration [7.3s], collections [1]/[7.6s], total [7.3s]/[4.2m], memory [5.1gb]->[4.8gb]/[30.1gb], all_pools {[young] [695.5mb]->[14.4mb]/[865.3mb]}{[survivor] [108.1mb]->[108.1mb]/[108.1mb]}{[old] [4.3gb]->[4.7gb]/[29.2gb]} > > > > > On Thu, Dec 18, 2014 at 1:10 PM, charlie hunt > wrote: > The output: >> cat /sys/kernel/mm/transparent_hugepage/enabled >> [always] madvise never > > Tells me that transparent huge pages are enabled ?always?. > > I think I would change this to ?never?, even though sysctl -a may be reporting no huge pages are currently in use. The system may trying to coalesce pages occasionally in attempt to make huge pages available, even though you are not currently using any. > > charlie > > >> On Dec 18, 2014, at 2:00 PM, Ashwin Jayaprakash > wrote: >> >> @Jon, thanks for clearing that up. Yes, that was my source of confusion. I was misinterpreting the user time with the real time. >> >> Jon's reply from an offline conversation: >> Are these the 7 second collections you refer to in the paragraph above? >> If yes, the "user" time is the sum of the time spent by multiple GC threads. >> The real time is the GC pause time that your application experiences. >> In the above case the GC pauses are .65s, 1.10s and .67s. >> >> Something that added to my confusion was the tools we are using in-house. In addition to the GC logs we have 1 tool that uses the GarbageCollectorMXBean's getCollectionTime() method. 
This does not seem to match the values I see in the GC logs (http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/management/GarbageCollectorMXBean.html#getCollectionTime%28%29 ). >> >> The other tool is the ElasticSearch JVM stats logger which uses GarbageCollectorMXBean's LastGCInfo (https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmStats.java#L194 and https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmMonitorService.java#L187 ). >> >> Do these methods expose the total time spent by all the parallel GC threads for the ParNew pool or the "real" time? They do not seem to match the GC log times. >> >> @Gustav We do not have any swapping on the machines. It could be the disk IO experienced by the GC log writer itself, as you've suspected. The machine has 128G of RAM >> >> "top" sample from a similar machine: >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND >> 106856 xxx 20 0 68.9g 25g 9.9g S 72.4 21.1 2408:05 java >> >> "free -g": >> total used free shared buffers cached >> Mem: 120 119 0 0 0 95 >> -/+ buffers/cache: 23 96 >> Swap: 0 0 0 >> >> @Charlie Hugepages has already been disabled >> >> sudo sysctl -a | grep hugepage >> vm.nr_hugepages = 0 >> vm.nr_hugepages_mempolicy = 0 >> vm.hugepages_treat_as_movable = 0 >> vm.nr_overcommit_hugepages = 0 >> >> cat /sys/kernel/mm/transparent_hugepage/enabled >> [always] madvise never >> >> >> Thanks all! >> >> >> >> On Wed, Dec 17, 2014 at 5:00 PM, Jon Masamitsu > wrote: >> Ashwin, >> >> On 12/16/2014 8:47 PM, Ashwin Jayaprakash wrote: >>> Hi, we have a cluster of ElasticSearch servers running with 31G heap and OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode). >>> >>> While our old gen seems to be very stable with about 40% usage and no Full GCs so far, our young gen keeps growing from ~50MB to 850MB every few seconds. These ParNew collections are taking anywhere between 1-7 seconds and is causing some of our requests to time out. The eden space keeps filling up and then cleared every 30-60 seconds. There is definitely work being done by our JVM in terms of caching/buffering objects for a few seconds, writing to disk and then clearing the objects (typical Lucene/ElasticSearch indexing and querying workload) >> >> From you recent mail >> >> Times: user=7.89 sys=0.55, real=0.65 secs] >> Times: user=7.71 sys=4.59, real=1.10 secs] >> Times: user=7.46 sys=0.32, real=0.67 secs] >> >> Are these the 7 second collections you refer to in the paragraph above? >> If yes, the "user" time is the sum of the time spent by multiple GC threads. >> The real time is the GC pause time that your application experiences. >> In the above case the GC pauses are .65s, 1.10s and .67s. >> >> Comment below regarding "eden space keeps filling up". >> >>> >>> These long pauses are not good for our server throughput and I was doing some reading. I got some conflicting reports on how Cassandra is configured compared to Hadoop. There are also references to this old ParNew+CMS bug which I thought would've been addressed in the JRE version we are using. Cassandra recommends a larger NewSize with just 1 for max tenuring, whereas Hadoop recommends a small NewSize. >>> >>> Since most of our allocations seem to be quite short lived, is there a way to avoid these "long" young gen pauses? >>> >>> Thanks in advance. Here are some details. 
>>> >>> Heap settings: >>> java -Xmx31000m -Xms31000m >>> -Xss512k -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=64m >>> -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 >>> -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC -XX:+PrintPromotionFailure >>> -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintTenuringDistribution >>> -XX:GCLogFileSize=512m -XX:NumberOfGCLogFiles=2 -XX:+PrintGCDateStamps >>> -XX:+UseGCLogFileRotation -XX:+DisableExplicitGC -XX:PrintFLSStatistics=1 -XX:+PrintGCDetails >>> >>> >>> Last few lines of "kill -3 pid" output: >>> Heap >>> par new generation total 996800K, used 865818K [0x00007fa18e800000, 0x00007fa1d2190000, 0x00007fa1d2190000) >>> eden space 886080K, 94% used [0x00007fa18e800000, 0x00007fa1c1a659e0, 0x00007fa1c4950000) >>> from space 110720K, 25% used [0x00007fa1cb570000, 0x00007fa1cd091078, 0x00007fa1d2190000) >>> to space 110720K, 0% used [0x00007fa1c4950000, 0x00007fa1c4950000, 0x00007fa1cb570000) >>> concurrent mark-sweep generation total 30636480K, used 12036523K [0x00007fa1d2190000, 0x00007fa920000000, 0x00007fa920000000) >>> concurrent-mark-sweep perm gen total 128856K, used 77779K [0x00007fa920000000, 0x00007fa927dd6000, 0x00007fa940000000) >>> >>> Sample gc log: >>> 2014-12-11T23:32:16.121+0000: 710.618: [ParNew >>> Desired survivor size 56688640 bytes, new threshold 6 (max 6) >>> - age 1: 2956312 bytes, 2956312 total >>> - age 2: 591800 bytes, 3548112 total >>> - age 3: 66216 bytes, 3614328 total >>> - age 4: 270752 bytes, 3885080 total >>> - age 5: 615472 bytes, 4500552 total >>> - age 6: 358440 bytes, 4858992 total >>> : 900635K->8173K(996800K), 0.0317340 secs] 1352217K->463460K(31633280K)After GC: >> >> In this GC eden is at 900635k before the GC and is a 8173k after. That GC fills up is >> the expected behavior. Is that what you were asking about above? If not can you >> send me an example of the "fills up" behavior. >> >> Jon >> >>> Statistics for BinaryTreeDictionary: >>> ------------------------------------ >>> Total Free Space: -433641480 >>> Max Chunk Size: -433641480 >>> Number of Blocks: 1 >>> Av. Block Size: -433641480 >>> Tree Height: 1 >>> After GC: >>> Statistics for BinaryTreeDictionary: >>> ------------------------------------ >>> Total Free Space: 1227 >>> Max Chunk Size: 631 >>> Number of Blocks: 3 >>> Av. Block Size: 409 >>> Tree Height: 3 >>> , 0.0318920 secs] [Times: user=0.38 sys=0.01, real=0.03 secs] >>> >>> >>> Ashwin Jayaprakash. >>> >>> >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From java at elyograg.org Sat Dec 20 01:28:55 2014 From: java at elyograg.org (Shawn Heisey) Date: Fri, 19 Dec 2014 18:28:55 -0700 Subject: G1 with Solr - thread from dev@lucene.apache.org In-Reply-To: <1418849513.3293.3.camel@oracle.com> References: <54906C34.8080408@elyograg.org> <1418831456.3255.22.camel@oracle.com> <5491D659.1090703@elyograg.org> <1418849513.3293.3.camel@oracle.com> Message-ID: <5494D0D7.2010606@elyograg.org> On 12/17/2014 1:51 PM, Thomas Schatzl wrote: >> In both cases, I used -Xms4096M and -Xmx6144M. These are the GC logging >> options: >> >> GCLOG_OPTS="-verbose:gc -Xloggc:logs/gc.log -XX:+PrintGCDateStamps >> -XX:+PrintGCDetails" >> >> Here's the GC logs that I already have: >> >> https://www.dropbox.com/s/4uy95g9zmc28xkn/gc-idxa1-cms-7u25.log?dl=0 >> https://www.dropbox.com/s/loyo6u0tqcba6sh/gc-idxa1-g1-7u72.log?dl=0 >> > > please also add -XX:+PrintReferenceGC, and definitely use -XX: > +ParallelRefProcEnabled. > > GC is spending a significant amount of the time in soft/weak reference > processing. -XX:+ParallelRefProcEnabled will help, but there will be > spikes still. I saw that GC sometimes spends 1000ms just processing > those references; using 8 threads this should get better. > > That alone will likely make it hard reaching a 100ms pause time goal > (1000ms/8 = 125ms...). > > CMS has the same problems, and while on average it has ~215ms pauses, > there seem to be a lot that are a lot longer too. Reference processing > also takes very long, even with -XX:+ParallelRefProcEnabled. > > I am not sure about the cause for the full gc's: either the pause time > prediction in G1 in that version is too bad and it tries to use a way > too large young gen, or there are a few very large objects around. > > Depending on the log output and the impact of the other options we might > want to cap the maximum young gen size. > >> I believe that Lucene does use a lot of references. > > I saw that. Must be millions. -XX:+PrintReferenceGC should show that > (also in CMS). I still did not get the list message, but I figured out why. The list subscription has an option "Avoid duplicate copies of messages" that I just had to turn off. I prefer to reply to messages from the list because I know for sure that all the right headers are included. I would not be surprised if there are millions of references. My whole index is over 98 million documents and half of those documents are present in shards on each server, taking up about 60GB of disk space per server. I already have ParallelRefProcEnabled and I have just added PrintReferenceGC. For reference, here are my options for CMS: JVM_OPTS=" \ -XX:NewRatio=3 \ -XX:SurvivorRatio=4 \ -XX:TargetSurvivorRatio=90 \ -XX:MaxTenuringThreshold=8 \ -XX:+UseConcMarkSweepGC \ -XX:+CMSScavengeBeforeRemark \ -XX:PretenureSizeThreshold=64m \ -XX:CMSFullGCsBeforeCompaction=1 \ -XX:+UseCMSInitiatingOccupancyOnly \ -XX:CMSInitiatingOccupancyFraction=70 \ -XX:CMSTriggerPermRatio=80 \ -XX:CMSMaxAbortablePrecleanTime=6000 \ -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -XX:+UseLargePages \ -XX:+AggressiveOpts \ " Which of these options will apply to G1, and are any of them worthwhile to include? I haven't got any tuning options at all for G1, and I'm looking for suggestions. This is my current G1 option list: JVM_OPTS=" \ -XX:+UseG1GC \ -XX:+UseLargePages \ -XX:+AggressiveOpts \ " Based on some recent list activity unrelated to this discussion, I also opted to disable transparent huge pages on the Solr servers. 
I haven't noticed any real difference in the server resource graphs (CPU, load, etc). I've started an internal discussion about Java 8 to see how receptive everyone will be to an upgrade. Thanks, Shawn From thomas.schatzl at oracle.com Sun Dec 21 14:01:17 2014 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Sun, 21 Dec 2014 15:01:17 +0100 Subject: G1 with Solr - thread from dev@lucene.apache.org In-Reply-To: <5494D0D7.2010606@elyograg.org> References: <54906C34.8080408@elyograg.org> <1418831456.3255.22.camel@oracle.com> <5491D659.1090703@elyograg.org> <1418849513.3293.3.camel@oracle.com> <5494D0D7.2010606@elyograg.org> Message-ID: <1419170477.6868.1.camel@oracle.com> Hi Shawn, On Fri, 2014-12-19 at 18:28 -0700, Shawn Heisey wrote: > On 12/17/2014 1:51 PM, Thomas Schatzl wrote: > >> In both cases, I used -Xms4096M and -Xmx6144M. These are the GC logging > >> options: > >> > >> I believe that Lucene does use a lot of references. > > > > I saw that. Must be millions. -XX:+PrintReferenceGC should show that > > (also in CMS). > > I would not be surprised if there are millions of references. My whole > index is over 98 million documents and half of those documents are > present in shards on each server, taking up about 60GB of disk space per > server. > > I already have ParallelRefProcEnabled and I have just added > PrintReferenceGC. > > For reference, here are my options for CMS: > > JVM_OPTS=" \ > -XX:NewRatio=3 \ > -XX:SurvivorRatio=4 \ > -XX:TargetSurvivorRatio=90 \ > -XX:MaxTenuringThreshold=8 \ >-XX:+UseConcMarkSweepGC \ > -XX:+CMSScavengeBeforeRemark \ > -XX:PretenureSizeThreshold=64m \ > -XX:CMSFullGCsBeforeCompaction=1 \ > -XX:+UseCMSInitiatingOccupancyOnly \ > -XX:CMSInitiatingOccupancyFraction=70 \ > -XX:CMSTriggerPermRatio=80 \ > -XX:CMSMaxAbortablePrecleanTime=6000 \ > -XX:+CMSParallelRemarkEnabled > -XX:+ParallelRefProcEnabled > -XX:+UseLargePages \ > -XX:+AggressiveOpts \ > " > > Which of these options will apply to G1, and are any of them worthwhile > to include? Only ParallelRefProcEnabled will be useful. Potentially UseLargePages too, but you mentioned you do not have any of them configured in the OS anyway. That's why the change in the Transparent Huge Pages settings did not have any impact either. The others are either CMS specific (from UseConcMarkSweepGC to CMSParallelRemarkEnabled) or would limit the ability of G1 to dynamically adapt the young generation (NewRatio to MaxTenuringThreshold). Afaik AggressiveOpts does not actually do a lot any more for some time but I do not think it hurts. > I haven't got any tuning options at all for G1, and I'm > looking for suggestions. This is my current G1 option list: > > JVM_OPTS=" \ > -XX:+UseG1GC \ > -XX:+UseLargePages \ > -XX:+AggressiveOpts \ Add -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintReferenceGC -XX: +PrintAdaptiveSizePolicy (the last one prints useful information about some decisions, not really needed in regular operation and could be removed later) add -XX:+ParallelRefProcEnabled to get reference processing time down. Use the same settings for heap size as in CMS. Add -XX:MaxGCPauseMillis= to get a time goal G1 will aim for. As mentioned above, it is likely that G1 will not keep up in many instances with 100ms due to the many java.lang.Ref instances. > Based on some recent list activity unrelated to this discussion, I also > opted to disable transparent huge pages on the Solr servers. I haven't > noticed any real difference in the server resource graphs (CPU, load, etc). 
> > I've started an internal discussion about Java 8 to see how receptive > everyone will be to an upgrade. Another potentially interesting thing I forgot about G1 is that in 8u40 G1 can free memory much more freely and dynamically than before. Not sure you need that, but in your recent sample settings you showed an Xms value that has been smaller than Xmx. Thanks, Thomas From ashwin.jayaprakash at gmail.com Mon Dec 22 17:49:50 2014 From: ashwin.jayaprakash at gmail.com (Ashwin Jayaprakash) Date: Mon, 22 Dec 2014 09:49:50 -0800 Subject: Multi-second ParNew collections but stable CMS In-Reply-To: <28DAC751-9A6F-4C18-B7ED-4169C8CDC4F7@oracle.com> References: <5492272A.2040304@oracle.com> <613BB518-BB80-42A5-8E24-FC5683E33577@oracle.com> <28DAC751-9A6F-4C18-B7ED-4169C8CDC4F7@oracle.com> Message-ID: All, I'm happy to report that disabling THP made a big difference. We do not see multi-second minor GCs in our cluster anymore. Thanks for your help. On Fri, Dec 19, 2014 at 2:10 PM, charlie hunt wrote: > Disabling transparent huge pages should help those GC pauses where you are > seeing high sys time reported, which should also shorten their pause times. > > Thanks for also sharing your observation of khugepaged/pages_collapsed. > > charlie > > On Dec 18, 2014, at 4:41 PM, Ashwin Jayaprakash < > ashwin.jayaprakash at gmail.com> wrote: > > *@Charlie/@Holger* My apologies, THP is indeed enabled. I misread the > "never" and thought it was already done. In fact "cat > /sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed" showed > 904 and after an hour now, it shows 6845 on one of our machines. > > *@Jon* I dug through some of our ElasticSearch/application logs again and > tried to correlate them with the GC logs. The collection time does seem to > match the GC log's "real" time. However the collected sizes don't seem to > match, which is what threw me off. > > *Item 1:* > > 2014-12-18T21:34:55.837+0000: 163793.979: [ParNew > Desired survivor size 56688640 bytes, new threshold 6 (max 6) > - age 1: 31568024 bytes, 31568024 total > - age 2: 1188576 bytes, 32756600 total > - age 3: 1830920 bytes, 34587520 total > - age 4: 282536 bytes, 34870056 total > - age 5: 316640 bytes, 35186696 total > - age 6: 249856 bytes, 35436552 total > : 931773K->49827K(996800K), 1.3622770 secs] > 22132844K->21256042K(31633280K)After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 1206815932 > Max Chunk Size: 1206815932 > Number of Blocks: 1 > Av. Block Size: 1206815932 > Tree Height: 1 > After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 6189459 > Max Chunk Size: 6188544 > Number of Blocks: 3 > Av. 
Block Size: 2063153 > Tree Height: 2 > , 1.3625110 secs] [Times: user=15.92 sys=1.36, real=1.36 secs] > 2014-12-18T21:34:57.200+0000: 163795.342: Total time for which application > threads were stopped: 1.3638970 seconds > > > [2014-12-18T21:34:57,203Z] [WARN ] > [elasticsearch[server00001][scheduler][T#1]] > [org.elasticsearch.monitor.jvm] [server00001] [gc][young][163563][20423] duration > [1.3s], collections [1]/[2.1s], total [1.3s]/[17.9m], memory > [20.7gb]->[20.2gb]/[30.1gb], all_pools {[young] > [543.2mb]->[1mb]/[865.3mb]}{[survivor] [44.6mb]->[48.6mb]/[108.1mb]}{[old] > [20.2gb]->[20.2gb]/[29.2gb]} > > > *Item 2:* > > 2014-12-18T20:53:35.011+0000: 161313.153: [ParNew > Desired survivor size 56688640 bytes, new threshold 6 (max 6) > - age 1: 32445776 bytes, 32445776 total > - age 3: 6068000 bytes, 38513776 total > - age 4: 1031528 bytes, 39545304 total > - age 5: 255896 bytes, 39801200 total > : 939702K->53536K(996800K), 2.9352940 secs] > 21501296K->20625925K(31633280K)After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 1287922158 > Max Chunk Size: 1287922158 > Number of Blocks: 1 > Av. Block Size: 1287922158 > Tree Height: 1 > After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 6205476 > Max Chunk Size: 6204928 > Number of Blocks: 2 > Av. Block Size: 3102738 > Tree Height: 2 > , 2.9355580 secs] [Times: user=33.82 sys=3.11, real=2.94 secs] > 2014-12-18T20:53:37.947+0000: 161316.089: Total time for which application > threads were stopped: 2.9367640 seconds > > > [2014-12-18T20:53:37,950Z] [WARN ] > [elasticsearch[server00001][scheduler][T#1]] > [org.elasticsearch.monitor.jvm] [server00001] [gc][young][161091][19838] duration > [2.9s], collections [1]/[3s], total [2.9s]/[17.3m], memory > [20.4gb]->[19.6gb]/[30.1gb], all_pools {[young] > [801.7mb]->[2.4mb]/[865.3mb]}{[survivor] > [52.3mb]->[52.2mb]/[108.1mb]}{[old] [19.6gb]->[19.6gb]/[29.2gb]} > > > *Item 3:* > > 2014-12-17T14:42:10.590+0000: 52628.731: [GCBefore GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: -966776244 > Max Chunk Size: -966776244 > Number of Blocks: 1 > Av. Block Size: -966776244 > Tree Height: 1 > Before GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 530 > Max Chunk Size: 268 > Number of Blocks: 2 > Av. Block Size: 265 > Tree Height: 2 > 2014-12-17T14:42:10.590+0000: 52628.731: [ParNew > Desired survivor size 56688640 bytes, new threshold 1 (max 6) > - age 1: 113315624 bytes, 113315624 total > : 996800K->110720K(996800K), 7.3511710 secs] > 5609422K->5065102K(31633280K)After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: -1009955715 > Max Chunk Size: -1009955715 > Number of Blocks: 1 > Av. Block Size: -1009955715 > Tree Height: 1 > After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 530 > Max Chunk Size: 268 > Number of Blocks: 2 > Av. 
Block Size: 265 > Tree Height: 2 > , 7.3514180 secs] [Times: user=36.50 sys=17.99, real=7.35 secs] > 2014-12-17T14:42:17.941+0000: 52636.083: Total time for which application > threads were stopped: 7.3525250 seconds > > > [2014-12-17T14:42:17,944Z] [WARN ] > [elasticsearch[prdaes04data03][scheduler][T#1]] > [org.elasticsearch.monitor.jvm] [prdaes04data03] [gc][young][52582][5110] duration > [7.3s], collections [1]/[7.6s], total [7.3s]/[4.2m], memory > [5.1gb]->[4.8gb]/[30.1gb], all_pools {[young] > [695.5mb]->[14.4mb]/[865.3mb]}{[survivor] > [108.1mb]->[108.1mb]/[108.1mb]}{[old] [4.3gb]->[4.7gb]/[29.2gb]} > > > > > On Thu, Dec 18, 2014 at 1:10 PM, charlie hunt > wrote: >> >> The output: >> >> *cat /sys/kernel/mm/transparent_hugepage/enabled* >> [always] madvise never >> >> >> Tells me that transparent huge pages are enabled ?always?. >> >> I think I would change this to ?never?, even though sysctl -a may be >> reporting no huge pages are currently in use. The system may trying to >> coalesce pages occasionally in attempt to make huge pages available, even >> though you are not currently using any. >> >> charlie >> >> >> On Dec 18, 2014, at 2:00 PM, Ashwin Jayaprakash < >> ashwin.jayaprakash at gmail.com> wrote: >> >> *@Jon*, thanks for clearing that up. Yes, that was my source of >> confusion. I was misinterpreting the user time with the real time. >> >> *Jon's reply from an offline conversation:* >> >>> Are these the 7 second collections you refer to in the paragraph above? >>> If yes, the "user" time is the sum of the time spent by multiple GC >>> threads. >>> The real time is the GC pause time that your application experiences. >>> In the above case the GC pauses are .65s, 1.10s and .67s. >>> >> >> Something that added to my confusion was the tools we are using in-house. >> In addition to the GC logs we have 1 tool that uses the >> GarbageCollectorMXBean's getCollectionTime() method. This does not seem to >> match the values I see in the GC logs ( >> http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/management/GarbageCollectorMXBean.html#getCollectionTime%28%29 >> ). >> >> The other tool is the ElasticSearch JVM stats logger which uses GarbageCollectorMXBean's >> LastGCInfo ( >> https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmStats.java#L194 >> and >> https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmMonitorService.java#L187 >> ). >> >> Do these methods expose the total time spent by all the parallel GC >> threads for the ParNew pool or the "real" time? They do not seem to match >> the GC log times. >> >> *@Gustav* We do not have any swapping on the machines. It could be the >> disk IO experienced by the GC log writer itself, as you've suspected. The >> machine has 128G of RAM >> >> *"top" sample from a similar machine:* >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND >> 106856 xxx 20 0 68.9g 25g 9.9g S 72.4 21.1 >> 2408:05 java >> >> >> *"free -g":* >> total used free shared buffers cached >> Mem: 120 119 0 0 0 95 >> -/+ buffers/cache: 23 96 >> Swap: 0 0 0 >> >> *@Charlie* Hugepages has already been disabled >> >> *sudo sysctl -a | grep hugepage* >> vm.nr_hugepages = 0 >> vm.nr_hugepages_mempolicy = 0 >> vm.hugepages_treat_as_movable = 0 >> vm.nr_overcommit_hugepages = 0 >> >> *cat /sys/kernel/mm/transparent_hugepage/enabled* >> [always] madvise never >> >> >> Thanks all! 
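On the MXBean question above: GarbageCollectorMXBean.getCollectionTime() is documented as the approximate accumulated *elapsed* collection time, and com.sun.management.GcInfo.getDuration() is the elapsed time of the last collection, so both should track the GC log's "real" column rather than the per-thread "user" sum. Below is a minimal monitoring sketch (not the ElasticSearch code) that polls both, plus the per-pool before/after usage that such stats loggers typically report. It assumes a HotSpot JVM, where the platform beans implement the com.sun.management extension; the class name is only for illustration.

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.Map;

import com.sun.management.GcInfo;

// Polls each collector and prints the accumulated elapsed GC time, the duration
// of the most recent collection, and the per-pool usage before/after that
// collection -- the raw data the numbers discussed above are derived from.
public class GcMonitorSketch {
    public static void main(String[] args) throws InterruptedException {
        while (true) {
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                com.sun.management.GarbageCollectorMXBean hotspotGc =
                        (com.sun.management.GarbageCollectorMXBean) gc;
                System.out.printf("%s: count=%d accumulatedElapsed=%dms%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
                GcInfo last = hotspotGc.getLastGcInfo();
                if (last != null) { // null until the first collection of this type
                    System.out.printf("  last GC duration=%dms%n", last.getDuration());
                    Map<String, MemoryUsage> before = last.getMemoryUsageBeforeGc();
                    for (Map.Entry<String, MemoryUsage> e : last.getMemoryUsageAfterGc().entrySet()) {
                        System.out.printf("  %s: %dK -> %dK%n", e.getKey(),
                                before.get(e.getKey()).getUsed() / 1024,
                                e.getValue().getUsed() / 1024);
                    }
                }
            }
            Thread.sleep(5000);
        }
    }
}

Comparing this output against the matching "[Times: ... real=... secs]" entries in the GC log is one way to confirm which figure a given tool is actually reporting.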
>> >> >> >> On Wed, Dec 17, 2014 at 5:00 PM, Jon Masamitsu >> wrote: >>> >>> Ashwin, >>> >>> On 12/16/2014 8:47 PM, Ashwin Jayaprakash wrote: >>> >>> Hi, we have a cluster of ElasticSearch servers running with 31G heap >>> and OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode). >>> >>> While our old gen seems to be very stable with about 40% usage and no >>> Full GCs so far, our young gen keeps growing from ~50MB to 850MB every few >>> seconds. These ParNew collections are taking anywhere between 1-7 seconds >>> and is causing some of our requests to time out. The eden space keeps >>> filling up and then cleared every 30-60 seconds. There is definitely work >>> being done by our JVM in terms of caching/buffering objects for a few >>> seconds, writing to disk and then clearing the objects (typical >>> Lucene/ElasticSearch indexing and querying workload) >>> >>> >>> From you recent mail >>> >>> Times: user=7.89 sys=0.55, real=0.65 secs] >>> Times: user=7.71 sys=4.59, real=1.10 secs] >>> Times: user=7.46 sys=0.32, real=0.67 secs] >>> >>> Are these the 7 second collections you refer to in the paragraph above? >>> If yes, the "user" time is the sum of the time spent by multiple GC >>> threads. >>> The real time is the GC pause time that your application experiences. >>> In the above case the GC pauses are .65s, 1.10s and .67s. >>> >>> Comment below regarding "eden space keeps filling up". >>> >>> >>> These long pauses are not good for our server throughput and I was >>> doing some reading. I got some conflicting reports on how Cassandra is >>> configured compared to Hadoop. There are also references >>> >>> to this old ParNew+CMS bug >>> which I thought >>> would've been addressed in the JRE version we are using. Cassandra >>> recommends >>> a larger NewSize with just 1 for max tenuring, whereas Hadoop recommends >>> a small NewSize. >>> >>> Since most of our allocations seem to be quite short lived, is there a >>> way to avoid these "long" young gen pauses? >>> >>> Thanks in advance. Here are some details. 
>>> >>> * Heap settings:* >>> java -Xmx31000m -Xms31000m >>> -Xss512k -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=64m >>> -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly >>> -XX:CMSInitiatingOccupancyFraction=70 >>> -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC -XX:+PrintPromotionFailure >>> -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintTenuringDistribution >>> -XX:GCLogFileSize=512m -XX:NumberOfGCLogFiles=2 -XX:+PrintGCDateStamps >>> -XX:+UseGCLogFileRotation -XX:+DisableExplicitGC >>> -XX:PrintFLSStatistics=1 -XX:+PrintGCDetails >>> >>> >>> * Last few lines of "kill -3 pid" output:* >>> Heap >>> par new generation total 996800K, used 865818K [0x00007fa18e800000, >>> 0x00007fa1d2190000, 0x00007fa1d2190000) >>> eden space 886080K, 94% used [0x00007fa18e800000, 0x00007fa1c1a659e0, >>> 0x00007fa1c4950000) >>> from space 110720K, 25% used [0x00007fa1cb570000, 0x00007fa1cd091078, >>> 0x00007fa1d2190000) >>> to space 110720K, 0% used [0x00007fa1c4950000, 0x00007fa1c4950000, >>> 0x00007fa1cb570000) >>> concurrent mark-sweep generation total 30636480K, used 12036523K >>> [0x00007fa1d2190000, 0x00007fa920000000, 0x00007fa920000000) >>> concurrent-mark-sweep perm gen total 128856K, used 77779K >>> [0x00007fa920000000, 0x00007fa927dd6000, 0x00007fa940000000) >>> >>> *Sample gc log:* >>> 2014-12-11T23:32:16.121+0000: 710.618: [ParNew >>> Desired survivor size 56688640 bytes, new threshold 6 (max 6) >>> - age 1: 2956312 bytes, 2956312 total >>> - age 2: 591800 bytes, 3548112 total >>> - age 3: 66216 bytes, 3614328 total >>> - age 4: 270752 bytes, 3885080 total >>> - age 5: 615472 bytes, 4500552 total >>> - age 6: 358440 bytes, 4858992 total >>> : 900635K->8173K(996800K), 0.0317340 secs] >>> 1352217K->463460K(31633280K)After GC: >>> >>> >>> In this GC eden is at 900635k before the GC and is a 8173k after. That >>> GC fills up is >>> the expected behavior. Is that what you were asking about above? If >>> not can you >>> send me an example of the "fills up" behavior. >>> >>> Jon >>> >>> Statistics for BinaryTreeDictionary: >>> ------------------------------------ >>> Total Free Space: -433641480 >>> Max Chunk Size: -433641480 >>> Number of Blocks: 1 >>> Av. Block Size: -433641480 >>> Tree Height: 1 >>> After GC: >>> Statistics for BinaryTreeDictionary: >>> ------------------------------------ >>> Total Free Space: 1227 >>> Max Chunk Size: 631 >>> Number of Blocks: 3 >>> Av. Block Size: 409 >>> Tree Height: 3 >>> , 0.0318920 secs] [Times: user=0.38 sys=0.01, real=0.03 secs] >>> >>> >>> Ashwin Jayaprakash. >>> >>> >>> _______________________________________________ >>> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >>> >>> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlie.hunt at oracle.com Mon Dec 22 17:52:02 2014 From: charlie.hunt at oracle.com (charlie hunt) Date: Mon, 22 Dec 2014 11:52:02 -0600 Subject: Multi-second ParNew collections but stable CMS In-Reply-To: References: <5492272A.2040304@oracle.com> <613BB518-BB80-42A5-8E24-FC5683E33577@oracle.com> <28DAC751-9A6F-4C18-B7ED-4169C8CDC4F7@oracle.com> Message-ID: <9A1D67CA-4B26-401B-91D3-EEC64D3E86CA@oracle.com> Thanks for reporting back, and great to hear disabling THP has solved your multi-second minor GCs issue. :-) charlie > On Dec 22, 2014, at 11:49 AM, Ashwin Jayaprakash wrote: > > All, I'm happy to report that disabling THP made a big difference. We do not see multi-second minor GCs in our cluster anymore. > > Thanks for your help. > > On Fri, Dec 19, 2014 at 2:10 PM, charlie hunt > wrote: > Disabling transparent huge pages should help those GC pauses where you are seeing high sys time reported, which should also shorten their pause times. > > Thanks for also sharing your observation of khugepaged/pages_collapsed. > > charlie > >> On Dec 18, 2014, at 4:41 PM, Ashwin Jayaprakash > wrote: >> >> @Charlie/@Holger My apologies, THP is indeed enabled. I misread the "never" and thought it was already done. In fact "cat /sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed" showed 904 and after an hour now, it shows 6845 on one of our machines. >> >> @Jon I dug through some of our ElasticSearch/application logs again and tried to correlate them with the GC logs. The collection time does seem to match the GC log's "real" time. However the collected sizes don't seem to match, which is what threw me off. >> >> Item 1: >> >> 2014-12-18T21:34:55.837+0000: 163793.979: [ParNew >> Desired survivor size 56688640 bytes, new threshold 6 (max 6) >> - age 1: 31568024 bytes, 31568024 total >> - age 2: 1188576 bytes, 32756600 total >> - age 3: 1830920 bytes, 34587520 total >> - age 4: 282536 bytes, 34870056 total >> - age 5: 316640 bytes, 35186696 total >> - age 6: 249856 bytes, 35436552 total >> : 931773K->49827K(996800K), 1.3622770 secs] 22132844K->21256042K(31633280K)After GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: 1206815932 >> Max Chunk Size: 1206815932 >> Number of Blocks: 1 >> Av. Block Size: 1206815932 >> Tree Height: 1 >> After GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: 6189459 >> Max Chunk Size: 6188544 >> Number of Blocks: 3 >> Av. 
Block Size: 2063153 >> Tree Height: 2 >> , 1.3625110 secs] [Times: user=15.92 sys=1.36, real=1.36 secs] >> 2014-12-18T21:34:57.200+0000: 163795.342: Total time for which application threads were stopped: 1.3638970 seconds >> >> >> [2014-12-18T21:34:57,203Z] [WARN ] [elasticsearch[server00001][scheduler][T#1]] [org.elasticsearch.monitor.jvm] [server00001] [gc][young][163563][20423] duration [1.3s], collections [1]/[2.1s], total [1.3s]/[17.9m], memory [20.7gb]->[20.2gb]/[30.1gb], all_pools {[young] [543.2mb]->[1mb]/[865.3mb]}{[survivor] [44.6mb]->[48.6mb]/[108.1mb]}{[old] [20.2gb]->[20.2gb]/[29.2gb]} >> >> >> Item 2: >> >> 2014-12-18T20:53:35.011+0000: 161313.153: [ParNew >> Desired survivor size 56688640 bytes, new threshold 6 (max 6) >> - age 1: 32445776 bytes, 32445776 total >> - age 3: 6068000 bytes, 38513776 total >> - age 4: 1031528 bytes, 39545304 total >> - age 5: 255896 bytes, 39801200 total >> : 939702K->53536K(996800K), 2.9352940 secs] 21501296K->20625925K(31633280K)After GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: 1287922158 >> Max Chunk Size: 1287922158 >> Number of Blocks: 1 >> Av. Block Size: 1287922158 >> Tree Height: 1 >> After GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: 6205476 >> Max Chunk Size: 6204928 >> Number of Blocks: 2 >> Av. Block Size: 3102738 >> Tree Height: 2 >> , 2.9355580 secs] [Times: user=33.82 sys=3.11, real=2.94 secs] >> 2014-12-18T20:53:37.947+0000: 161316.089: Total time for which application threads were stopped: 2.9367640 seconds >> >> >> [2014-12-18T20:53:37,950Z] [WARN ] [elasticsearch[server00001][scheduler][T#1]] [org.elasticsearch.monitor.jvm] [server00001] [gc][young][161091][19838] duration [2.9s], collections [1]/[3s], total [2.9s]/[17.3m], memory [20.4gb]->[19.6gb]/[30.1gb], all_pools {[young] [801.7mb]->[2.4mb]/[865.3mb]}{[survivor] [52.3mb]->[52.2mb]/[108.1mb]}{[old] [19.6gb]->[19.6gb]/[29.2gb]} >> >> >> Item 3: >> >> 2014-12-17T14:42:10.590+0000: 52628.731: [GCBefore GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: -966776244 >> Max Chunk Size: -966776244 >> Number of Blocks: 1 >> Av. Block Size: -966776244 >> Tree Height: 1 >> Before GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: 530 >> Max Chunk Size: 268 >> Number of Blocks: 2 >> Av. Block Size: 265 >> Tree Height: 2 >> 2014-12-17T14:42:10.590+0000: 52628.731: [ParNew >> Desired survivor size 56688640 bytes, new threshold 1 (max 6) >> - age 1: 113315624 bytes, 113315624 total >> : 996800K->110720K(996800K), 7.3511710 secs] 5609422K->5065102K(31633280K)After GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: -1009955715 >> Max Chunk Size: -1009955715 >> Number of Blocks: 1 >> Av. Block Size: -1009955715 >> Tree Height: 1 >> After GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: 530 >> Max Chunk Size: 268 >> Number of Blocks: 2 >> Av. 
Block Size: 265 >> Tree Height: 2 >> , 7.3514180 secs] [Times: user=36.50 sys=17.99, real=7.35 secs] >> 2014-12-17T14:42:17.941+0000: 52636.083: Total time for which application threads were stopped: 7.3525250 seconds >> >> >> [2014-12-17T14:42:17,944Z] [WARN ] [elasticsearch[prdaes04data03][scheduler][T#1]] [org.elasticsearch.monitor.jvm] [prdaes04data03] [gc][young][52582][5110] duration [7.3s], collections [1]/[7.6s], total [7.3s]/[4.2m], memory [5.1gb]->[4.8gb]/[30.1gb], all_pools {[young] [695.5mb]->[14.4mb]/[865.3mb]}{[survivor] [108.1mb]->[108.1mb]/[108.1mb]}{[old] [4.3gb]->[4.7gb]/[29.2gb]} >> >> >> >> >> On Thu, Dec 18, 2014 at 1:10 PM, charlie hunt > wrote: >> The output: >>> cat /sys/kernel/mm/transparent_hugepage/enabled >>> [always] madvise never >> >> Tells me that transparent huge pages are enabled ?always?. >> >> I think I would change this to ?never?, even though sysctl -a may be reporting no huge pages are currently in use. The system may trying to coalesce pages occasionally in attempt to make huge pages available, even though you are not currently using any. >> >> charlie >> >> >>> On Dec 18, 2014, at 2:00 PM, Ashwin Jayaprakash > wrote: >>> >>> @Jon, thanks for clearing that up. Yes, that was my source of confusion. I was misinterpreting the user time with the real time. >>> >>> Jon's reply from an offline conversation: >>> Are these the 7 second collections you refer to in the paragraph above? >>> If yes, the "user" time is the sum of the time spent by multiple GC threads. >>> The real time is the GC pause time that your application experiences. >>> In the above case the GC pauses are .65s, 1.10s and .67s. >>> >>> Something that added to my confusion was the tools we are using in-house. In addition to the GC logs we have 1 tool that uses the GarbageCollectorMXBean's getCollectionTime() method. This does not seem to match the values I see in the GC logs (http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/management/GarbageCollectorMXBean.html#getCollectionTime%28%29 ). >>> >>> The other tool is the ElasticSearch JVM stats logger which uses GarbageCollectorMXBean's LastGCInfo (https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmStats.java#L194 and https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmMonitorService.java#L187 ). >>> >>> Do these methods expose the total time spent by all the parallel GC threads for the ParNew pool or the "real" time? They do not seem to match the GC log times. >>> >>> @Gustav We do not have any swapping on the machines. It could be the disk IO experienced by the GC log writer itself, as you've suspected. The machine has 128G of RAM >>> >>> "top" sample from a similar machine: >>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND >>> 106856 xxx 20 0 68.9g 25g 9.9g S 72.4 21.1 2408:05 java >>> >>> "free -g": >>> total used free shared buffers cached >>> Mem: 120 119 0 0 0 95 >>> -/+ buffers/cache: 23 96 >>> Swap: 0 0 0 >>> >>> @Charlie Hugepages has already been disabled >>> >>> sudo sysctl -a | grep hugepage >>> vm.nr_hugepages = 0 >>> vm.nr_hugepages_mempolicy = 0 >>> vm.hugepages_treat_as_movable = 0 >>> vm.nr_overcommit_hugepages = 0 >>> >>> cat /sys/kernel/mm/transparent_hugepage/enabled >>> [always] madvise never >>> >>> >>> Thanks all! 
>>> >>> >>> >>> On Wed, Dec 17, 2014 at 5:00 PM, Jon Masamitsu > wrote: >>> Ashwin, >>> >>> On 12/16/2014 8:47 PM, Ashwin Jayaprakash wrote: >>>> Hi, we have a cluster of ElasticSearch servers running with 31G heap and OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode). >>>> >>>> While our old gen seems to be very stable with about 40% usage and no Full GCs so far, our young gen keeps growing from ~50MB to 850MB every few seconds. These ParNew collections are taking anywhere between 1-7 seconds and is causing some of our requests to time out. The eden space keeps filling up and then cleared every 30-60 seconds. There is definitely work being done by our JVM in terms of caching/buffering objects for a few seconds, writing to disk and then clearing the objects (typical Lucene/ElasticSearch indexing and querying workload) >>> >>> From you recent mail >>> >>> Times: user=7.89 sys=0.55, real=0.65 secs] >>> Times: user=7.71 sys=4.59, real=1.10 secs] >>> Times: user=7.46 sys=0.32, real=0.67 secs] >>> >>> Are these the 7 second collections you refer to in the paragraph above? >>> If yes, the "user" time is the sum of the time spent by multiple GC threads. >>> The real time is the GC pause time that your application experiences. >>> In the above case the GC pauses are .65s, 1.10s and .67s. >>> >>> Comment below regarding "eden space keeps filling up". >>> >>>> >>>> These long pauses are not good for our server throughput and I was doing some reading. I got some conflicting reports on how Cassandra is configured compared to Hadoop. There are also references to this old ParNew+CMS bug which I thought would've been addressed in the JRE version we are using. Cassandra recommends a larger NewSize with just 1 for max tenuring, whereas Hadoop recommends a small NewSize. >>>> >>>> Since most of our allocations seem to be quite short lived, is there a way to avoid these "long" young gen pauses? >>>> >>>> Thanks in advance. Here are some details. 
>>>> >>>> Heap settings: >>>> java -Xmx31000m -Xms31000m >>>> -Xss512k -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=64m >>>> -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 >>>> -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC -XX:+PrintPromotionFailure >>>> -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintTenuringDistribution >>>> -XX:GCLogFileSize=512m -XX:NumberOfGCLogFiles=2 -XX:+PrintGCDateStamps >>>> -XX:+UseGCLogFileRotation -XX:+DisableExplicitGC -XX:PrintFLSStatistics=1 -XX:+PrintGCDetails >>>> >>>> >>>> Last few lines of "kill -3 pid" output: >>>> Heap >>>> par new generation total 996800K, used 865818K [0x00007fa18e800000, 0x00007fa1d2190000, 0x00007fa1d2190000) >>>> eden space 886080K, 94% used [0x00007fa18e800000, 0x00007fa1c1a659e0, 0x00007fa1c4950000) >>>> from space 110720K, 25% used [0x00007fa1cb570000, 0x00007fa1cd091078, 0x00007fa1d2190000) >>>> to space 110720K, 0% used [0x00007fa1c4950000, 0x00007fa1c4950000, 0x00007fa1cb570000) >>>> concurrent mark-sweep generation total 30636480K, used 12036523K [0x00007fa1d2190000, 0x00007fa920000000, 0x00007fa920000000) >>>> concurrent-mark-sweep perm gen total 128856K, used 77779K [0x00007fa920000000, 0x00007fa927dd6000, 0x00007fa940000000) >>>> >>>> Sample gc log: >>>> 2014-12-11T23:32:16.121+0000: 710.618: [ParNew >>>> Desired survivor size 56688640 bytes, new threshold 6 (max 6) >>>> - age 1: 2956312 bytes, 2956312 total >>>> - age 2: 591800 bytes, 3548112 total >>>> - age 3: 66216 bytes, 3614328 total >>>> - age 4: 270752 bytes, 3885080 total >>>> - age 5: 615472 bytes, 4500552 total >>>> - age 6: 358440 bytes, 4858992 total >>>> : 900635K->8173K(996800K), 0.0317340 secs] 1352217K->463460K(31633280K)After GC: >>> >>> In this GC eden is at 900635k before the GC and is a 8173k after. That GC fills up is >>> the expected behavior. Is that what you were asking about above? If not can you >>> send me an example of the "fills up" behavior. >>> >>> Jon >>> >>>> Statistics for BinaryTreeDictionary: >>>> ------------------------------------ >>>> Total Free Space: -433641480 >>>> Max Chunk Size: -433641480 >>>> Number of Blocks: 1 >>>> Av. Block Size: -433641480 >>>> Tree Height: 1 >>>> After GC: >>>> Statistics for BinaryTreeDictionary: >>>> ------------------------------------ >>>> Total Free Space: 1227 >>>> Max Chunk Size: 631 >>>> Number of Blocks: 3 >>>> Av. Block Size: 409 >>>> Tree Height: 3 >>>> , 0.0318920 secs] [Times: user=0.38 sys=0.01, real=0.03 secs] >>>> >>>> >>>> Ashwin Jayaprakash. >>>> >>>> >>>> _______________________________________________ >>>> hotspot-gc-use mailing list >>>> hotspot-gc-use at openjdk.java.net >>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gustav.r.akesson at gmail.com Mon Dec 22 20:45:20 2014 From: gustav.r.akesson at gmail.com (=?UTF-8?Q?Gustav_=C3=85kesson?=) Date: Mon, 22 Dec 2014 21:45:20 +0100 Subject: Multi-second ParNew collections but stable CMS In-Reply-To: <9A1D67CA-4B26-401B-91D3-EEC64D3E86CA@oracle.com> References: <5492272A.2040304@oracle.com> <613BB518-BB80-42A5-8E24-FC5683E33577@oracle.com> <28DAC751-9A6F-4C18-B7ED-4169C8CDC4F7@oracle.com> <9A1D67CA-4B26-401B-91D3-EEC64D3E86CA@oracle.com> Message-ID: Hi, Indeed thank you for reporting back on this. TIL about THP, so to say... Best Regards, Gustav?kesson Den 22 dec 2014 18:52 skrev "charlie hunt" : > Thanks for reporting back, and great to hear disabling THP has solved your > multi-second minor GCs issue. :-) > > charlie > > On Dec 22, 2014, at 11:49 AM, Ashwin Jayaprakash < > ashwin.jayaprakash at gmail.com> wrote: > > All, I'm happy to report that disabling THP made a big difference. We do > not see multi-second minor GCs in our cluster anymore. > > Thanks for your help. > > On Fri, Dec 19, 2014 at 2:10 PM, charlie hunt > wrote: > >> Disabling transparent huge pages should help those GC pauses where you >> are seeing high sys time reported, which should also shorten their pause >> times. >> >> Thanks for also sharing your observation of khugepaged/pages_collapsed. >> >> charlie >> >> On Dec 18, 2014, at 4:41 PM, Ashwin Jayaprakash < >> ashwin.jayaprakash at gmail.com> wrote: >> >> *@Charlie/@Holger* My apologies, THP is indeed enabled. I misread the >> "never" and thought it was already done. In fact "cat >> /sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed" showed >> 904 and after an hour now, it shows 6845 on one of our machines. >> >> *@Jon* I dug through some of our ElasticSearch/application logs again >> and tried to correlate them with the GC logs. The collection time does seem >> to match the GC log's "real" time. However the collected sizes don't seem >> to match, which is what threw me off. >> >> *Item 1:* >> >> 2014-12-18T21:34:55.837+0000: 163793.979: [ParNew >> Desired survivor size 56688640 bytes, new threshold 6 (max 6) >> - age 1: 31568024 bytes, 31568024 total >> - age 2: 1188576 bytes, 32756600 total >> - age 3: 1830920 bytes, 34587520 total >> - age 4: 282536 bytes, 34870056 total >> - age 5: 316640 bytes, 35186696 total >> - age 6: 249856 bytes, 35436552 total >> : 931773K->49827K(996800K), 1.3622770 secs] >> 22132844K->21256042K(31633280K)After GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: 1206815932 >> Max Chunk Size: 1206815932 >> Number of Blocks: 1 >> Av. Block Size: 1206815932 >> Tree Height: 1 >> After GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: 6189459 >> Max Chunk Size: 6188544 >> Number of Blocks: 3 >> Av. 
Block Size: 2063153 >> Tree Height: 2 >> , 1.3625110 secs] [Times: user=15.92 sys=1.36, real=1.36 secs] >> 2014-12-18T21:34:57.200+0000: 163795.342: Total time for which >> application threads were stopped: 1.3638970 seconds >> >> >> [2014-12-18T21:34:57,203Z] [WARN ] >> [elasticsearch[server00001][scheduler][T#1]] >> [org.elasticsearch.monitor.jvm] [server00001] [gc][young][163563][20423] duration >> [1.3s], collections [1]/[2.1s], total [1.3s]/[17.9m], memory >> [20.7gb]->[20.2gb]/[30.1gb], all_pools {[young] >> [543.2mb]->[1mb]/[865.3mb]}{[survivor] [44.6mb]->[48.6mb]/[108.1mb]}{[old] >> [20.2gb]->[20.2gb]/[29.2gb]} >> >> >> *Item 2:* >> >> 2014-12-18T20:53:35.011+0000: 161313.153: [ParNew >> Desired survivor size 56688640 bytes, new threshold 6 (max 6) >> - age 1: 32445776 bytes, 32445776 total >> - age 3: 6068000 bytes, 38513776 total >> - age 4: 1031528 bytes, 39545304 total >> - age 5: 255896 bytes, 39801200 total >> : 939702K->53536K(996800K), 2.9352940 secs] >> 21501296K->20625925K(31633280K)After GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: 1287922158 >> Max Chunk Size: 1287922158 >> Number of Blocks: 1 >> Av. Block Size: 1287922158 >> Tree Height: 1 >> After GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: 6205476 >> Max Chunk Size: 6204928 >> Number of Blocks: 2 >> Av. Block Size: 3102738 >> Tree Height: 2 >> , 2.9355580 secs] [Times: user=33.82 sys=3.11, real=2.94 secs] >> 2014-12-18T20:53:37.947+0000: 161316.089: Total time for which >> application threads were stopped: 2.9367640 seconds >> >> >> [2014-12-18T20:53:37,950Z] [WARN ] >> [elasticsearch[server00001][scheduler][T#1]] >> [org.elasticsearch.monitor.jvm] [server00001] [gc][young][161091][19838] duration >> [2.9s], collections [1]/[3s], total [2.9s]/[17.3m], memory >> [20.4gb]->[19.6gb]/[30.1gb], all_pools {[young] >> [801.7mb]->[2.4mb]/[865.3mb]}{[survivor] >> [52.3mb]->[52.2mb]/[108.1mb]}{[old] [19.6gb]->[19.6gb]/[29.2gb]} >> >> >> *Item 3:* >> >> 2014-12-17T14:42:10.590+0000: 52628.731: [GCBefore GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: -966776244 >> Max Chunk Size: -966776244 >> Number of Blocks: 1 >> Av. Block Size: -966776244 >> Tree Height: 1 >> Before GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: 530 >> Max Chunk Size: 268 >> Number of Blocks: 2 >> Av. Block Size: 265 >> Tree Height: 2 >> 2014-12-17T14:42:10.590+0000: 52628.731: [ParNew >> Desired survivor size 56688640 bytes, new threshold 1 (max 6) >> - age 1: 113315624 bytes, 113315624 total >> : 996800K->110720K(996800K), 7.3511710 secs] >> 5609422K->5065102K(31633280K)After GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: -1009955715 >> Max Chunk Size: -1009955715 >> Number of Blocks: 1 >> Av. Block Size: -1009955715 >> Tree Height: 1 >> After GC: >> Statistics for BinaryTreeDictionary: >> ------------------------------------ >> Total Free Space: 530 >> Max Chunk Size: 268 >> Number of Blocks: 2 >> Av. 
Block Size: 265 >> Tree Height: 2 >> , 7.3514180 secs] [Times: user=36.50 sys=17.99, real=7.35 secs] >> 2014-12-17T14:42:17.941+0000: 52636.083: Total time for which application >> threads were stopped: 7.3525250 seconds >> >> >> [2014-12-17T14:42:17,944Z] [WARN ] >> [elasticsearch[prdaes04data03][scheduler][T#1]] >> [org.elasticsearch.monitor.jvm] [prdaes04data03] [gc][young][52582][5110] duration >> [7.3s], collections [1]/[7.6s], total [7.3s]/[4.2m], memory >> [5.1gb]->[4.8gb]/[30.1gb], all_pools {[young] >> [695.5mb]->[14.4mb]/[865.3mb]}{[survivor] >> [108.1mb]->[108.1mb]/[108.1mb]}{[old] [4.3gb]->[4.7gb]/[29.2gb]} >> >> >> >> >> On Thu, Dec 18, 2014 at 1:10 PM, charlie hunt >> wrote: >>> >>> The output: >>> >>> *cat /sys/kernel/mm/transparent_hugepage/enabled* >>> [always] madvise never >>> >>> >>> Tells me that transparent huge pages are enabled ?always?. >>> >>> I think I would change this to ?never?, even though sysctl -a may be >>> reporting no huge pages are currently in use. The system may trying to >>> coalesce pages occasionally in attempt to make huge pages available, even >>> though you are not currently using any. >>> >>> charlie >>> >>> >>> On Dec 18, 2014, at 2:00 PM, Ashwin Jayaprakash < >>> ashwin.jayaprakash at gmail.com> wrote: >>> >>> *@Jon*, thanks for clearing that up. Yes, that was my source of >>> confusion. I was misinterpreting the user time with the real time. >>> >>> *Jon's reply from an offline conversation:* >>> >>>> Are these the 7 second collections you refer to in the paragraph above? >>>> If yes, the "user" time is the sum of the time spent by multiple GC >>>> threads. >>>> The real time is the GC pause time that your application experiences. >>>> In the above case the GC pauses are .65s, 1.10s and .67s. >>>> >>> >>> Something that added to my confusion was the tools we are using >>> in-house. In addition to the GC logs we have 1 tool that uses the >>> GarbageCollectorMXBean's getCollectionTime() method. This does not seem to >>> match the values I see in the GC logs ( >>> http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/management/GarbageCollectorMXBean.html#getCollectionTime%28%29 >>> ). >>> >>> The other tool is the ElasticSearch JVM stats logger which uses GarbageCollectorMXBean's >>> LastGCInfo ( >>> https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmStats.java#L194 >>> and >>> https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmMonitorService.java#L187 >>> ). >>> >>> Do these methods expose the total time spent by all the parallel GC >>> threads for the ParNew pool or the "real" time? They do not seem to match >>> the GC log times. >>> >>> *@Gustav* We do not have any swapping on the machines. It could be the >>> disk IO experienced by the GC log writer itself, as you've suspected. 
The >>> machine has 128G of RAM >>> >>> *"top" sample from a similar machine:* >>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND >>> 106856 xxx 20 0 68.9g 25g 9.9g S 72.4 21.1 >>> 2408:05 java >>> >>> >>> *"free -g":* >>> total used free shared buffers cached >>> Mem: 120 119 0 0 0 95 >>> -/+ buffers/cache: 23 96 >>> Swap: 0 0 0 >>> >>> *@Charlie* Hugepages has already been disabled >>> >>> *sudo sysctl -a | grep hugepage* >>> vm.nr_hugepages = 0 >>> vm.nr_hugepages_mempolicy = 0 >>> vm.hugepages_treat_as_movable = 0 >>> vm.nr_overcommit_hugepages = 0 >>> >>> *cat /sys/kernel/mm/transparent_hugepage/enabled* >>> [always] madvise never >>> >>> >>> Thanks all! >>> >>> >>> >>> On Wed, Dec 17, 2014 at 5:00 PM, Jon Masamitsu >> > wrote: >>>> >>>> Ashwin, >>>> >>>> On 12/16/2014 8:47 PM, Ashwin Jayaprakash wrote: >>>> >>>> Hi, we have a cluster of ElasticSearch servers running with 31G heap >>>> and OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode). >>>> >>>> While our old gen seems to be very stable with about 40% usage and no >>>> Full GCs so far, our young gen keeps growing from ~50MB to 850MB every few >>>> seconds. These ParNew collections are taking anywhere between 1-7 seconds >>>> and is causing some of our requests to time out. The eden space keeps >>>> filling up and then cleared every 30-60 seconds. There is definitely work >>>> being done by our JVM in terms of caching/buffering objects for a few >>>> seconds, writing to disk and then clearing the objects (typical >>>> Lucene/ElasticSearch indexing and querying workload) >>>> >>>> >>>> From you recent mail >>>> >>>> Times: user=7.89 sys=0.55, real=0.65 secs] >>>> Times: user=7.71 sys=4.59, real=1.10 secs] >>>> Times: user=7.46 sys=0.32, real=0.67 secs] >>>> >>>> Are these the 7 second collections you refer to in the paragraph above? >>>> If yes, the "user" time is the sum of the time spent by multiple GC >>>> threads. >>>> The real time is the GC pause time that your application experiences. >>>> In the above case the GC pauses are .65s, 1.10s and .67s. >>>> >>>> Comment below regarding "eden space keeps filling up". >>>> >>>> >>>> These long pauses are not good for our server throughput and I was >>>> doing some reading. I got some conflicting reports on how Cassandra is >>>> configured compared to Hadoop. There are also references >>>> >>>> to this old ParNew+CMS bug >>>> which I thought >>>> would've been addressed in the JRE version we are using. Cassandra >>>> recommends >>>> a larger NewSize with just 1 for max tenuring, whereas Hadoop >>>> recommends a small >>>> NewSize. >>>> >>>> Since most of our allocations seem to be quite short lived, is there >>>> a way to avoid these "long" young gen pauses? >>>> >>>> Thanks in advance. Here are some details. 
>>>> >>>> * Heap settings:* >>>> java -Xmx31000m -Xms31000m >>>> -Xss512k -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=64m >>>> -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly >>>> -XX:CMSInitiatingOccupancyFraction=70 >>>> -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC -XX:+PrintPromotionFailure >>>> -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintTenuringDistribution >>>> -XX:GCLogFileSize=512m -XX:NumberOfGCLogFiles=2 -XX:+PrintGCDateStamps >>>> -XX:+UseGCLogFileRotation -XX:+DisableExplicitGC >>>> -XX:PrintFLSStatistics=1 -XX:+PrintGCDetails >>>> >>>> >>>> * Last few lines of "kill -3 pid" output:* >>>> Heap >>>> par new generation total 996800K, used 865818K [0x00007fa18e800000, >>>> 0x00007fa1d2190000, 0x00007fa1d2190000) >>>> eden space 886080K, 94% used [0x00007fa18e800000, >>>> 0x00007fa1c1a659e0, 0x00007fa1c4950000) >>>> from space 110720K, 25% used [0x00007fa1cb570000, >>>> 0x00007fa1cd091078, 0x00007fa1d2190000) >>>> to space 110720K, 0% used [0x00007fa1c4950000, >>>> 0x00007fa1c4950000, 0x00007fa1cb570000) >>>> concurrent mark-sweep generation total 30636480K, used 12036523K >>>> [0x00007fa1d2190000, 0x00007fa920000000, 0x00007fa920000000) >>>> concurrent-mark-sweep perm gen total 128856K, used 77779K >>>> [0x00007fa920000000, 0x00007fa927dd6000, 0x00007fa940000000) >>>> >>>> *Sample gc log:* >>>> 2014-12-11T23:32:16.121+0000: 710.618: [ParNew >>>> Desired survivor size 56688640 bytes, new threshold 6 (max 6) >>>> - age 1: 2956312 bytes, 2956312 total >>>> - age 2: 591800 bytes, 3548112 total >>>> - age 3: 66216 bytes, 3614328 total >>>> - age 4: 270752 bytes, 3885080 total >>>> - age 5: 615472 bytes, 4500552 total >>>> - age 6: 358440 bytes, 4858992 total >>>> : 900635K->8173K(996800K), 0.0317340 secs] >>>> 1352217K->463460K(31633280K)After GC: >>>> >>>> >>>> In this GC eden is at 900635k before the GC and is a 8173k after. That >>>> GC fills up is >>>> the expected behavior. Is that what you were asking about above? If >>>> not can you >>>> send me an example of the "fills up" behavior. >>>> >>>> Jon >>>> >>>> Statistics for BinaryTreeDictionary: >>>> ------------------------------------ >>>> Total Free Space: -433641480 >>>> Max Chunk Size: -433641480 >>>> Number of Blocks: 1 >>>> Av. Block Size: -433641480 >>>> Tree Height: 1 >>>> After GC: >>>> Statistics for BinaryTreeDictionary: >>>> ------------------------------------ >>>> Total Free Space: 1227 >>>> Max Chunk Size: 631 >>>> Number of Blocks: 3 >>>> Av. Block Size: 409 >>>> Tree Height: 3 >>>> , 0.0318920 secs] [Times: user=0.38 sys=0.01, real=0.03 secs] >>>> >>>> >>>> Ashwin Jayaprakash. >>>> >>>> >>>> _______________________________________________ >>>> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>> >>>> >>>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >>> >>> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ashwin.jayaprakash at gmail.com Mon Dec 22 21:32:06 2014 From: ashwin.jayaprakash at gmail.com (Ashwin Jayaprakash) Date: Mon, 22 Dec 2014 13:32:06 -0800 Subject: Multi-second ParNew collections but stable CMS In-Reply-To: References: <5492272A.2040304@oracle.com> <613BB518-BB80-42A5-8E24-FC5683E33577@oracle.com> <28DAC751-9A6F-4C18-B7ED-4169C8CDC4F7@oracle.com> <9A1D67CA-4B26-401B-91D3-EEC64D3E86CA@oracle.com> Message-ID: Just to summarize, we did disable THP and noticed minor GC times going down considerably. What still puzzles me is that the OS still reports huge pages in use but only a little bit - some food for thought: [user1 at server0001 ~]# cat /sys/kernel/mm/transparent_hugepage/enabled always madvise [never] [user1 at server0001 ~]# cat /sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed 0 [user1 at server0001 ~]# grep AnonHugePages /proc/meminfo AnonHugePages: 331776 kB [user1 at server0001 ~]# egrep 'trans|thp' /proc/vmstat nr_anon_transparent_hugepages 162 thp_fault_alloc 209 thp_fault_fallback 0 thp_collapse_alloc 0 thp_collapse_alloc_failed 0 thp_split 11 (Huge page use per process - https://access.redhat.com/solutions/46111) [user1 at server0001 ~]# grep -e AnonHugePages /proc/*/smaps | awk '{ if($2>4) print $0} ' | awk -F "/" '{print $0; system("ps -fp " $3)} ' /proc/1330/smaps:AnonHugePages: 2048 kB UID PID PPID C STIME TTY TIME CMD root 1330 1310 23 Dec19 ? 16:38:37 java -Xmx31000m -Xms31000m -Xss512k -XX:MaxPermSize=512m -XX:ReservedC /proc/1330/smaps:AnonHugePages: 116736 kB UID PID PPID C STIME TTY TIME CMD root 1330 1310 23 Dec19 ? 16:38:37 java -Xmx31000m -Xms31000m -Xss512k -XX:MaxPermSize=512m -XX:ReservedC /proc/1330/smaps:AnonHugePages: 43008 kB UID PID PPID C STIME TTY TIME CMD root 1330 1310 23 Dec19 ? 16:38:37 java -Xmx31000m -Xms31000m -Xss512k -XX:MaxPermSize=512m -XX:ReservedC /proc/1330/smaps:AnonHugePages: 2048 kB UID PID PPID C STIME TTY TIME CMD root 1330 1310 23 Dec19 ? 16:38:37 java -Xmx31000m -Xms31000m -Xss512k -XX:MaxPermSize=512m -XX:ReservedC /proc/1346/smaps:AnonHugePages: 12288 kB Thanks, Ashwin Jayaprakash. On Mon, Dec 22, 2014 at 12:45 PM, Gustav ?kesson wrote: > Hi, > > Indeed thank you for reporting back on this. TIL about THP, so to say... > > Best Regards, > Gustav?kesson > Den 22 dec 2014 18:52 skrev "charlie hunt" : > > Thanks for reporting back, and great to hear disabling THP has solved your >> multi-second minor GCs issue. :-) >> >> charlie >> >> On Dec 22, 2014, at 11:49 AM, Ashwin Jayaprakash < >> ashwin.jayaprakash at gmail.com> wrote: >> >> All, I'm happy to report that disabling THP made a big difference. We do >> not see multi-second minor GCs in our cluster anymore. >> >> Thanks for your help. >> >> On Fri, Dec 19, 2014 at 2:10 PM, charlie hunt >> wrote: >> >>> Disabling transparent huge pages should help those GC pauses where you >>> are seeing high sys time reported, which should also shorten their pause >>> times. >>> >>> Thanks for also sharing your observation of khugepaged/pages_collapsed. >>> >>> charlie >>> >>> On Dec 18, 2014, at 4:41 PM, Ashwin Jayaprakash < >>> ashwin.jayaprakash at gmail.com> wrote: >>> >>> *@Charlie/@Holger* My apologies, THP is indeed enabled. I misread the >>> "never" and thought it was already done. In fact "cat >>> /sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed" showed >>> 904 and after an hour now, it shows 6845 on one of our machines. 
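Since the "[always] madvise never" output is easy to misread, one option is to have the service itself log the THP state at startup instead of checking by hand. A small sketch, assuming the standard Linux sysfs path shown above; the class name is hypothetical.

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Reads the transparent-huge-page setting and warns if the active value
// (the token in brackets, e.g. "always madvise [never]") is not "never".
public class ThpCheck {
    public static void main(String[] args) throws Exception {
        Path enabled = Paths.get("/sys/kernel/mm/transparent_hugepage/enabled");
        if (!Files.isReadable(enabled)) {
            System.out.println("THP sysfs file not present; nothing to check.");
            return;
        }
        String line = new String(Files.readAllBytes(enabled), StandardCharsets.UTF_8).trim();
        String active = line.replaceAll(".*\\[(.*)\\].*", "$1");
        if (!"never".equals(active)) {
            System.out.println("WARNING: transparent huge pages are '" + active
                    + "' (" + line + "); consider setting them to 'never'.");
        } else {
            System.out.println("THP disabled: " + line);
        }
    }
}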
>>> >>> *@Jon* I dug through some of our ElasticSearch/application logs again >>> and tried to correlate them with the GC logs. The collection time does seem >>> to match the GC log's "real" time. However the collected sizes don't seem >>> to match, which is what threw me off. >>> >>> *Item 1:* >>> >>> 2014-12-18T21:34:55.837+0000: 163793.979: [ParNew >>> Desired survivor size 56688640 bytes, new threshold 6 (max 6) >>> - age 1: 31568024 bytes, 31568024 total >>> - age 2: 1188576 bytes, 32756600 total >>> - age 3: 1830920 bytes, 34587520 total >>> - age 4: 282536 bytes, 34870056 total >>> - age 5: 316640 bytes, 35186696 total >>> - age 6: 249856 bytes, 35436552 total >>> : 931773K->49827K(996800K), 1.3622770 secs] >>> 22132844K->21256042K(31633280K)After GC: >>> Statistics for BinaryTreeDictionary: >>> ------------------------------------ >>> Total Free Space: 1206815932 >>> Max Chunk Size: 1206815932 >>> Number of Blocks: 1 >>> Av. Block Size: 1206815932 >>> Tree Height: 1 >>> After GC: >>> Statistics for BinaryTreeDictionary: >>> ------------------------------------ >>> Total Free Space: 6189459 >>> Max Chunk Size: 6188544 >>> Number of Blocks: 3 >>> Av. Block Size: 2063153 >>> Tree Height: 2 >>> , 1.3625110 secs] [Times: user=15.92 sys=1.36, real=1.36 secs] >>> 2014-12-18T21:34:57.200+0000: 163795.342: Total time for which >>> application threads were stopped: 1.3638970 seconds >>> >>> >>> [2014-12-18T21:34:57,203Z] [WARN ] >>> [elasticsearch[server00001][scheduler][T#1]] >>> [org.elasticsearch.monitor.jvm] [server00001] [gc][young][163563][20423] duration >>> [1.3s], collections [1]/[2.1s], total [1.3s]/[17.9m], memory >>> [20.7gb]->[20.2gb]/[30.1gb], all_pools {[young] >>> [543.2mb]->[1mb]/[865.3mb]}{[survivor] [44.6mb]->[48.6mb]/[108.1mb]}{[old] >>> [20.2gb]->[20.2gb]/[29.2gb]} >>> >>> >>> *Item 2:* >>> >>> 2014-12-18T20:53:35.011+0000: 161313.153: [ParNew >>> Desired survivor size 56688640 bytes, new threshold 6 (max 6) >>> - age 1: 32445776 bytes, 32445776 total >>> - age 3: 6068000 bytes, 38513776 total >>> - age 4: 1031528 bytes, 39545304 total >>> - age 5: 255896 bytes, 39801200 total >>> : 939702K->53536K(996800K), 2.9352940 secs] >>> 21501296K->20625925K(31633280K)After GC: >>> Statistics for BinaryTreeDictionary: >>> ------------------------------------ >>> Total Free Space: 1287922158 >>> Max Chunk Size: 1287922158 >>> Number of Blocks: 1 >>> Av. Block Size: 1287922158 >>> Tree Height: 1 >>> After GC: >>> Statistics for BinaryTreeDictionary: >>> ------------------------------------ >>> Total Free Space: 6205476 >>> Max Chunk Size: 6204928 >>> Number of Blocks: 2 >>> Av. Block Size: 3102738 >>> Tree Height: 2 >>> , 2.9355580 secs] [Times: user=33.82 sys=3.11, real=2.94 secs] >>> 2014-12-18T20:53:37.947+0000: 161316.089: Total time for which >>> application threads were stopped: 2.9367640 seconds >>> >>> >>> [2014-12-18T20:53:37,950Z] [WARN ] >>> [elasticsearch[server00001][scheduler][T#1]] >>> [org.elasticsearch.monitor.jvm] [server00001] [gc][young][161091][19838] duration >>> [2.9s], collections [1]/[3s], total [2.9s]/[17.3m], memory >>> [20.4gb]->[19.6gb]/[30.1gb], all_pools {[young] >>> [801.7mb]->[2.4mb]/[865.3mb]}{[survivor] >>> [52.3mb]->[52.2mb]/[108.1mb]}{[old] [19.6gb]->[19.6gb]/[29.2gb]} >>> >>> >>> *Item 3:* >>> >>> 2014-12-17T14:42:10.590+0000: 52628.731: [GCBefore GC: >>> Statistics for BinaryTreeDictionary: >>> ------------------------------------ >>> Total Free Space: -966776244 >>> Max Chunk Size: -966776244 >>> Number of Blocks: 1 >>> Av. 
Block Size: -966776244 >>> Tree Height: 1 >>> Before GC: >>> Statistics for BinaryTreeDictionary: >>> ------------------------------------ >>> Total Free Space: 530 >>> Max Chunk Size: 268 >>> Number of Blocks: 2 >>> Av. Block Size: 265 >>> Tree Height: 2 >>> 2014-12-17T14:42:10.590+0000: 52628.731: [ParNew >>> Desired survivor size 56688640 bytes, new threshold 1 (max 6) >>> - age 1: 113315624 bytes, 113315624 total >>> : 996800K->110720K(996800K), 7.3511710 secs] >>> 5609422K->5065102K(31633280K)After GC: >>> Statistics for BinaryTreeDictionary: >>> ------------------------------------ >>> Total Free Space: -1009955715 >>> Max Chunk Size: -1009955715 >>> Number of Blocks: 1 >>> Av. Block Size: -1009955715 >>> Tree Height: 1 >>> After GC: >>> Statistics for BinaryTreeDictionary: >>> ------------------------------------ >>> Total Free Space: 530 >>> Max Chunk Size: 268 >>> Number of Blocks: 2 >>> Av. Block Size: 265 >>> Tree Height: 2 >>> , 7.3514180 secs] [Times: user=36.50 sys=17.99, real=7.35 secs] >>> 2014-12-17T14:42:17.941+0000: 52636.083: Total time for which >>> application threads were stopped: 7.3525250 seconds >>> >>> >>> [2014-12-17T14:42:17,944Z] [WARN ] >>> [elasticsearch[prdaes04data03][scheduler][T#1]] >>> [org.elasticsearch.monitor.jvm] [prdaes04data03] [gc][young][52582][5110] duration >>> [7.3s], collections [1]/[7.6s], total [7.3s]/[4.2m], memory >>> [5.1gb]->[4.8gb]/[30.1gb], all_pools {[young] >>> [695.5mb]->[14.4mb]/[865.3mb]}{[survivor] >>> [108.1mb]->[108.1mb]/[108.1mb]}{[old] [4.3gb]->[4.7gb]/[29.2gb]} >>> >>> >>> >>> >>> On Thu, Dec 18, 2014 at 1:10 PM, charlie hunt >>> wrote: >>>> >>>> The output: >>>> >>>> *cat /sys/kernel/mm/transparent_hugepage/enabled* >>>> [always] madvise never >>>> >>>> >>>> Tells me that transparent huge pages are enabled ?always?. >>>> >>>> I think I would change this to ?never?, even though sysctl -a may be >>>> reporting no huge pages are currently in use. The system may trying to >>>> coalesce pages occasionally in attempt to make huge pages available, even >>>> though you are not currently using any. >>>> >>>> charlie >>>> >>>> >>>> On Dec 18, 2014, at 2:00 PM, Ashwin Jayaprakash < >>>> ashwin.jayaprakash at gmail.com> wrote: >>>> >>>> *@Jon*, thanks for clearing that up. Yes, that was my source of >>>> confusion. I was misinterpreting the user time with the real time. >>>> >>>> *Jon's reply from an offline conversation:* >>>> >>>>> Are these the 7 second collections you refer to in the paragraph above? >>>>> If yes, the "user" time is the sum of the time spent by multiple GC >>>>> threads. >>>>> The real time is the GC pause time that your application experiences. >>>>> In the above case the GC pauses are .65s, 1.10s and .67s. >>>>> >>>> >>>> Something that added to my confusion was the tools we are using >>>> in-house. In addition to the GC logs we have 1 tool that uses the >>>> GarbageCollectorMXBean's getCollectionTime() method. This does not seem to >>>> match the values I see in the GC logs ( >>>> http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/management/GarbageCollectorMXBean.html#getCollectionTime%28%29 >>>> ). >>>> >>>> The other tool is the ElasticSearch JVM stats logger which uses GarbageCollectorMXBean's >>>> LastGCInfo ( >>>> https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmStats.java#L194 >>>> and >>>> https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmMonitorService.java#L187 >>>> ). 
>>>> >>>> Do these methods expose the total time spent by all the parallel GC >>>> threads for the ParNew pool or the "real" time? They do not seem to match >>>> the GC log times. >>>> >>>> *@Gustav* We do not have any swapping on the machines. It could be the >>>> disk IO experienced by the GC log writer itself, as you've suspected. The >>>> machine has 128G of RAM >>>> >>>> *"top" sample from a similar machine:* >>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND >>>> 106856 xxx 20 0 68.9g 25g 9.9g S 72.4 21.1 >>>> 2408:05 java >>>> >>>> >>>> *"free -g":* >>>> total used free shared buffers >>>> cached >>>> Mem: 120 119 0 0 0 >>>> 95 >>>> -/+ buffers/cache: 23 96 >>>> Swap: 0 0 0 >>>> >>>> *@Charlie* Hugepages has already been disabled >>>> >>>> *sudo sysctl -a | grep hugepage* >>>> vm.nr_hugepages = 0 >>>> vm.nr_hugepages_mempolicy = 0 >>>> vm.hugepages_treat_as_movable = 0 >>>> vm.nr_overcommit_hugepages = 0 >>>> >>>> *cat /sys/kernel/mm/transparent_hugepage/enabled* >>>> [always] madvise never >>>> >>>> >>>> Thanks all! >>>> >>>> >>>> >>>> On Wed, Dec 17, 2014 at 5:00 PM, Jon Masamitsu < >>>> jon.masamitsu at oracle.com> wrote: >>>>> >>>>> Ashwin, >>>>> >>>>> On 12/16/2014 8:47 PM, Ashwin Jayaprakash wrote: >>>>> >>>>> Hi, we have a cluster of ElasticSearch servers running with 31G heap >>>>> and OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode). >>>>> >>>>> While our old gen seems to be very stable with about 40% usage and no >>>>> Full GCs so far, our young gen keeps growing from ~50MB to 850MB every few >>>>> seconds. These ParNew collections are taking anywhere between 1-7 seconds >>>>> and is causing some of our requests to time out. The eden space keeps >>>>> filling up and then cleared every 30-60 seconds. There is definitely work >>>>> being done by our JVM in terms of caching/buffering objects for a few >>>>> seconds, writing to disk and then clearing the objects (typical >>>>> Lucene/ElasticSearch indexing and querying workload) >>>>> >>>>> >>>>> From you recent mail >>>>> >>>>> Times: user=7.89 sys=0.55, real=0.65 secs] >>>>> Times: user=7.71 sys=4.59, real=1.10 secs] >>>>> Times: user=7.46 sys=0.32, real=0.67 secs] >>>>> >>>>> Are these the 7 second collections you refer to in the paragraph above? >>>>> If yes, the "user" time is the sum of the time spent by multiple GC >>>>> threads. >>>>> The real time is the GC pause time that your application experiences. >>>>> In the above case the GC pauses are .65s, 1.10s and .67s. >>>>> >>>>> Comment below regarding "eden space keeps filling up". >>>>> >>>>> >>>>> These long pauses are not good for our server throughput and I was >>>>> doing some reading. I got some conflicting reports on how Cassandra is >>>>> configured compared to Hadoop. There are also references >>>>> >>>>> to this old ParNew+CMS bug >>>>> which I thought >>>>> would've been addressed in the JRE version we are using. Cassandra >>>>> recommends >>>>> a larger >>>>> NewSize with just 1 for max tenuring, whereas Hadoop recommends >>>>> a small NewSize. >>>>> >>>>> Since most of our allocations seem to be quite short lived, is there >>>>> a way to avoid these "long" young gen pauses? >>>>> >>>>> Thanks in advance. Here are some details. 
>>>>> >>>>> * Heap settings:* >>>>> java -Xmx31000m -Xms31000m >>>>> -Xss512k -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=64m >>>>> -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly >>>>> -XX:CMSInitiatingOccupancyFraction=70 >>>>> -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC -XX:+PrintPromotionFailure >>>>> -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintTenuringDistribution >>>>> -XX:GCLogFileSize=512m -XX:NumberOfGCLogFiles=2 -XX:+PrintGCDateStamps >>>>> -XX:+UseGCLogFileRotation -XX:+DisableExplicitGC >>>>> -XX:PrintFLSStatistics=1 -XX:+PrintGCDetails >>>>> >>>>> >>>>> * Last few lines of "kill -3 pid" output:* >>>>> Heap >>>>> par new generation total 996800K, used 865818K [0x00007fa18e800000, >>>>> 0x00007fa1d2190000, 0x00007fa1d2190000) >>>>> eden space 886080K, 94% used [0x00007fa18e800000, >>>>> 0x00007fa1c1a659e0, 0x00007fa1c4950000) >>>>> from space 110720K, 25% used [0x00007fa1cb570000, >>>>> 0x00007fa1cd091078, 0x00007fa1d2190000) >>>>> to space 110720K, 0% used [0x00007fa1c4950000, >>>>> 0x00007fa1c4950000, 0x00007fa1cb570000) >>>>> concurrent mark-sweep generation total 30636480K, used 12036523K >>>>> [0x00007fa1d2190000, 0x00007fa920000000, 0x00007fa920000000) >>>>> concurrent-mark-sweep perm gen total 128856K, used 77779K >>>>> [0x00007fa920000000, 0x00007fa927dd6000, 0x00007fa940000000) >>>>> >>>>> *Sample gc log:* >>>>> 2014-12-11T23:32:16.121+0000: 710.618: [ParNew >>>>> Desired survivor size 56688640 bytes, new threshold 6 (max 6) >>>>> - age 1: 2956312 bytes, 2956312 total >>>>> - age 2: 591800 bytes, 3548112 total >>>>> - age 3: 66216 bytes, 3614328 total >>>>> - age 4: 270752 bytes, 3885080 total >>>>> - age 5: 615472 bytes, 4500552 total >>>>> - age 6: 358440 bytes, 4858992 total >>>>> : 900635K->8173K(996800K), 0.0317340 secs] >>>>> 1352217K->463460K(31633280K)After GC: >>>>> >>>>> >>>>> In this GC eden is at 900635k before the GC and is a 8173k after. >>>>> That GC fills up is >>>>> the expected behavior. Is that what you were asking about above? If >>>>> not can you >>>>> send me an example of the "fills up" behavior. >>>>> >>>>> Jon >>>>> >>>>> Statistics for BinaryTreeDictionary: >>>>> ------------------------------------ >>>>> Total Free Space: -433641480 >>>>> Max Chunk Size: -433641480 >>>>> Number of Blocks: 1 >>>>> Av. Block Size: -433641480 >>>>> Tree Height: 1 >>>>> After GC: >>>>> Statistics for BinaryTreeDictionary: >>>>> ------------------------------------ >>>>> Total Free Space: 1227 >>>>> Max Chunk Size: 631 >>>>> Number of Blocks: 3 >>>>> Av. Block Size: 409 >>>>> Tree Height: 3 >>>>> , 0.0318920 secs] [Times: user=0.38 sys=0.01, real=0.03 secs] >>>>> >>>>> >>>>> Ashwin Jayaprakash. 
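For logs like the sample above, a quick way to pull out only the slow collections is to filter on the wall-clock ("real") figure. A rough sketch, assuming the -XX:+PrintGCDetails output format shown in this thread; the class name is illustrative, and the log file and threshold are passed as arguments.

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Scans a GC log and prints every line whose wall-clock ("real") time is at or
// above a threshold -- a quick way to locate the multi-second pauses discussed here.
public class SlowGcFinder {
    private static final Pattern REAL = Pattern.compile("real=([0-9.]+) secs");

    public static void main(String[] args) throws Exception {
        String file = args.length > 0 ? args[0] : "gc.log";
        double thresholdSecs = args.length > 1 ? Double.parseDouble(args[1]) : 1.0;
        try (BufferedReader in = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = in.readLine()) != null) {
                Matcher m = REAL.matcher(line);
                if (m.find() && Double.parseDouble(m.group(1)) >= thresholdSecs) {
                    System.out.println(line);
                }
            }
        }
    }
}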
From java at elyograg.org Tue Dec 23 17:46:27 2014
From: java at elyograg.org (Shawn Heisey)
Date: Tue, 23 Dec 2014 10:46:27 -0700
Subject: G1 with Solr - thread from dev@lucene.apache.org
In-Reply-To: <1419170477.6868.1.camel@oracle.com>
References: <54906C34.8080408@elyograg.org> <1418831456.3255.22.camel@oracle.com> <5491D659.1090703@elyograg.org> <1418849513.3293.3.camel@oracle.com> <5494D0D7.2010606@elyograg.org> <1419170477.6868.1.camel@oracle.com>
Message-ID: <5499AA73.9090003@elyograg.org>

On 12/21/2014 7:01 AM, Thomas Schatzl wrote:
>
> Add
>
> -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintReferenceGC -XX:
> +PrintAdaptiveSizePolicy

I have GC logging options in a separate environment variable.

GCLOG_OPTS="-verbose:gc -Xloggc:logs/gc.log -XX:+PrintGCDateStamps
-XX:+PrintGCDetails -XX:+PrintAdaptiveSizePolicy -XX:+PrintReferenceGC"

Here's the new G1 options list based on your feedback:

JVM_OPTS=" \
-XX:+UseG1GC \
-XX:NewRatio=3 \
-XX:+ParallelRefProcEnabled
-XX:maxGCPauseMillis=200
-XX:+UseLargePages \
-XX:+AggressiveOpts \
"

Thanks,
Shawn

From thomas.schatzl at oracle.com Tue Dec 23 17:55:47 2014
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Tue, 23 Dec 2014 18:55:47 +0100
Subject: G1 with Solr - thread from dev@lucene.apache.org
In-Reply-To: <5499AA73.9090003@elyograg.org>
References: <54906C34.8080408@elyograg.org> <1418831456.3255.22.camel@oracle.com> <5491D659.1090703@elyograg.org> <1418849513.3293.3.camel@oracle.com> <5494D0D7.2010606@elyograg.org> <1419170477.6868.1.camel@oracle.com> <5499AA73.9090003@elyograg.org>
Message-ID: <1419357347.3128.1.camel@oracle.com>

Hi,

On Tue, 2014-12-23 at 10:46 -0700, Shawn Heisey wrote:
> On 12/21/2014 7:01 AM, Thomas Schatzl wrote:
> >
> > Add
> >
> > -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintReferenceGC -XX:
> > +PrintAdaptiveSizePolicy
>
> I have GC logging options in a separate environment variable.
>
> GCLOG_OPTS="-verbose:gc -Xloggc:logs/gc.log -XX:+PrintGCDateStamps
> -XX:+PrintGCDetails -XX:+PrintAdaptiveSizePolicy -XX:+PrintReferenceGC"

ParallelRefProcEnabled is missing. use

GCLOG_OPTS="-verbose:gc -Xloggc:logs/gc.log -XX:+PrintGCDateStamps
-XX:+PrintGCDetails -XX:+PrintAdaptiveSizePolicy -XX:+PrintReferenceGC
-XX:+UnlockExperimentalVMOptions -XX:G1LogLevel=finest"

The last two are additional verboseness options.

>
> Here's the new G1 options list based on your feedback:
>
> JVM_OPTS=" \
> -XX:+UseG1GC \
> -XX:NewRatio=3 \

Remove NewRatio. This will severely limit adaptiveness.

> -XX:+ParallelRefProcEnabled
> -XX:maxGCPauseMillis=200

Use "MaxGCPauseMillis" with capital M.

> -XX:+UseLargePages \
> -XX:+AggressiveOpts \
> "

I.e.
JVM_OPTS=" \ -XX:+UseG1GC \ -XX:+ParallelRefProcEnabled \ -XX:MaxGCPauseMillis=200 \ -XX:+UseLargePages \ -XX:+AggressiveOpts Actually it might be as good to simply use: JVM_OPTS=" \ -XX:+UseG1GC \ -XX:+ParallelRefProcEnabled" Because 200 is the default value for MaxGCPauseMillis, and the others either are not used anyway (no large pages in your system) or won't have any noticeable impact (AggressiveOpts has last been tuned to current systems ages ago; the only useful part of that is "-server" to enable the server compiler, but on 64 bit VMs the server compiler is default too). Always good to start from a clean slate. Depending on the results from the log we can improve the settings. Thanks, Thomas From java at elyograg.org Tue Dec 23 21:04:57 2014 From: java at elyograg.org (Shawn Heisey) Date: Tue, 23 Dec 2014 14:04:57 -0700 Subject: G1 with Solr - thread from dev@lucene.apache.org In-Reply-To: <1419357347.3128.1.camel@oracle.com> References: <54906C34.8080408@elyograg.org> <1418831456.3255.22.camel@oracle.com> <5491D659.1090703@elyograg.org> <1418849513.3293.3.camel@oracle.com> <5494D0D7.2010606@elyograg.org> <1419170477.6868.1.camel@oracle.com> <5499AA73.9090003@elyograg.org> <1419357347.3128.1.camel@oracle.com> Message-ID: <5499D8F9.9020607@elyograg.org> On 12/23/2014 10:55 AM, Thomas Schatzl wrote: > Remove NewRatio. This will severely limit adaptiveness. > >> -XX:+ParallelRefProcEnabled >> -XX:maxGCPauseMillis=200 > > Use "MaxGCPauseMillis" with capital M. > >> -XX:+UseLargePages \ >> -XX:+AggressiveOpts \ >> " > > I.e. > > JVM_OPTS=" \ > -XX:+UseG1GC \ > -XX:+ParallelRefProcEnabled \ > -XX:MaxGCPauseMillis=200 \ > -XX:+UseLargePages \ > -XX:+AggressiveOpts That's what I ultimately ended up with. I don't have a lot of runtime yet, but this is looking REALLY good. It looks like parallel reference processing and waiting for a later Java 7 release were the secrets to using G1 effectively. https://www.dropbox.com/s/bhq97ishhb8d94w/g1gc-with-parallel-ref.png?dl=0 Here's the GC log for that graph: https://www.dropbox.com/s/9g687luo60bd4r0/g1gc-with-parallel-ref.log?dl=0 I got a little runtime in before removing NewRatio. It's not quite as good as the graph/log above, so removing it was a good thing: https://www.dropbox.com/s/ups6r2hohrnfcud/g1gc-with-parallel-ref-and-newratio-3.png?dl=0 https://www.dropbox.com/s/ccwu7axgdebywjt/g1gc-with-parallel-ref-and-newratio-3.log?dl=0 After I've got a few hours (and ultimately a few days) of runtime on the new settings, I will grab the log and graph it again. Many thanks for all your help! Shawn From thomas.schatzl at oracle.com Tue Dec 30 10:12:19 2014 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 30 Dec 2014 11:12:19 +0100 Subject: G1 with Solr - thread from dev@lucene.apache.org In-Reply-To: <5499D8F9.9020607@elyograg.org> References: <54906C34.8080408@elyograg.org> <1418831456.3255.22.camel@oracle.com> <5491D659.1090703@elyograg.org> <1418849513.3293.3.camel@oracle.com> <5494D0D7.2010606@elyograg.org> <1419170477.6868.1.camel@oracle.com> <5499AA73.9090003@elyograg.org> <1419357347.3128.1.camel@oracle.com> <5499D8F9.9020607@elyograg.org> Message-ID: <1419934339.3250.1.camel@oracle.com> Hi Shawn, On Tue, 2014-12-23 at 14:04 -0700, Shawn Heisey wrote: > On 12/23/2014 10:55 AM, Thomas Schatzl wrote: > > Remove NewRatio. This will severely limit adaptiveness. > > > >> -XX:+ParallelRefProcEnabled > >> -XX:maxGCPauseMillis=200 > > > > Use "MaxGCPauseMillis" with capital M. 
> > > >> -XX:+UseLargePages \ > >> -XX:+AggressiveOpts \ > >> " > > > > I.e. > > > > JVM_OPTS=" \ > > -XX:+UseG1GC \ > > -XX:+ParallelRefProcEnabled \ > > -XX:MaxGCPauseMillis=200 \ > > -XX:+UseLargePages \ > > -XX:+AggressiveOpts > > That's what I ultimately ended up with. I don't have a lot of runtime > yet, but this is looking REALLY good. Great to hear about field experience with G1 - a late Christmas present for us particularly because they are good. > It looks like parallel reference > processing and waiting for a later Java 7 release were the secrets to > using G1 effectively. You really might want to try 8u40: the small logs you provided indicate that there is at least some amount of large object allocation going on ("occupancy higher than threshold [...] cause: G1 Humongous Allocation"). 8u40 added some special handling for those which allows fast and cheap reclamation of these in some cases, which improves throughput by decreasing GC frequency. Nothing worrying I think given these logs, but these allocations/messages seem frequent enough that it could be considered useful. > After I've got a few hours (and ultimately a few days) of runtime on the > new settings, I will grab the log and graph it again. Would be really nice to have them for analysis. Maybe they could be used to tweak the threshold that starts concurrent cycles to reduce the number of GCs. > > Many thanks for all your help! Did not do anything yet other than removing all CMS flags :) Thanks, Thomas From java at elyograg.org Tue Dec 30 18:20:44 2014 From: java at elyograg.org (Shawn Heisey) Date: Tue, 30 Dec 2014 11:20:44 -0700 Subject: G1 with Solr - thread from dev@lucene.apache.org In-Reply-To: <1419934339.3250.1.camel@oracle.com> References: <54906C34.8080408@elyograg.org> <1418831456.3255.22.camel@oracle.com> <5491D659.1090703@elyograg.org> <1418849513.3293.3.camel@oracle.com> <5494D0D7.2010606@elyograg.org> <1419170477.6868.1.camel@oracle.com> <5499AA73.9090003@elyograg.org> <1419357347.3128.1.camel@oracle.com> <5499D8F9.9020607@elyograg.org> <1419934339.3250.1.camel@oracle.com> Message-ID: <54A2ECFC.1010401@elyograg.org> On 12/30/2014 3:12 AM, Thomas Schatzl wrote: > Great to hear about field experience with G1 - a late Christmas present > for us particularly because they are good. > >> It looks like parallel reference >> processing and waiting for a later Java 7 release were the secrets to >> using G1 effectively. > > You really might want to try 8u40: the small logs you provided indicate > that there is at least some amount of large object allocation going on > ("occupancy higher than threshold [...] cause: G1 Humongous > Allocation"). 8u40 added some special handling for those which allows > fast and cheap reclamation of these in some cases, which improves > throughput by decreasing GC frequency. I do have plans to roll out some Java 8 servers for a new project, and ultimately I expect we will upgrade to Java 8 for the existing servers, but it won't be an immediate thing. > Nothing worrying I think given these logs, but these > allocations/messages seem frequent enough that it could be considered > useful. > >> After I've got a few hours (and ultimately a few days) of runtime on the >> new settings, I will grab the log and graph it again. > > Would be really nice to have them for analysis. > > Maybe they could be used to tweak the threshold that starts concurrent > cycles to reduce the number of GCs. I've now got almost a full week of runtime on these new settings. 
Here's the log: https://www.dropbox.com/s/yla4le5l5jrhiir/gc-idxa1-g1-7u72-with-refproc-one-week.zip?dl=0 Trying to graph this log with gcviewer-1.34.jar, it found five entries in the log that it didn't know how to deal with. The times on these lines do look fairly significant, and I assume that they are probably not in the resulting graph. INFO [DataReaderFacade]: GCViewer version 1.34 (2014-11-30T14:40:14+0100) INFO [DataReaderFactory]: File format: Sun 1.6.x G1 collector INFO [DataReaderSun1_6_0G1]: Reading Sun 1.6.x / 1.7.x G1 format... WARNING [DataReaderSun1_6_0G1]: com.tagtraum.perf.gcviewer.imp.UnknownGcTypeException: Unknown gc type: 'M->' Line 47280: 5987M->1658M(6144M), 2.3640370 secs] WARNING [DataReaderSun1_6_0G1]: com.tagtraum.perf.gcviewer.imp.UnknownGcTypeException: Unknown gc type: 'M->' Line 155388: 2928M->1721M(6104M), 2.7344030 secs] WARNING [DataReaderSun1_6_0G1]: com.tagtraum.perf.gcviewer.imp.UnknownGcTypeException: Unknown gc type: 'M->' Line 190388: 2615M->1550M(6018M), 2.3079810 secs] WARNING [DataReaderSun1_6_0G1]: com.tagtraum.perf.gcviewer.imp.UnknownGcTypeException: Unknown gc type: 'M->' Line 392918: 6003M->1602M(6138M), 2.1626330 secs] WARNING [DataReaderSun1_6_0G1]: com.tagtraum.perf.gcviewer.imp.UnknownGcTypeException: Unknown gc type: 'M->' Line 397195: 6031M->1707M(6138M), 3.0746840 secs] INFO [DataReaderSun1_6_0G1]: Done reading. What it DID graph looks fairly good, but there are a handful of long collections in there. Only one of those longer collections looked like a long enough pause that it would trigger a failed load balancer health check (every five seconds, with a 4990 millisecond timeout), but most of them are long enough that a user would definitely notice the delay on a single search. That probably wouldn't be enough of a problem for them to lodge a complaint or decide that the site sucks, because the search would be fast on the next query. The overall graph shows that a typical collection happens *very* quickly, so perhaps those few outliers are not enough of a problem to cause me much concern. If Java 8 can smooth down those rough edges, I think we have a clear winner. Even with Java 7, I am very excited about these results. Are there any alternate tools for producing nice graphs from GC logs, tools that can understand everything in a log from a modern JVM? Thanks, Shawn From yu.zhang at oracle.com Tue Dec 30 22:06:45 2014 From: yu.zhang at oracle.com (Yu Zhang) Date: Tue, 30 Dec 2014 14:06:45 -0800 Subject: G1 with Solr - thread from dev@lucene.apache.org In-Reply-To: <54A2ECFC.1010401@elyograg.org> References: <54906C34.8080408@elyograg.org> <1418831456.3255.22.camel@oracle.com> <5491D659.1090703@elyograg.org> <1418849513.3293.3.camel@oracle.com> <5494D0D7.2010606@elyograg.org> <1419170477.6868.1.camel@oracle.com> <5499AA73.9090003@elyograg.org> <1419357347.3128.1.camel@oracle.com> <5499D8F9.9020607@elyograg.org> <1419934339.3250.1.camel@oracle.com> <54A2ECFC.1010401@elyograg.org> Message-ID: <54A321F5.7060209@oracle.com> Shawn, There are 10 Full gcs, each takes about 2-5 seconds. The live data set after full gc is ~2g. The heap size expanded from 4g to 6g around 45,650 sec. As Thomas noticed, there are a lot of humongous objects (each of about 2m size). some of them can be cleaned after marking. If you can not move to jdk8, can you try -XX:G1HeapRegionSize=8m? This should get rid of the humongous objects. 
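A quick way to see how much humongous allocation is actually going on before changing anything, sketched here against the gc.log produced by the logging options earlier in the thread (the grep strings assume the log messages quoted above): G1 treats any object of at least half a region as humongous, so objects just under 2 MB are humongous with 2 MB regions but stop being humongous once regions are 4 MB or larger.

# concurrent cycles started because of humongous allocations
# (the "G1 Humongous Allocation" cause quoted earlier in the thread)
grep -c 'G1 Humongous Allocation' gc.log

# full collections, i.e. the 2-5 second pauses mentioned above
grep -c 'Full GC' gc.log

Comparing these counts before and after a -XX:G1HeapRegionSize change should show whether the humongous-allocation pressure really goes away.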
Thanks, Jenny On 12/30/2014 10:20 AM, Shawn Heisey wrote: > On 12/30/2014 3:12 AM, Thomas Schatzl wrote: >> Great to hear about field experience with G1 - a late Christmas present >> for us particularly because they are good. >> >>> It looks like parallel reference >>> processing and waiting for a later Java 7 release were the secrets to >>> using G1 effectively. >> You really might want to try 8u40: the small logs you provided indicate >> that there is at least some amount of large object allocation going on >> ("occupancy higher than threshold [...] cause: G1 Humongous >> Allocation"). 8u40 added some special handling for those which allows >> fast and cheap reclamation of these in some cases, which improves >> throughput by decreasing GC frequency. > I do have plans to roll out some Java 8 servers for a new project, and > ultimately I expect we will upgrade to Java 8 for the existing servers, > but it won't be an immediate thing. > >> Nothing worrying I think given these logs, but these >> allocations/messages seem frequent enough that it could be considered >> useful. >> >>> After I've got a few hours (and ultimately a few days) of runtime on the >>> new settings, I will grab the log and graph it again. >> Would be really nice to have them for analysis. >> >> Maybe they could be used to tweak the threshold that starts concurrent >> cycles to reduce the number of GCs. > I've now got almost a full week of runtime on these new settings. > Here's the log: > > https://www.dropbox.com/s/yla4le5l5jrhiir/gc-idxa1-g1-7u72-with-refproc-one-week.zip?dl=0 > > Trying to graph this log with gcviewer-1.34.jar, it found five entries > in the log that it didn't know how to deal with. The times on these > lines do look fairly significant, and I assume that they are probably > not in the resulting graph. > > INFO [DataReaderFacade]: GCViewer version 1.34 (2014-11-30T14:40:14+0100) > INFO [DataReaderFactory]: File format: Sun 1.6.x G1 collector > INFO [DataReaderSun1_6_0G1]: Reading Sun 1.6.x / 1.7.x G1 format... > WARNING [DataReaderSun1_6_0G1]: > com.tagtraum.perf.gcviewer.imp.UnknownGcTypeException: Unknown gc type: > 'M->' Line 47280: 5987M->1658M(6144M), 2.3640370 secs] > WARNING [DataReaderSun1_6_0G1]: > com.tagtraum.perf.gcviewer.imp.UnknownGcTypeException: Unknown gc type: > 'M->' Line 155388: 2928M->1721M(6104M), 2.7344030 secs] > WARNING [DataReaderSun1_6_0G1]: > com.tagtraum.perf.gcviewer.imp.UnknownGcTypeException: Unknown gc type: > 'M->' Line 190388: 2615M->1550M(6018M), 2.3079810 secs] > WARNING [DataReaderSun1_6_0G1]: > com.tagtraum.perf.gcviewer.imp.UnknownGcTypeException: Unknown gc type: > 'M->' Line 392918: 6003M->1602M(6138M), 2.1626330 secs] > WARNING [DataReaderSun1_6_0G1]: > com.tagtraum.perf.gcviewer.imp.UnknownGcTypeException: Unknown gc type: > 'M->' Line 397195: 6031M->1707M(6138M), 3.0746840 secs] > INFO [DataReaderSun1_6_0G1]: Done reading. > > What it DID graph looks fairly good, but there are a handful of long > collections in there. Only one of those longer collections looked like > a long enough pause that it would trigger a failed load balancer health > check (every five seconds, with a 4990 millisecond timeout), but most of > them are long enough that a user would definitely notice the delay on a > single search. That probably wouldn't be enough of a problem for them > to lodge a complaint or decide that the site sucks, because the search > would be fast on the next query. 
The overall graph shows that a typical > collection happens *very* quickly, so perhaps those few outliers are not > enough of a problem to cause me much concern. If Java 8 can smooth down > those rough edges, I think we have a clear winner. Even with Java 7, I > am very excited about these results. > > Are there any alternate tools for producing nice graphs from GC logs, > tools that can understand everything in a log from a modern JVM? > > Thanks, > Shawn > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From java at elyograg.org Wed Dec 31 00:29:39 2014 From: java at elyograg.org (Shawn Heisey) Date: Tue, 30 Dec 2014 17:29:39 -0700 Subject: G1 with Solr - thread from dev@lucene.apache.org In-Reply-To: <54A321F5.7060209@oracle.com> References: <54906C34.8080408@elyograg.org> <1418831456.3255.22.camel@oracle.com> <5491D659.1090703@elyograg.org> <1418849513.3293.3.camel@oracle.com> <5494D0D7.2010606@elyograg.org> <1419170477.6868.1.camel@oracle.com> <5499AA73.9090003@elyograg.org> <1419357347.3128.1.camel@oracle.com> <5499D8F9.9020607@elyograg.org> <1419934339.3250.1.camel@oracle.com> <54A2ECFC.1010401@elyograg.org> <54A321F5.7060209@oracle.com> Message-ID: <54A34373.9000509@elyograg.org> On 12/30/2014 3:06 PM, Yu Zhang wrote: > There are 10 Full gcs, each takes about 2-5 seconds. The live data set > after full gc is ~2g. The heap size expanded from 4g to 6g around > 45,650 sec. > > As Thomas noticed, there are a lot of humongous objects (each of about > 2m size). some of them can be cleaned after marking. If you can not > move to jdk8, can you try -XX:G1HeapRegionSize=8m? This should get rid > of the humongous objects. Those huge objects may be Solr filterCache entries. Each of my large Solr indexes is over 16 million documents. Because a filterCache entry is a bitset representing those documents, it would be about 16.3 million bits in length, or approximately 2 MB. It could be other things -- Lucene handles a bunch of other things in large byte arrays, though I'm not very familiar with those internals. I will try the option you have indicated. My index updating software does indexing once a minute. Once an hour, larger processes are done, and once a day, one of the large indexes is optimized, which likely generates a lot of garbage in a very short time. Thanks, Shawn From thomas.schatzl at oracle.com Wed Dec 31 14:19:05 2014 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 31 Dec 2014 15:19:05 +0100 Subject: G1 with Solr - thread from dev@lucene.apache.org In-Reply-To: <54A34373.9000509@elyograg.org> References: <54906C34.8080408@elyograg.org> <1418831456.3255.22.camel@oracle.com> <5491D659.1090703@elyograg.org> <1418849513.3293.3.camel@oracle.com> <5494D0D7.2010606@elyograg.org> <1419170477.6868.1.camel@oracle.com> <5499AA73.9090003@elyograg.org> <1419357347.3128.1.camel@oracle.com> <5499D8F9.9020607@elyograg.org> <1419934339.3250.1.camel@oracle.com> <54A2ECFC.1010401@elyograg.org> <54A321F5.7060209@oracle.com> <54A34373.9000509@elyograg.org> Message-ID: <1420035545.3277.2.camel@oracle.com> Hi Shawn, On Tue, 2014-12-30 at 17:29 -0700, Shawn Heisey wrote: > On 12/30/2014 3:06 PM, Yu Zhang wrote: > > There are 10 Full gcs, each takes about 2-5 seconds. The live data set > > after full gc is ~2g. The heap size expanded from 4g to 6g around > > 45,650 sec. > > > > As Thomas noticed, there are a lot of humongous objects (each of about > > 2m size). 
some of them can be cleaned after marking. If you can not > > move to jdk8, can you try -XX:G1HeapRegionSize=8m? This should get rid > > of the humongous objects. -XX:G1HeapRegionSize=4M should be sufficient: all the objects I have seen are slightly smaller than 2M, which corresponds to Shawn's statement about having around 16.3M bits in length. With -Xms4G -Xmx6G the default region size is 2M, not 4M. Using -XX:G1HeapRegionSize=8M seems overkill. > Those huge objects may be Solr filterCache entries. Each of my large > Solr indexes is over 16 million documents. Because a filterCache entry > is a bitset representing those documents, it would be about 16.3 million > bits in length, or approximately 2 MB. It could be other things -- > Lucene handles a bunch of other things in large byte arrays, though I'm > not very familiar with those internals. > > I will try the option you have indicated. I agree with Jenny that we should try increasing heap region size slightly first. > My index updating software does indexing once a minute. Once an hour, > larger processes are done, and once a day, one of the large indexes is > optimized, which likely generates a lot of garbage in a very short time. Just fyi, the problem with these large byte arrays is that with 7uX, G1 cannot reclaim them during young GC but needs to wait for a complete marking cycle. If that takes too long (longer than the next young GC occurs), the next young GC may not have enough space to complete the GC, potentially falling back to the mentioned full gcs. That seems to happen a few times. There are two other options that could be tried to improve the situation (although I think increasing the heap region size should be sufficient), that is -XX:-ResizePLAB which decreases the amount of space G1 will waste during GC (it does so for performance reasons, but the logic is somewhat flawed - I am currently working on that). The other is to cap the young gen size so that the amount of potential survivors is smaller in the first place, e.g. -XX:G1MaxNewSize=1536M // 1.5G seems reasonable without decreasing throughput too much; a lot of these full gcs seem to appear after G1 using extremely large eden sizes. This is most likely due to the spiky allocation behavior of the application: i.e. long stretches of almost every object dying, and then short bursts. Since G1 tunes itself to the former, it will simply try to use too much eden size for these spikes. But I recommend first seeing the impact of the increase in region size. Thanks, Thomas
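Putting the pieces of this thread together, one possible starting point for the next test run might look like the following. This is only a sketch that collects the flags suggested above into the JVM_OPTS/GCLOG_OPTS style used earlier; the region size and the commented-out items are exactly the open questions of the thread, not settled recommendations, and -Xms/-Xmx plus the rest of the Solr start script are unchanged and omitted here.

JVM_OPTS=" \
-XX:+UseG1GC \
-XX:+ParallelRefProcEnabled \
-XX:MaxGCPauseMillis=200 \
-XX:G1HeapRegionSize=4M \
"
# MaxGCPauseMillis=200 is the default value and is kept only for clarity.
# Optional follow-ups discussed above, to be tried one at a time:
#   -XX:-ResizePLAB       (reduce the space G1 wastes when resizing PLABs)
#   a cap on the young generation, if the full GCs after very large edens
#   keep appearing

GCLOG_OPTS="-verbose:gc -Xloggc:logs/gc.log -XX:+PrintGCDateStamps
-XX:+PrintGCDetails -XX:+PrintAdaptiveSizePolicy -XX:+PrintReferenceGC
-XX:+UnlockExperimentalVMOptions -XX:G1LogLevel=finest"

Comparing a few days of logs from such a configuration with the one-week log above should make it clear whether the humongous-allocation cycles and the remaining full GCs disappear.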