High termination times pre-concurrent cycle in G1
William Good
bkgood at gmail.com
Fri Sep 2 10:10:54 UTC 2016
Jenny,
There's a log attached with those options.
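For anyone following along, a sketch of how the flags suggested in this
thread fit together on one command line; the heap size, region size,
thread count and jar name below are illustrative placeholders, not the
actual production values (those are at the top of the attached log):

```shell
# Sketch only: combining the flags discussed in this thread.
# Heap size, region size, thread count and jar name are placeholders.
java -Xmx24g \
     -XX:+UseG1GC \
     -XX:MaxGCPauseMillis=100 \
     -XX:ParallelGCThreads=2 \
     -XX:+UnlockExperimentalVMOptions \
     -XX:+G1TraceEagerReclaimHumongousObjects \
     -XX:G1HeapRegionSize=16m \
     -jar app.jar
```

-XX:G1HeapRegionSize would be chosen from the humongous-allocation sizes
that the eager-reclaim trace output reveals.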
Termination times are lower with 2 workers, but eden is much smaller (~2
GB instead of ~7 GB) and evac pauses are more frequent. So it's a nice
improvement but not ideal. It also appears that in the 2-worker case
work distribution degenerates over time (termination attempts
increase), which mirrors what I see in the cases with higher worker
counts.
Thanks for your help so far!
William
On Fri, Sep 2, 2016 at 1:34 AM, yu.zhang at oracle.com <yu.zhang at oracle.com> wrote:
> William,
>
> I think it might be related to humongous objects. Can you run some
> experiments with
>
> -XX:+UnlockExperimentalVMOptions -XX:+G1TraceEagerReclaimHumongousObjects
>
> and can you also remove -XX:G1LogLevel=finest?
>
> Based on the output, can you adjust the heap region size?
> But I think the termination time is lower with 2 worker threads.
>
> Thanks
> Jenny
>
> On 09/01/2016 08:12 AM, William Good wrote:
>>
>> Yep, the marking cycle seems to help. I just don't know why. Objects
>> in old regions should die very infrequently, as everything produced
>> either survives indefinitely or is a byproduct of loading or
>> evaluation (a very fast operation, especially when compared to
>> frequency of evac pause * number of survivor regions). Thus a mark
>> cycle shouldn't reveal much to be collected in old regions, and my
>> understanding is that all the survivor spaces are marked+evac'd on
>> each evac pause.
>>
>> Tried first with 12 workers (happens to be the number of physical
>> cores on my machine) and got the same pathological behavior. Then
>> tried with 2 and still see large termination time increases. Log file
>> attached.
>>
>> William
>>
>> On Wed, Aug 31, 2016 at 8:18 AM, yu.zhang at oracle.com
>> <yu.zhang at oracle.com> wrote:
>>>
>>> It seems that after marking (cleanup), the termination time drops.
>>> Maybe that is why you need a very low IHOP, so that you can have
>>> more marking cycles.
>>>
>>> The work distribution seems fine, but system time is high; maybe
>>> some lock contention.
>>>
>>> I would agree to try lowering the gc threads, -XX:ParallelGCThreads=<n>
>>>
>>> Jenny
>>>
>>>
>>> On 08/30/2016 04:08 PM, Vitaly Davidovich wrote:
>>>
>>> William,
>>>
>>> Have you tried running with a lower number (than the current 18) of
>>> parallel
>>> workers?
>>>
>>> On Tuesday, August 30, 2016, William Good <bkgood at gmail.com> wrote:
>>>>
>>>> I've been experiencing an issue in a production application using G1
>>>> for quite some time over a handful of 1.8.0 builds. The application is
>>>> relatively simple: it spends about 60s reading some parameters from
>>>> files on disk, and then starts serving web requests that merge some
>>>> input with those parameters, perform some computation and return a
>>>> result. We're aiming to keep max total request time (as seen by remote
>>>> hosts) below 100 ms but from previous experience with parnew and cms
>>>> (and g1 on previous projects, for that matter), I didn't anticipate
>>>> this being a problem.
>>>>
>>>> The symptoms are ever-increasing evacuation pause times, in which
>>>> high parallel worker termination times stick out. With the
>>>> recommended set of G1 settings (max heap size and pause time target),
>>>> they increase sharply until I start seeing 500ms+ pause times and have
>>>> to kill the JVM.
>>>>
>>>> I found some time ago that forcing a bunch of full GCs with
>>>> System.gc() at the phase change (load -> serve) and then forcing
>>>> frequent concurrent cycles with -XX:InitiatingHeapOccupancyPercent=1
>>>> seems to mitigate the problem. I'd prefer not to have to do either,
>>>> as the former makes redeployments very slow and the latter adds a
>>>> couple of neighboring ~40ms remark and cleanup pauses that aren't
>>>> good for request time targets.
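A minimal sketch of the full-GC workaround described above; the class and
method names are hypothetical (the application code isn't shown in this
thread), and the forced-collection count is an assumption:

```java
// Hypothetical sketch of the workaround described above: force a few
// full collections at the load -> serve phase change so that the
// long-lived parameter data is compacted into old regions before
// request serving begins. Names here are illustrative only.
public final class PhaseChangeHelper {
    // Several passes, since a single System.gc() can leave floating
    // garbage behind (an assumption; tune the count as needed).
    static final int FORCED_GC_COUNT = 3;

    static void onLoadPhaseComplete() {
        for (int i = 0; i < FORCED_GC_COUNT; i++) {
            // A full, stop-the-world collection, unless the JVM runs
            // with -XX:+DisableExplicitGC (which makes this a no-op)
            // or -XX:+ExplicitGCInvokesConcurrent.
            System.gc();
        }
    }

    public static void main(String[] args) {
        onLoadPhaseComplete();
        System.out.println("forced " + FORCED_GC_COUNT + " full GCs");
    }
}
```

This trades slow redeployments for stable pause times, which is exactly
the cost described above.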
>>>>
>>>> I'm attaching a log file that details a short run, with the phase
>>>> change at about 60s from start. After a few evacuation pauses, one
>>>> lasts 160ms, with 100-120ms of that spent in the parallel workers'
>>>> 'termination' phase. After this, a concurrent cycle runs and everything goes
>>>> back to normal. java params are at the top of the file.
>>>>
>>>> Generally this happens over a much longer period of time (and
>>>> especially if I haven't given the low
>>>> -XX:InitiatingHeapOccupancyPercent value) and over many different
>>>> builds of 1.8.0. This was b101. It's running alone on a fairly hefty
>>>> dual-socket Xeon box with 128GB of RAM on CentOS 7.
>>>>
>>>> I'd be more than happy to hear any ideas on what's going on here and
>>>> how it could be fixed.
>>>>
>>>> Best,
>>>> William
>>>
>>>
>>>
>>> --
>>> Sent from my phone
>>>
>>>
>>> _______________________________________________
>>> hotspot-gc-use mailing list
>>> hotspot-gc-use at openjdk.java.net
>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>>
>>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2workers_tracehuge.log.gz
Type: application/x-gzip
Size: 58508 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20160902/e68926a7/2workers_tracehuge.log-0001.gz>