High termination times pre-concurrent cycle in G1
William Good
bkgood at gmail.com
Thu Sep 1 15:12:07 UTC 2016
Yep, the marking cycle seems to help. I just don't know why. Objects
in old regions should die very infrequently, as everything produced
either survives indefinitely or is a byproduct of loading or
evaluation (a very fast operation, especially when compared to the
frequency of evac pause * number of survivor regions). Thus a mark
cycle shouldn't reveal much to be collected in old regions, and my
understanding is that all the survivor spaces are marked+evac'd on
each evac pause.
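If it would help confirm that, I can do another run with per-region
liveness printing enabled so we can see how much the marking cycle
actually finds dead in old regions (assuming this diagnostic flag is
available in this build; I haven't double-checked it on b101):

    -XX:+UnlockDiagnosticVMOptions -XX:+G1PrintRegionLivenessInfo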
Tried first with 12 workers (happens to be the number of physical
cores on my machine) and got the same pathological behavior. Then
tried with 2 and still saw large termination time increases. Log file
attached.
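In case it's useful, the only difference between those two runs was the
worker count (the rest of the flags are unchanged and listed at the top
of the attached log), i.e. roughly:

    -XX:ParallelGCThreads=12   (first run, matches physical cores)
    -XX:ParallelGCThreads=2    (second run)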
William
On Wed, Aug 31, 2016 at 8:18 AM, yu.zhang at oracle.com
<yu.zhang at oracle.com> wrote:
> It seems that after marking (cleanup), the termination time drops. Maybe
> that is why you need a very low IHOP so that you can have more marking
> cycles.
>
> The work distribution seems fine. But system time is high. Maybe some lock
> contention.
>
> I would agree with trying a lower number of GC threads: -XX:ParallelGCThreads=<n>
>
> Jenny
>
>
> On 08/30/2016 04:08 PM, Vitaly Davidovich wrote:
>
> William,
>
> Have you tried running with a lower number (than the current 18) of parallel
> workers?
>
> On Tuesday, August 30, 2016, William Good <bkgood at gmail.com> wrote:
>>
>> I've been experiencing an issue in a production application using G1
>> for quite some time over a handful of 1.8.0 builds. The application is
>> relatively simple: it spends about 60s reading some parameters from
>> files on disk, and then starts serving web requests that merge some
>> input with those parameters, perform some computation and return a
>> result. We're aiming to keep max total request time (as seen by remote
>> hosts) below 100 ms, but from previous experience with ParNew and CMS
>> (and G1 on previous projects, for that matter), I didn't anticipate
>> this being a problem.
>>
>> The symptoms are ever-increasing time spent in evacuation pauses,
>> with high parallel worker termination times sticking out. With the
>> recommended set of G1 settings (just a max heap size and a pause time
>> target), pause times increase sharply until I start seeing 500ms+
>> pauses and have to kill the JVM.
>>
>> I found some time ago that first forcing a bunch of full GCs with
>> System.gc() at the phase (load -> serve) change and then forcing
>> frequent concurrent cycles with -XX:InitiatingHeapOccupancyPercent=1
>> seems to mitigate the problem. I'd prefer not to have to do either,
>> as the former makes redeployments very slow and the latter adds a
>> couple of neighboring ~40ms remark and cleanup pauses that aren't
>> good for request time targets.
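>>
>> To be concrete, the System.gc() part of that workaround is roughly the
>> following (a simplified sketch; the class and method names are made up
>> for illustration, the real phase-change hook in our code is more
>> involved, and it obviously only works if explicit GC isn't disabled):
>>
>>     final class PhaseChangeGcHook {
>>         // Called once when loading finishes, before serving starts.
>>         // Forcing a few full collections compacts the long-lived
>>         // parameter data; the iteration count is illustrative.
>>         static void onLoadComplete() {
>>             for (int i = 0; i < 3; i++) {
>>                 System.gc();
>>             }
>>         }
>>     }
>>
>> with -XX:InitiatingHeapOccupancyPercent=1 added on the command line
>> for the frequent concurrent cycles.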
>>
>> I'm attaching a log file that details a short run, with the phase
>> change at about 60s from start. After a few evacuation pauses, one
>> lasts 160ms with roughly 100-120ms spent in parallel workers'
>> 'termination'. After this, a concurrent cycle runs and everything goes
>> back to normal. The java parameters are at the top of the file.
>>
>> Generally this happens over a much longer period of time (and
>> especially if I haven't set the low
>> -XX:InitiatingHeapOccupancyPercent value) and over many different
>> builds of 1.8.0. This was b101. It's running alone on a fairly hefty
>> dual-socket Xeon box with 128GB of RAM on CentOS 7.
>>
>> I'd be more than happy to hear any ideas on what's going on here and
>> how it could be fixed.
>>
>> Best,
>> William
>
>
>
> --
> Sent from my phone
>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2workers.log.gz
Type: application/x-gzip
Size: 80535 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20160901/a101b9be/2workers.log-0001.gz>