Trying to understand ZGC

Thu Nov 29 07:40:18 UTC 2018

Hi,

On 11/28/18 8:09 PM, Stefan Reich wrote:
> Hi Per!
> 
> On Tue, 13 Nov 2018 at 20:22, Per Liden <per.liden at oracle.com 
> <mailto:per.liden at oracle.com>> wrote:
> 
>     The RSS accounting on Linux isn't always telling the complete truth and
>     it can even vary depending on if you're using small or large pages. ZGC
>     does heap multi-mapping, which means it will map the same heap memory in
>     three different locations in the virtual address space. When using small
>     pages, Linux isn't clever enough to detect that it's the same memory
>     being mapped multiple times, and so it accounts for each mapping as if
>     it was new/different, inflating the RSS by 3x. This typically doesn't
>     happen when using large pages (-XX:+UseLargePages).
> 
> 
> Thanks. I would call this an actual bug in Linux then. Counting memory 
> twice is really not OK.

Yes, I would also like to call it a bug. I assume the problem is that 
figuring out if a new mapping is the same as an existing one is 
potentially really expensive (e.g. traversing all mappings to see if 
there's a match). When using large pages, the memory is accounted to the 
hugetlbfs inode rather than the process itself, which makes it easier to 
get the accounting right.
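
To see how far RSS might be off for a given process, one option is to 
compare it with PSS, which weights each resident page by the number of 
mappings sharing it. The sketch below is not part of ZGC or the JDK. It 
assumes Linux 4.14 or later, where /proc/self/smaps_rollup exists, and 
the class and method names (RssVsPss, fieldKb) are made up for 
illustration. Whether PSS fully corrects for the multi-mapping on your 
kernel is something to verify rather than assume.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class RssVsPss {
    /** Returns the value in kB of a field such as "Rss" or "Pss", or -1 if absent. */
    static long fieldKb(List<String> lines, String field) {
        String prefix = field + ":";
        for (String line : lines) {
            // Lines look like "Rss:              123456 kB".
            // "Pss:" does not match "Pss_Anon:" because of the colon.
            if (line.startsWith(prefix)) {
                String[] parts = line.substring(prefix.length()).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: Linux with smaps_rollup support; elsewhere we just bail out.
        Path rollup = Path.of("/proc/self/smaps_rollup");
        if (!Files.isReadable(rollup)) {
            System.out.println("smaps_rollup not available on this system");
            return;
        }
        List<String> lines = Files.readAllLines(rollup);
        System.out.println("Rss: " + fieldKb(lines, "Rss") + " kB");
        System.out.println("Pss: " + fieldKb(lines, "Pss") + " kB");
    }
}
```

Running this inside the application (or adapting it to read another 
process's smaps_rollup) with and without -XX:+UseLargePages should show 
whether the two numbers diverge on your system.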

> 
> Hm... are large pages really problematic as suggested here? 
> https://www.oracle.com/technetwork/java/javase/tech/largememory-jsp-137182.html

Using -XX:+UseLargePages is typically good for both throughput and 
latency (use of transparent huge pages is a different story though). The 
main problem/inconvenience is that you need to reserve huge pages up 
front, i.e. tie up memory in the huge page pool, so it's less flexible 
in that sense.

> 
> 
>      >
>      > When turning on GC notifications, I see (sometimes):
>      >
>      >    GC cause:  Allocation Rate (360 ms)
>      >    Collector: ZGC
>      >    Changes:   ZHeap: -16383 K, CodeHeap 'profiled nmethods': 85 K, Metaspace: 1 K
>      >
>      > and more often:
>      >
>      >    GC cause:  Proactive (147 ms)
>      >    Collector: ZGC
>      >    Changes:   ZHeap: -180223 K, CodeHeap 'profiled nmethods': 1 K, CodeHeap 'non-profiled nmethods': 1 K, Metaspace: 1 K, CodeHeap 'non-nmethods': 12 K
>      >
>      > Does this mean stop-the-world GC pauses are occurring, or is my
>      > application not paused?
> 
>     This is all normal. Each ZGC cycle has three short pauses (each of them
>     should be below 10ms). If you enable detailed GC logging with
>     -Xlog:gc*:gc.log you'll see more details on exactly how long the pauses
>     are, and a bunch of other data points.
> 
> 
> I still don't understand... are the GC pauses of 360/147 ms 
> stop-the-world pauses or just the duration of a concurrent GC cycle? 
> (I'm just printing all  GarbageCollectionNotificationInfo objects I get 
> from the pertinent MX beans.)

The time you see there (e.g. 360 ms) is the time for a complete GC 
cycle, i.e. the sum of all pauses and all concurrent phases. This time 
is dominated by the concurrent phases, and your pauses should be on the 
order of a few milliseconds.

Use -Xlog:gc*:gc.log to print more detailed GC information into a log, 
then you'll see all the details on what's going on.
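
For reference, a listener along the following lines produces the kind of 
output quoted above. It is a minimal sketch using the standard 
com.sun.management notification API. The class name GcCycleLogger and 
the helper describe are made up for illustration, and the wording of the 
printed line is mine. The point is only that the duration reported in 
the notification covers the whole cycle, so it should not be read as a 
pause time.

```java
import com.sun.management.GarbageCollectionNotificationInfo;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import javax.management.NotificationEmitter;
import javax.management.openmbean.CompositeData;

public class GcCycleLogger {
    /** Formats one line; the duration is the complete cycle, not pause time. */
    static String describe(String collector, String cause, long cycleMillis) {
        return collector + ": cause=" + cause
                + ", complete cycle=" + cycleMillis + " ms (pauses + concurrent phases)";
    }

    /** Registers a listener on every collector bean that can emit notifications. */
    static void install() {
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (!(bean instanceof NotificationEmitter)) {
                continue;
            }
            ((NotificationEmitter) bean).addNotificationListener((notification, handback) -> {
                if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                        .equals(notification.getType())) {
                    return;
                }
                GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo
                        .from((CompositeData) notification.getUserData());
                System.out.println(describe(info.getGcName(), info.getGcCause(),
                        info.getGcInfo().getDuration()));
            }, null, null);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        install();
        // Produce some garbage, then request a cycle so that a notification fires.
        for (int i = 0; i < 100_000; i++) {
            byte[] junk = new byte[16 * 1024];
        }
        System.gc();
        Thread.sleep(500); // give the notification thread time to deliver
    }
}
```

The actual pause times are not available from this notification. For 
those, the gc.log produced by -Xlog:gc*:gc.log remains the place to look.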

> 
> 
>     For more information on ZGC, how to tune, how to interpret logs,
>     internals, etc., I'd recommend having a look at some of the slides
>     and/or videos available here:
> 
>     https://wiki.openjdk.java.net/display/zgc/Main
> 
> 
> Thanks.
> 
> For now I think I'll stick to G1 as it has tolerable pauses (<50ms, 
> roughly, unless I call System.gc()). I do have to call System.gc() 
> sometimes in order to return memory to the OS.

A patch to have ZGC (optionally) return memory to the OS exists. It has 
not been upstreamed yet, but it will eventually get there. And you 
will not need to do a System.gc() to make that happen (just as that is 
not needed in the latest version of G1).

> 
> I'm focusing on desktop use where my goal is <1GB total process size. I 
> assume for ZGC I would need to reserve more slack than with G1 in order 
> to get its full advantages?

You could be right, but it all depends on the allocation rate of your 
application (which will dictate the heap headroom needed by ZGC) and the 
shape of the object graph on the heap (which will dictate the amount of 
memory needed by G1's remembered sets).
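
As a rough illustration of why allocation rate matters: the application 
keeps allocating while a concurrent cycle runs, so the heap needs at 
least enough free space to absorb what is allocated during one cycle. 
The sketch below is only that back-of-envelope product. The 200 MB/s 
rate is a hypothetical figure, not a measurement, the 360 ms is the 
cycle time from the notification quoted earlier, and the class and 
method names are made up. It is a lower bound on headroom, not a sizing 
formula used by ZGC.

```java
public class HeadroomEstimate {
    /** Bytes allocated while one GC cycle runs: allocation rate times cycle duration. */
    static long bytesAllocatedDuringCycle(long allocBytesPerSecond, long cycleMillis) {
        return allocBytesPerSecond * cycleMillis / 1000;
    }

    public static void main(String[] args) {
        long rate = 200L * 1024 * 1024; // hypothetical allocation rate: 200 MB/s
        long cycleMillis = 360;         // cycle time taken from the quoted notification
        long bytes = bytesAllocatedDuringCycle(rate, cycleMillis);
        System.out.println(bytes / (1024 * 1024) + " MB"); // prints "72 MB"
    }
}
```

Measuring the real allocation rate and cycle times from gc.log and 
plugging them in gives a first idea of whether a <1GB process budget is 
realistic for a given application.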

cheers,
Per

> 
> Many greetings,
> Stefan