heap memory usage increasing over time with a fixed live set size
Amir Hadadi
amirhadadi at hotmail.com
Wed Oct 9 12:52:25 UTC 2019
It seems that class unloading not happening frequently enough is indeed the culprit!
I see that when a Degenerated GC occasionally happens due to allocation failure, class unloading brings memory usage back to the baseline.
The logs also show a large number of strings being removed from the string table, probably due to Jackson interning field names.
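In case anyone else hits the same pattern, here is a minimal sketch of turning that interning off (assuming Jackson 2.x, where INTERN_FIELD_NAMES is enabled by default; the class name is mine):

    import com.fasterxml.jackson.core.JsonFactory;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class NoIntern {
        public static void main(String[] args) {
            // Field name interning is on by default; disabling it keeps
            // parsed JSON field names out of the JVM string table.
            ObjectMapper mapper = new ObjectMapper();
            mapper.getFactory().disable(JsonFactory.Feature.INTERN_FIELD_NAMES);
            System.out.println("interning enabled: "
                    + mapper.getFactory().isEnabled(JsonFactory.Feature.INTERN_FIELD_NAMES));
        }
    }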
So what happens is that memory usage goes up and concurrent GC time skyrockets until a Degenerated GC kicks in and brings everything back to normal.
With -XX:+ClassUnloadingWithConcurrentMark the memory usage looks stable, though pauses are longer, as expected.
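For reference, the command line shape I'm testing with (JDK 12, where Shenandoah still needs the experimental unlock; app.jar and the log file name are placeholders):

    java -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC \
         -XX:ConcGCThreads=2 \
         -XX:+ClassUnloadingWithConcurrentMark \
         -Xlog:gc*,class+unload=info:file=gc.log:time,uptime \
         -jar app.jar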
I must say I'm now advising people to move from ZGC to Shenandoah, as it's more promising for our use case:
1) Heaps that can benefit from compressed oops.
2) Bounded pacing time, which improves visibility.
3) Better integration with JMX beans exposing GC metrics, which also improves visibility (see the sketch below).
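For completeness, this is the kind of polling we do against those beans. A minimal sketch using the standard java.lang.management API (class name is mine):

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class GcStats {
        public static void main(String[] args) {
            // Shenandoah registers its cycles and pauses as GarbageCollectorMXBeans,
            // so accumulated counts and times can be polled like for any other GC.
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                        + " collections, " + gc.getCollectionTime() + " ms total");
            }
        }
    }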
You've done great work!
Once Shenandoah supports concurrent class unloading, it will be even better.
________________________________
From: Aleksey Shipilev <shade at redhat.com>
Sent: Tuesday, October 8, 2019 5:51 PM
To: Amir Hadadi <amirhadadi at hotmail.com>; shenandoah-dev at openjdk.java.net <shenandoah-dev at openjdk.java.net>
Subject: Re: heap memory usage increasing over time with a fixed live set size
On 10/8/19 4:34 PM, Amir Hadadi wrote:
> I've encountered a problem of increasing heap usage with Shenandoah. For a workload with a
> non-increasing live set, the used memory as reported in the logs at the end of a gc cycle is
> constantly increasing. The total time spent in concurrent gc as reported by
> GarbageCollectorMXBean::getCollectionTime is also constantly increasing, starting at 5% of the
> time going up to 60% and more. The memory usage is going up from 500MB to 2500MB over a period
> of 2-3 days, again without the live set materially changing. At this point I'm forced to restart
> the nodes.
Are you seeing this only with Shenandoah? This looks more like a very slow Java-level memory leak,
to be honest.
I think you first need to confirm the live data set indeed does not change. Taking two heap dumps,
one in the early stages (when the heap is not full) and one in the later stages (when it is), would
tell what exactly is in the heap. This might provide a clue whether it is indeed an application-level
memory leak, or something that Shenandoah does not clean up (like weak references, old classes, etc).
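For example, with jcmd (paths are placeholders):

    jcmd <pid> GC.heap_dump /tmp/early.hprof     # while heap usage is still low
    # ...wait until usage has grown...
    jcmd <pid> GC.heap_dump /tmp/late.hprof      # when heap is close to full

Comparing histograms or dominator trees of the two dumps (e.g. in Eclipse MAT) would show what grew.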
> I'm using an OpenJDK 12.0.2 binary from AdoptOpenJdk in a container based on alpine linux.
> No special command line flags, I'm only setting the number of concurrent gc threads to 2.
> Any pointers on what the issue might be and how to debug it would be greatly appreciated!
Can you post a link to the complete GC logs somewhere, as this unfolds? Unfortunately, a single
point in the GC log does not tell us much.
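Something like this would capture the whole history with unified logging (file name and rotation settings are just examples):

    -Xlog:gc*:file=gc.log:time,uptime,level,tags:filecount=5,filesize=50m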
When you reach 2500MB, does invoking explicit GC (i.e. "jcmd <pid> GC.run") help to unclutter the heap?
I have a hunch that special cases like Strings from StringTable, classes, etc. are not reclaimed
with the regular concurrent cycles, unless you do -XX:+ClassUnloadingWithConcurrentMark.
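If I remember the diagnostic command set right, string table statistics can also be watched directly while this unfolds:

    jcmd <pid> VM.stringtable     # prints string table size/usage statistics

If the entry count keeps growing between regular cycles and drops after an explicit GC, that would confirm the hunch.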
--
Thanks,
-Aleksey