[External] : Re: JFR causes spike in HeapMemoryAfterGC in prod systems

Fri Mar 1 15:10:36 UTC 2024

The options you have modified change native memory, not the Java heap. That said, I would run JFR with the default configuration unless you experience issues.

5% of 24G is 1.2 GB. That doesn't sound reasonable, but I can't rule out that JFR impacts GC heuristics somehow. You could take a Java heap dump and see if there are classes held by JFR. You could also run with NMT (Native Memory Tracking) to get a more detailed view of what is happening from a memory perspective.
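
As an illustration of the heap dump suggestion, here is a minimal sketch that triggers a dump from inside the application through HotSpotDiagnosticMXBean. The class name HeapDumpSketch and the default file name are hypothetical; the same dump can be taken externally with `jcmd <pid> GC.heap_dump <file>`. For NMT, the JVM is started with -XX:NativeMemoryTracking=summary and queried with `jcmd <pid> VM.native_memory summary`.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of JFR or JMC.
class HeapDumpSketch {

    /** Writes a dump of reachable objects to 'target' (name must end in .hprof). */
    static Path dumpLiveHeap(Path target) {
        HotSpotDiagnosticMXBean diag =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            // dumpHeap refuses to overwrite an existing file.
            Files.deleteIfExists(target);
            // live=true dumps only reachable objects, which forces a full GC first.
            diag.dumpHeap(target.toString(), true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }

    public static void main(String[] args) {
        Path out = dumpLiveHeap(Paths.get(args.length > 0 ? args[0] : "jfr-check.hprof"));
        System.out.println("Heap dump written to " + out.toAbsolutePath());
    }
}
```

The resulting .hprof file can be opened in a heap analyzer and searched for instances of classes in the jdk.jfr packages. Note that a live dump pauses the application, so it is best tried on a test host first.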

What JDK version are you using?

Erik
________________________________
From: Gaurav Gupta <geniusgaurav27 at gmail.com>
Sent: Friday, March 1, 2024 2:49 PM
To: Marcus Hirt <marcus at hirt.se>
Cc: Erik Gahlin <erik.gahlin at oracle.com>; hotspot-jfr-dev at openjdk.org <hotspot-jfr-dev at openjdk.org>; jmc-dev at openjdk.org <jmc-dev at openjdk.org>
Subject: [External] : Re: JFR causes spike in HeapMemoryAfterGC in prod systems

Thanks Marcus and Erik.

HeapMemoryAfterGC - Our service framework at Amazon emits a JMX metric called HeapMemoryAfterGC, i.e. the percent occupancy, by both live and garbage objects, of the part of the Java heap that excludes Eden, measured after the last collection that affected that part of the heap. It is not an instantaneously measured value: it changes only after a collection. HeapMemoryAfterGCUse stays fairly constant.
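
A metric of this shape could be derived from the standard java.lang.management API roughly as follows. This is a sketch under assumptions, not the framework's actual implementation: the class name, the "Eden" name filter, and the use of Runtime.maxMemory() as the denominator are all illustrative choices.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical approximation of a "heap after GC" metric.
class HeapAfterGcSketch {

    /** Bytes occupied in non-Eden heap pools as of each pool's most recent collection. */
    static long usedAfterLastGcBytes() {
        long used = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip metaspace, code cache, etc.
            }
            if (pool.getName().contains("Eden")) {
                continue; // heuristic: pool names differ between collectors
            }
            MemoryUsage afterGc = pool.getCollectionUsage(); // null if unsupported
            if (afterGc != null) {
                used += afterGc.getUsed();
            }
        }
        return used;
    }

    /** Expresses 'usedBytes' as a percentage of 'maxBytes'. */
    static double percentOf(long usedBytes, long maxBytes) {
        return 100.0 * usedBytes / maxBytes;
    }

    public static void main(String[] args) {
        System.gc(); // request a collection so the pools have a post-GC snapshot
        long used = usedAfterLastGcBytes();
        long max = Runtime.getRuntime().maxMemory();
        System.out.printf("Heap after GC: %d bytes (%.1f%% of max heap)%n",
                used, percentOf(used, max));
    }
}
```

Because getCollectionUsage() only changes when a pool is collected, a value computed this way stays flat between collections, which matches the behavior described above.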

Heap memory Xmx and Xms values are both set to 24G on the server. The rise is from 10% to 30%. When I disable the JFR VM params, HeapMemoryAfterGCUse falls back to 5-10%. So it is enabling JFR that causes the issue.

Do you think the additional JVM parameters could cause the issue, even though I have capped the memory at 60MB?
-XX:StartFlightRecording=name=SomeServiceJFR,disk=false,path-to-gc-roots=false -XX:FlightRecorderOptions=stackdepth=256,memorysize=60M,numglobalbuffers=2,globalbuffersize=30M,old-object-queue-size=64

On Sat, Feb 24, 2024 at 6:02 PM Marcus Hirt <marcus at hirt.se> wrote:

Hi Gaurav,

I’ve invited you to the JMC slack.

Kind regards,

Marcus

From: jmc-dev <jmc-dev-retn at openjdk.org> On Behalf Of Erik Gahlin
Sent: Saturday, 24 February 2024 11:09
To: Gaurav Gupta <geniusgaurav27 at gmail.com>; hotspot-jfr-dev at openjdk.org; jmc-dev at openjdk.org
Subject: Re: JFR causes spike in HeapMemoryAfterGC in prod systems

Hi Gaurav,

>  it is claimed to be safe in prod systems with continuous JFR profiling.

The overhead should be less than 1% by default (-XX:StartFlightRecording, no other options) and it should not crash, cause memory leaks etc. if you run it for a long time.

I don't know where the HeapMemoryAfterGC metric comes from. Is it something shown in JMC? We don't have targets for memory usage. That said, JFR should probably not use more than 50 MB of the Java heap, so whether 30% is normal depends on how large a heap you are using.

There is a slack channel for JMC developers. I don't manage it and I'm not sure if this is really a bug.

https://wiki.openjdk.org/display/jmc/Contributing

Best regards,

Erik

________________________________

From: hotspot-jfr-dev <hotspot-jfr-dev-retn at openjdk.org> on behalf of Gaurav Gupta <geniusgaurav27 at gmail.com>
Sent: Friday, February 23, 2024 6:26 AM
To: hotspot-jfr-dev at openjdk.org <hotspot-jfr-dev at openjdk.org>; jmc-dev at openjdk.org <jmc-dev at openjdk.org>
Subject: JFR causes spike in HeapMemoryAfterGC in prod systems

Hi Team,

I am Gaurav Gupta, Principal Engineer at Amazon India (gagup at amazon.com). My team is trying to enable JFR on production systems so that we can study issues using custom events in JMC. We also plan to study the impact of code changes in our application on system performance by analyzing JFR dumps with JMC and Flamescope (sub-second profiling).

We tried enabling JFR in our prod system by adding the following JVM args to our application startup (we have not yet added any custom events to our code):

-XX:+FlightRecorder -XX:StartFlightRecording=name=SomeServiceJFR,disk=false,path-to-gc-roots=false -XX:FlightRecorderOptions=stackdepth=256,memorysize=60M,numglobalbuffers=2,globalbuffersize=30M,old-object-queue-size=64

(We turned off the disk parameter to avoid disk space issues during spiky traffic. We plan to keep the JFR recording on all the time and take a JFR dump on demand, by running the JFR.dump command, to study application behavior.)

We saw the HeapMemoryAfterGC system performance metric rise from 5% to 30% (on our test fleet) after starting our application with the above JVM args (i.e. with JFR enabled). From what we have read, JFR is claimed to be safe for continuous profiling in prod systems, but the rise in heap memory use raised concern.

I seek your advice on whether we are missing anything in the parameter configuration above, or whether there are other options to try without affecting the prod system.

Also, is there a Slack channel that I can join for live discussion of this? How can I get added?

PS: I won't be able to share the JFR dump file outside Amazon as it records events and stack traces for our proprietary software. But I am happy to join a meeting on Slack or by other means to discuss this. Slack email: gagup at amazon.com

--

Best regards,
Gaurav Gupta

-------------------------------------------------------------------------------------------------------------------------------
"Perfection is achieved not when there is nothing more to add, but rather when there is nothing more to take away."
