JFR causes spike in HeapMemoryAfterGC in prod systems

Fri Mar 1 13:49:49 UTC 2024

Thanks Marcus and Erik.

HeapMemoryAfterGC - Our service framework at Amazon emits a JMX metric for
HeapMemoryAfterGC, i.e. the percent occupancy of the part of the Java heap
*that excludes Eden* by both live and garbage objects *after the last
collection that affected that part of the Java heap*. It is *not* an
instantaneously measured value: it changes *only after* a collection.
HeapMemoryAfterGCUse stays fairly constant.
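
For readers who do not know this metric, a similar value can be approximated
with the standard java.lang.management API. This is only a sketch and not the
framework's actual implementation, which is not shown in this thread. It
assumes that Eden pools can be recognised by "Eden" in the pool name, and that
the denominator is the sum of the defined maximum sizes of the remaining heap
pools. The class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class HeapAfterGcSketch {

    /** Percent occupancy, or -1 when the maximum is undefined. */
    static double percent(long usedBytes, long maxBytes) {
        return maxBytes > 0 ? 100.0 * usedBytes / maxBytes : -1.0;
    }

    /** Assumption: a heap pool is "Eden" if its name contains "Eden". */
    static boolean isNonEdenHeapPool(MemoryType type, String poolName) {
        return type == MemoryType.HEAP && !poolName.contains("Eden");
    }

    /** Sums the usage recorded after the most recent collection of each non-Eden heap pool. */
    static double heapAfterGcPercent() {
        long used = 0;
        long max = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (!isNonEdenHeapPool(pool.getType(), pool.getName())) {
                continue;
            }
            // getCollectionUsage() returns null when the pool does not support it.
            MemoryUsage afterGc = pool.getCollectionUsage();
            if (afterGc == null) {
                continue;
            }
            used += afterGc.getUsed();
            // getMax() is -1 when undefined; such pools add nothing to the denominator.
            if (afterGc.getMax() > 0) {
                max += afterGc.getMax();
            }
        }
        return percent(used, max);
    }

    public static void main(String[] args) {
        System.out.printf("Heap after GC (excluding Eden): %.1f%%%n", heapAfterGcPercent());
    }
}
```

Because getCollectionUsage() only changes when a pool is collected, the value
behaves as described above: it moves only after a collection.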

The heap Xmx and Xms values are both set to 24G on the server. The rise is
from 10% to 30%. When I remove the JFR VM params, HeapMemoryAfterGCUse
falls back to 5-10%. So it is enabling JFR that causes the issue.

Do you think the additional JVM parameters could be causing the issue, even
though I have capped the memory at 60MB?
-XX:StartFlightRecording=name=SomeServiceJFR,disk=false,path-to-gc-roots=false
-XX:FlightRecorderOptions=stackdepth=256,memorysize=60M,numglobalbuffers=2,globalbuffersize=30M,old-object-queue-size=64
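
As a sanity check on the numbers, the small sketch below parses the option
string above and compares the configured sizes with the 24G heap. It only does
arithmetic on the values quoted in this mail. It does not query the JVM, and
it makes no claim about which memory area the memorysize cap applies to. The
class and helper names are hypothetical, and binary units (1M = 2^20 bytes)
are assumed.

```java
import java.util.HashMap;
import java.util.Map;

public class JfrOptionsSketch {

    /** Splits a "key=value,key=value" option string into a map. */
    static Map<String, String> parse(String options) {
        Map<String, String> map = new HashMap<>();
        for (String pair : options.split(",")) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2) {
                map.put(kv[0].trim(), kv[1].trim());
            }
        }
        return map;
    }

    /** Converts sizes such as "60M" or "24G" to bytes (binary units assumed). */
    static long toBytes(String size) {
        char unit = Character.toUpperCase(size.charAt(size.length() - 1));
        if (Character.isDigit(unit)) {
            return Long.parseLong(size);
        }
        long n = Long.parseLong(size.substring(0, size.length() - 1));
        switch (unit) {
            case 'K': return n << 10;
            case 'M': return n << 20;
            case 'G': return n << 30;
            default: throw new IllegalArgumentException("Unknown unit: " + size);
        }
    }

    public static void main(String[] args) {
        Map<String, String> o = parse(
            "stackdepth=256,memorysize=60M,numglobalbuffers=2,"
                + "globalbuffersize=30M,old-object-queue-size=64");
        long memory = toBytes(o.get("memorysize"));
        long buffers = Long.parseLong(o.get("numglobalbuffers"))
            * toBytes(o.get("globalbuffersize"));
        long heap = toBytes("24G"); // Xmx/Xms value stated in this mail
        System.out.println("buffers match memorysize: " + (buffers == memory));
        System.out.printf("memorysize as share of heap: %.2f%%%n", 100.0 * memory / heap);
    }
}
```

The two global buffers of 30M add up to exactly the 60M memorysize, and 60M is
roughly 0.24% of a 24G heap, far smaller than the rise seen in the metric.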

On Sat, Feb 24, 2024 at 6:02 PM Marcus Hirt <marcus at hirt.se> wrote:

> Hi Gaurav,
>
> I’ve invited you to the JMC slack.
>
>
>
> Kind regards,
>
> Marcus
>
>
>
> *From:* jmc-dev <jmc-dev-retn at openjdk.org> *On Behalf Of *Erik Gahlin
> *Sent:* Saturday, 24 February 2024 11:09
> *To:* Gaurav Gupta <geniusgaurav27 at gmail.com>; hotspot-jfr-dev at openjdk.org;
> jmc-dev at openjdk.org
> *Subject:* Re: JFR causes spike in HeapMemoryAfterGC in prod systems
>
>
>
> Hi Gaurav,
>
>
>
> >  it is claimed to be safe in prod systems with continuous JFR profiling.
>
>
>
> The overhead should be less than 1% by default (-XX:StartFlightRecording,
> no other options) and it should not crash, cause memory leaks etc. if you
> run it for a long time.
>
>
>
> I don't know where the HeapMemoryAfterGC metric comes from. Is it
> something shown in JMC? We don't have targets for memory usage. That said,
> JFR should probably not use more than 50 MB of the Java heap, so if 30% is
> normal or not depends on how large heap you are using.
>
>
>
> There is a slack channel for JMC developers. I don't manage it and I'm not
> sure if this is really a bug.
>
> https://wiki.openjdk.org/display/jmc/Contributing
>
>
>
> Best regards,
>
> Erik
>
>
> ------------------------------
>
> *From:* hotspot-jfr-dev <hotspot-jfr-dev-retn at openjdk.org> on behalf of
> Gaurav Gupta <geniusgaurav27 at gmail.com>
> *Sent:* Friday, February 23, 2024 6:26 AM
> *To:* hotspot-jfr-dev at openjdk.org <hotspot-jfr-dev at openjdk.org>;
> jmc-dev at openjdk.org <jmc-dev at openjdk.org>
> *Subject:* JFR causes spike in HeapMemoryAfterGC in prod systems
>
>
>
> Hi Team,
>
>
>
> I am Gaurav Gupta, Principal Engineer at Amazon India (gagup at amazon.com).
> My team is trying to enable JFR on production systems for studying any
> issues using custom events in JMC. Also we plan to study the impact of code
> changes in our application on system performance by studying JFR dumps
> using JMC and Flamescope (sub-second profiling).
>
>
>
> We tried enabling JFR in our prod system by adding the following JVM args to
> our application startup (we have not yet added any custom events in our code):
>
> -XX:+FlightRecorder
> -XX:StartFlightRecording=name=SomeServiceJFR,disk=false,path-to-gc-roots=false
> -XX:FlightRecorderOptions=stackdepth=256,memorysize=60M,numglobalbuffers=2,globalbuffersize=30M,old-object-queue-size=64
>
> (The reason we turned off the disk parameter is to avoid disk space issues
> during spiky traffic. We plan to keep the JFR recording on all the time and
> take JFR dumps on demand by running the JFR.dump command to study
> application behavior.)
>
>
>
> We saw the system performance metrics on HeapMemoryAfterGC rising from 5%
> to 30% (on our test fleet) after starting our application with above JVM
> args (i.e. with JFR enabled). As per our reading about JFR, it is claimed
> to be safe in prod systems with continuous JFR profiling. But the rise of
> the HeapMemory use raised concern.
>
> I seek your advice in this regard, if we are missing anything (wrt
> parameter configuration above) or are there other options to try without
> affecting the prod system.
>
>
>
> Also, is there a slack channel that I can join for live discussion in this
> regard? How can I get added?
>
>
>
> PS: I won't be able to share the JFR dump file outside Amazon as it
> records events and stack traces for our proprietary software. But I am
> happy to come to a meeting on slack or other means and discuss this. Slack
> email: gagup at amazon.com
>
>
> --
>
> Best regards,
> *Gaurav Gupta*
>
>
> -------------------------------------------------------------------------------------------------------------------------------
> *"Perfection is achieved not when there is nothing more to add,but rather
> when there is nothing more to take away."*
>

-- 
Best regards,
*Gaurav Gupta*
