Tracking Memory Consumption Metrics With JMH What&How
Kirk Pepperdine
kirk at kodewerk.com
Wed Apr 5 11:56:24 UTC 2017
Hi Jens,
I will humbly disagree with your dropping run times to 20 seconds and restarting JVMs often. HotSpot warmup can take several minutes, even for the simplest of benchmarks.
Kind regards,
Kirk
> After some dead ends, I went in the other direction. A bit of further
> explanation:
>
> To get a practical memory consumption metric we need to look at the committed
> memory or what the OS is telling us.
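>
> As a minimal sketch (my own illustration, not code from the article), the
> committed sizes as seen by the JVM can be read via the standard MemoryMXBean;
> OS-level numbers such as RSS would need to come from the OS itself, e.g.
> /proc/self/status on Linux, which is not shown here:
>
>   import java.lang.management.ManagementFactory;
>   import java.lang.management.MemoryMXBean;
>
>   public class CommittedMemory {
>       public static void main(String[] args) {
>           MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
>           // committed = memory the JVM has actually claimed, not just the used heap
>           long heapCommitted = mem.getHeapMemoryUsage().getCommitted();
>           long nonHeapCommitted = mem.getNonHeapMemoryUsage().getCommitted();
>           System.out.printf("heap committed: %d MB, non-heap committed: %d MB%n",
>               heapCommitted >> 20, nonHeapCommitted >> 20);
>       }
>   }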
>
> In contrast to whole applications or complex scenarios, which may have
> different phases, a micro benchmark repeats exactly the same operations,
> thus modeling a single application phase. For example, the benchmark used
> here with the Zipfian access pattern yields exactly the same cache hit rate
> for a given implementation. Citing from Dr. Cliff's JavaOne presentation:
> "Steady-state throughput after warmup is key." With a well-designed
> benchmark the runtime is less critical.
>
> Given that the total benchmarking time is limited, my experiments showed that
> lowering the benchmark runtime (iteration time in JMH lingo) improved
> accuracy. Instead of doing long benchmark runs, it is better to restart the
> JVM more often. The reason is that you may hit the tipping point of a heap
> expansion by the GC at random and end up with a different peak memory
> consumption for a JVM lifetime. To keep outliers in check, five JVM restarts
> gave good results. At the moment I ended up with an iteration time of
> 20 seconds. The confidence interval shows that this setup works quite
> accurately. G1 is smoother in this regard.
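>
> For illustration, a JMH setup roughly matching this description could look as
> follows; the benchmark class and the iteration counts are made up, while the
> fork count and iteration time reflect the numbers above:
>
>   import java.util.concurrent.TimeUnit;
>   import org.openjdk.jmh.annotations.*;
>
>   @State(Scope.Benchmark)
>   @BenchmarkMode(Mode.Throughput)
>   @Fork(5)   // five JVM restarts to keep outliers in check
>   @Warmup(iterations = 2, time = 20, timeUnit = TimeUnit.SECONDS)
>   @Measurement(iterations = 3, time = 20, timeUnit = TimeUnit.SECONDS)
>   public class CacheReadBenchmark {
>       @Benchmark
>       public Object read() {
>           // cache access with a Zipfian key distribution would go here
>           return Long.valueOf(System.nanoTime());
>       }
>   }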
>
> Actually, that is an interesting result, which I didn't put prominently in the
> conclusions. Updated.
>
> Side note: It proved to be a good decision to consistently add the confidence
> interval to the graph instead of just looking at the numbers occasionally.
> Ugly-looking graphs force immediate action to improve the accuracy.
>
>> We also
>> look at about a half a dozen additional metrics but you can’t get to them
>> directly from SA counters. The one metric that is missing and can have an
>> impact on GC performance is the rate of mutations.
>
> Good point. JMH is able to record relevant garbage collector statistics via
> the HotspotMemoryMBean. The data is there. Digging this deep is not my major
> objective right now; see below. If someone would like to team up and use my
> benchmark scenarios to analyze GC effects, that's very welcome. I can share
> the JSON data with the GC statistics of my benchmark runs.
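>
> As an illustration (not my exact runner code), a JMH run can attach the
> standard GC profiler, a related option to the Hotspot memory profiler
> mentioned above, and write the results to JSON; the benchmark include
> pattern below is made up:
>
>   import org.openjdk.jmh.profile.GCProfiler;
>   import org.openjdk.jmh.results.format.ResultFormatType;
>   import org.openjdk.jmh.runner.Runner;
>   import org.openjdk.jmh.runner.options.Options;
>   import org.openjdk.jmh.runner.options.OptionsBuilder;
>
>   public class RunWithGcStats {
>       public static void main(String[] args) throws Exception {
>           Options opt = new OptionsBuilder()
>               .include("CacheReadBenchmark")   // hypothetical benchmark name
>               .addProfiler(GCProfiler.class)   // allocation rate, GC counts and times
>               .resultFormat(ResultFormatType.JSON)
>               .result("benchmark-result.json")
>               .build();
>           new Runner(opt).run();
>       }
>   }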
>
>> You should be able to
>> get a handle on this value by looking at the refinement queue in G1 or card
>> table updates with the generational collectors. This also will not be
>> accurate but it is a measure. Unfortunately you will have to instrument the
>> JVM to get these values but for benching it’s an acceptable option. Another
>> cheap way to estimate allocation is to count the number of collections and
>> multiply that by the size of Eden. Yes, it doesn't take waste or allocations
>> directly in tenured into consideration, but the waste should be 2% or less,
>> and in a bench you should be able to control or at least understand the
>> number of allocations in tenured. You can contact me offline if you'd like
>> a copy of our GC log analysis tooling.
>
> I already presented the allocation rate. The article was missing the reference
> on how it is extracted. Updated. The metrics memUsed and memTotal_max are
> actually calculated from GC event notifications, which carry the same
> information you get in the GC logs.
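>
> A minimal sketch of that mechanism (not the actual extraction code) looks
> like this; it tracks the peak used and committed sizes after each collection
> via the GC notification API:
>
>   import java.lang.management.GarbageCollectorMXBean;
>   import java.lang.management.ManagementFactory;
>   import java.lang.management.MemoryUsage;
>   import javax.management.NotificationEmitter;
>   import javax.management.openmbean.CompositeData;
>   import com.sun.management.GarbageCollectionNotificationInfo;
>
>   public class GcNotificationProbe {
>
>       static volatile long memUsedMax, memTotalMax;
>
>       static void install() {
>           for (GarbageCollectorMXBean gc :
>                   ManagementFactory.getGarbageCollectorMXBeans()) {
>               ((NotificationEmitter) gc).addNotificationListener((n, handback) -> {
>                   if (!GarbageCollectionNotificationInfo
>                           .GARBAGE_COLLECTION_NOTIFICATION.equals(n.getType())) {
>                       return;
>                   }
>                   GarbageCollectionNotificationInfo info =
>                       GarbageCollectionNotificationInfo.from(
>                           (CompositeData) n.getUserData());
>                   long used = 0, committed = 0;
>                   // sum over all memory pools right after this collection
>                   for (MemoryUsage u : info.getGcInfo().getMemoryUsageAfterGc().values()) {
>                       used += u.getUsed();
>                       committed += u.getCommitted();
>                   }
>                   memUsedMax = Math.max(memUsedMax, used);
>                   memTotalMax = Math.max(memTotalMax, committed);
>               }, null, null);
>           }
>       }
>   }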
>
> In general, there are two different directions in (micro-)benchmarking:
>
> Evaluation: Get a few scalar values about throughput, efficiency and resource
> usage in a reproducible way. For example, for a compression codec you will
> look at data throughput at maximum CPU utilization, compression efficiency and
> memory consumption.
>
> Tuning: Get insights and find possible tuning options. You will collect lots of
> performance counters, do profiling, etc.
>
> Tuning the GC, or the "GC friendliness" of code, is one tuning opportunity of
> many. When looking for tuning opportunities I typically switch on profiling,
> record CPU performance counters as well, and try to look at the whole picture.
> This means that when I am on the "tuning track", the performance data is not
> representative, since the profiling introduces bias. Typically I also go with
> fewer iterations, since I don't need the accuracy.
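>
> For illustration, such a tuning run could use JMH's built-in profilers from
> the command line; the fork and iteration counts here are just examples, and
> perfnorm requires Linux perf to be installed:
>
>   java -jar target/benchmarks.jar CacheReadBenchmark \
>       -f 1 -wi 1 -i 2 -prof gc -prof perfnorm -prof stack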
>
> My focus in the article, and at the moment, is on the evaluation part. In the
> end it boils down to how many bucks we need to spend on power, CPUs and
> memory. In that sense, the allocation rate is already an "irrelevant" metric
> for evaluation, but important for deeper analysis.
>
> Also, when it comes to GC-related tuning and analysis, I don't have much
> motivation yet. cache2k, which I am the author of, has the lowest allocation
> rate of all ;)
>
> However, your feedback is highly appreciated. I want to look deeper into GC
> effects in the future.
>
> Cheers,
>
> Jens
>
> --
> "Everything superfluous is wrong!"
>
> // Jens Wilke - headissue GmbH - Germany
> \// https://headissue.com