Tracking Memory Consumption Metrics With JMH: What & How
Jens Wilke
jw_list at headissue.com
Mon Apr 3 10:00:56 UTC 2017
On Wednesday, 29 March 2017 13:42:06 ICT, Kirk Pepperdine wrote:
> Hi Jens,
Kirk,
thanks for the feedback!
> Interesting….. I find a much less intrusive way of looking at memory is to
> simply use the data in the GC logs. Yes, they also suffer from inaccuracies
> but over a long enough benchmark the errors settle out in the dust.
After some dead ends, I went in the other direction. A bit more
explanation:
To get a practical memory consumption metric, we need to look at the committed
memory, or at what the OS is telling us.
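To illustrate (a minimal sketch, not code from the benchmark; the class name
is made up), the committed heap can be read via the standard MemoryMXBean,
while the OS view, e.g. VmRSS on Linux, additionally covers metaspace, thread
stacks and native allocations:

  import java.lang.management.ManagementFactory;
  import java.lang.management.MemoryUsage;

  public class CommittedMemoryProbe {
      public static void main(String[] args) {
          // Committed = memory the JVM has actually claimed from the OS
          // for the heap, as opposed to the currently used portion.
          MemoryUsage heap =
              ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
          System.out.println("heap used:      " + heap.getUsed());
          System.out.println("heap committed: " + heap.getCommitted());
      }
  }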
In contrast to whole applications or complex scenarios, which may have
different phases, the micro benchmark repeatedly performs exactly the
same operations, thus modeling a single application phase. For example,
the benchmark used with the Zipfian access pattern yields exactly the same
cache hit rate for one implementation. Citing from Dr. Cliff's JavaOne
presentation: "Steady-state throughput after warmup is key." With a well-designed
benchmark the runtime is less critical.
Given that absolute time is limited, my experiments showed that
lowering the benchmark runtime (iteration time in JMH lingo) improved
accuracy. Instead of doing long benchmark runs, it is better to restart the
JVM more often. The reason is that you may hit the tipping point of
a GC expansion at random and end up with a different peak memory consumption
for that JVM lifetime. To keep outliers in check, five JVM restarts gave good
results. At the moment I have ended up with an iteration time of 20 seconds.
The confidence interval shows that this setup works quite accurately. G1 is
smoother in this regard.
Actually, that is an interesting result, which I didn't put prominently in the
conclusions. Updated.
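In JMH terms, that setup looks roughly like this (a sketch; the benchmark
name, warmup settings and benchmark body are assumptions, five forks and
20 second iterations are what I described above):

  import java.util.concurrent.TimeUnit;
  import org.openjdk.jmh.annotations.*;

  @Fork(5) // five JVM restarts to keep GC expansion outliers in check
  @Warmup(iterations = 1, time = 20, timeUnit = TimeUnit.SECONDS)      // assumption
  @Measurement(iterations = 1, time = 20, timeUnit = TimeUnit.SECONDS) // 20s iteration time
  @BenchmarkMode(Mode.Throughput)
  @State(Scope.Benchmark)
  public class ZipfianReadBenchmark { // hypothetical name
      @Benchmark
      public Object read() {
          // cache read with the Zipfian access pattern elided
          return Boolean.TRUE;
      }
  }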
Side note: it proved a good decision to consistently add the confidence
interval to the graphs, instead of just looking at the numbers occasionally.
Ugly-looking graphs force immediate action to improve accuracy.
> We also
> look at about a half a dozen additional metrics but you can’t get to them
> directly from SA counters. The one metric that is missing and can have an
> impact on GC performance is the rate of mutations.
Good point. JMH is able to record relevant garbage collector statistics via
the HotspotMemoryMBean. The data is there. Digging this deep is not my major
objective right now; see below. If someone would like to team up and use my
benchmark scenarios to analyze GC effects, that is very welcome. I can share
the JSON data with the GC statistics of my benchmark runs.
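For example (a sketch, assuming JMH's bundled profilers; the include pattern
and output file name are hypothetical), a run that records GC statistics and
writes JSON:

  import org.openjdk.jmh.profile.GCProfiler;
  import org.openjdk.jmh.results.format.ResultFormatType;
  import org.openjdk.jmh.runner.Runner;
  import org.openjdk.jmh.runner.options.Options;
  import org.openjdk.jmh.runner.options.OptionsBuilder;

  public class RunWithGcStats {
      public static void main(String[] args) throws Exception {
          Options opt = new OptionsBuilder()
              .include("ZipfianReadBenchmark")     // hypothetical benchmark name
              .addProfiler(GCProfiler.class)       // allocation rate and GC churn
              .addProfiler("hs_gc")                // HotSpot GC counters via HotspotMemoryMBean
              .resultFormat(ResultFormatType.JSON)
              .result("gc-stats.json")
              .build();
          new Runner(opt).run();
      }
  }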
> You should be able to
> get a handle on this value by looking at the refinement queue in G1 or card
> table updates with the generational collectors. This also will not be
> accurate but it is a measure. Unfortunately you will have to instrument the
> JVM to get these values but for benching it’s an acceptable option. Another
> cheap way to estimate allocation is to count the number of collections and
> multiply that by the size of Eden. Yes, it doesn’t take into consideration
> of waste or allocations directly in tenured but the waste should be 2% or
> less and in a bench you should be able to control or at least understand
> the number of allocations in tenured. You can contact me offline if you’d
> like a copy of our GC log analysis tooling.
I already presented the allocation rate; the article was missing the reference
on how it is extracted. Updated. The metrics memUsed and memTotal_max are
actually calculated from GC event notifications, which is the same information
you get in the GC logs.
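For reference, a minimal sketch (not the actual extraction code) of listening
to GC event notifications and reading used/committed memory after each
collection:

  import com.sun.management.GarbageCollectionNotificationInfo;
  import java.lang.management.GarbageCollectorMXBean;
  import java.lang.management.ManagementFactory;
  import java.lang.management.MemoryUsage;
  import javax.management.NotificationEmitter;
  import javax.management.openmbean.CompositeData;

  public class GcNotificationProbe {
      public static void install() {
          for (GarbageCollectorMXBean gcBean :
                  ManagementFactory.getGarbageCollectorMXBeans()) {
              // On HotSpot the GC beans emit JMX notifications per collection.
              ((NotificationEmitter) gcBean).addNotificationListener(
                  (notification, handback) -> {
                      if (!GarbageCollectionNotificationInfo
                              .GARBAGE_COLLECTION_NOTIFICATION
                              .equals(notification.getType())) {
                          return;
                      }
                      GarbageCollectionNotificationInfo info =
                          GarbageCollectionNotificationInfo.from(
                              (CompositeData) notification.getUserData());
                      long used = 0, committed = 0;
                      for (MemoryUsage u :
                              info.getGcInfo().getMemoryUsageAfterGc().values()) {
                          used += u.getUsed();
                          committed += u.getCommitted();
                      }
                      System.out.println(
                          "after GC: used=" + used + " committed=" + committed);
                  }, null, null);
          }
      }
  }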
In general, there are two different directions with (micro-)benchmarking:
Evaluation: get a few scalar values about throughput, efficiency and resource
usage in a reproducible way. For example, for a compression codec you will
look at data throughput at maximum CPU utilization, compression efficiency and
memory consumption.
Tuning: get insights and find possible tuning options. You will collect lots of
performance counters, do profiling, etc.
Tuning the GC, or rather the "GC friendliness" of the code, is one of many
tuning opportunities. When looking for tuning opportunities I typically switch
on profiling, record CPU performance counters as well, and try to look at the
whole picture. This means that when I am on the "tuning track", the performance
data is not representative, since the profiling introduces bias. Typically I
also go with fewer iterations, since I don't need the accuracy.
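A sketch of such a "tuning track" run, assuming JMH's perfnorm and stack
profilers are available on the machine (benchmark name is hypothetical):

  import org.openjdk.jmh.runner.Runner;
  import org.openjdk.jmh.runner.options.Options;
  import org.openjdk.jmh.runner.options.OptionsBuilder;

  public class RunTuningTrack {
      public static void main(String[] args) throws Exception {
          Options opt = new OptionsBuilder()
              .include("ZipfianReadBenchmark") // hypothetical benchmark name
              .forks(1)                        // fewer runs, accuracy matters less here
              .measurementIterations(2)
              .addProfiler("perfnorm")         // normalized CPU performance counters (Linux perf)
              .addProfiler("stack")            // simple sampling stack profiler
              .build();
          new Runner(opt).run();
      }
  }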
My focus in the article, and at the moment, is on the evaluation part. In the
end it boils down to how many bucks we need to spend on power, CPUs and
memory. That said, the allocation rate is already an "irrelevant" metric for
evaluation, but important for deeper analysis.
Also, when it comes to GC-related tuning and analysis, I don't have that much
motivation yet. cache2k, which I am the author of, has the lowest allocation
rate of all ;)
However, your feedback is highly appreciated. I want to look deeper into GC
effects in the future.
Cheers,
Jens
--
"Everything superfluous is wrong!"
// Jens Wilke - headissue GmbH - Germany
\// https://headissue.com