Tracking Memory Consumption Metrics With JMH: What & How

Jens Wilke jw_list at headissue.com
Mon Apr 3 10:00:56 UTC 2017


On Wednesday, 29 March 2017 13:42:06 ICT Kirk Pepperdine wrote:
> Hi Jens,

Kirk,

thanks for the feedback!

> Interesting….. I find a much less intrusive way of looking at memory is to
> simply use the data in the GC logs. Yes, they also suffer from inaccuracies
> but over a long enough benchmark the errors settle out in the dust.

After some dead ends, I went in the other direction. Some further 
explanation:

To get a practical memory consumption metric, we need to look at the committed
memory, or at what the OS is telling us.
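
As a minimal sketch of reading the committed memory from inside the JVM 
(standard java.lang.management API, not the exact code used in the benchmark):

  import java.lang.management.ManagementFactory;
  import java.lang.management.MemoryUsage;

  public class CommittedMemory {
    public static void main(String[] args) {
      // Committed: memory the JVM has actually claimed from the OS for
      // the heap, as opposed to the (usually smaller) used portion.
      MemoryUsage heap =
        ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
      System.out.println("used:      " + heap.getUsed());
      System.out.println("committed: " + heap.getCommitted());
    }
  }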

In contrast to whole applications or complex scenarios, which may go through 
different phases, a microbenchmark repeatedly performs exactly the same 
operations and thus models a single application phase. For example, the 
benchmark used, with its Zipfian access pattern, yields exactly the same 
cache hit rate for a given implementation. Citing from Dr. Cliff's JavaOne 
presentation: "Steady-state throughput after warmup is key." With a well-
designed benchmark the total runtime is less critical.

Given that absolute time is limited, my experiments showed that lowering the
benchmark runtime (iteration time in JMH lingo) improved accuracy. Instead of 
doing long benchmark runs, it is better to restart the JVM more often. The 
reason is that you may hit the tipping point of a GC heap expansion at random 
and end up with a different peak memory consumption for that JVM lifetime. To 
keep outliers in check, five JVM restarts gave good results. At the moment I 
have ended up with an iteration time of 20 seconds. The confidence interval 
shows that this setup works quite accurately. G1 is smoother in this regard.
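
Expressed as JMH annotations, the setup described above would look roughly 
like this (the iteration counts and the benchmark body are placeholders):

  import java.util.concurrent.TimeUnit;
  import org.openjdk.jmh.annotations.*;

  // Five JVM forks keep a randomly hit GC heap expansion in one fork
  // from dominating the result; 20 second iterations instead of one
  // long run.
  @Fork(5)
  @Warmup(iterations = 2, time = 20, timeUnit = TimeUnit.SECONDS)
  @Measurement(iterations = 3, time = 20, timeUnit = TimeUnit.SECONDS)
  @State(Scope.Benchmark)
  public class ZipfianCacheBenchmark {

    @Benchmark
    public Object readEntry() {
      return null; // placeholder for the cache access
    }
  }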

Actually, that is an interesting result, which I didn't put prominently in the 
conclusions. Updated.

Side note: it proved a good decision to consistently add the confidence 
interval to the graphs, instead of just looking at the numbers occasionally. 
Ugly-looking graphs force immediate action to improve the accuracy.

> We also
> look at about a half a dozen additional metrics but you can’t get to them
> directly from SA counters. The one metric that is missing and can have an
> impact on GC performance is the rate of mutations. 

Good point. JMH is able to record relevant garbage collector statistics via 
the HotspotMemoryMBean. The data is there. Digging this deep is not my major 
objective right now; see below. If someone would like to team up and use my 
benchmark scenarios to analyze GC effects, that is very welcome. I can share 
the JSON data with the GC statistics of my benchmark runs.
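
For reference, the profilers can be attached via the runner API (GCProfiler 
corresponds to "-prof gc" on the command line; the include pattern is a 
placeholder):

  import org.openjdk.jmh.profile.GCProfiler;
  import org.openjdk.jmh.profile.HotspotMemoryProfiler;
  import org.openjdk.jmh.runner.Runner;
  import org.openjdk.jmh.runner.RunnerException;
  import org.openjdk.jmh.runner.options.OptionsBuilder;

  public class ProfiledRun {
    public static void main(String[] args) throws RunnerException {
      new Runner(new OptionsBuilder()
          .include("ZipfianCacheBenchmark")          // placeholder pattern
          .addProfiler(GCProfiler.class)             // allocation rate, GC time
          .addProfiler(HotspotMemoryProfiler.class)  // HotspotMemoryMBean counters
          .build()).run();
    }
  }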

> You should be able to
> get a handle on this value by looking at the refinement queue in G1 or card
> table updates with the generational collectors. This also will not be
> accurate but it is a measure. Unfortunately you will have to instrument the
> JVM to get these values but for benching it’s an acceptable option. Another
> cheap way to estimate allocation is to count the number of collections and
> multiply that by the size of Eden. Yes, it doesn’t take into consideration
> of waste or allocations directly in tenured but the waste should be 2% or
> less and in a bench you should be able to control or at least understand
> the number of allocations in tenured. You can contact me offline if you’d
> like a copy of our GC log analysis tooling.

I already presented the allocation rate; the article was missing the reference 
on how it is extracted. Updated. The metrics memUsed/memTotal_max are
actually calculated from GC event notifications, which carry the same 
information you get in the GC logs.
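
As a sketch of the mechanism (the standard com.sun.management notification 
API; not the benchmark's exact code):

  import java.lang.management.GarbageCollectorMXBean;
  import java.lang.management.ManagementFactory;
  import javax.management.NotificationEmitter;
  import javax.management.openmbean.CompositeData;
  import com.sun.management.GarbageCollectionNotificationInfo;

  public class GcEventListener {
    public static void install() {
      for (GarbageCollectorMXBean gc :
           ManagementFactory.getGarbageCollectorMXBeans()) {
        ((NotificationEmitter) gc).addNotificationListener((n, handback) -> {
          if (!GarbageCollectionNotificationInfo
                .GARBAGE_COLLECTION_NOTIFICATION.equals(n.getType())) {
            return;
          }
          GarbageCollectionNotificationInfo info =
            GarbageCollectionNotificationInfo.from(
              (CompositeData) n.getUserData());
          // Per-pool usage after the collection: the same numbers that
          // show up in the GC logs.
          info.getGcInfo().getMemoryUsageAfterGc().forEach((pool, u) ->
            System.out.println(pool + " used=" + u.getUsed()
              + " committed=" + u.getCommitted()));
        }, null, null);
      }
    }
  }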

In general, there are two different directions in (micro-)benchmarking:

Evaluation: get a few scalar values about throughput, efficiency and resource 
usage in a reproducible way. For example, for a compression codec you will 
look at data throughput at maximum CPU utilization, compression efficiency and 
memory consumption.

Tuning: get insights and find possible tuning options. You will collect lots of
performance counters, do profiling, etc.

Tuning the GC, or the "GC friendliness" of the code, is one tuning opportunity 
of many. When looking for tuning opportunities I typically switch on profiling, 
record CPU performance counters as well, and try to look at the whole picture.
This means that when I am on the "tuning track", the performance data is not 
representative, since the profiling biases it. Typically I also go with fewer 
iterations, since I don't need the accuracy.
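
On the tuning track that could mean, for example, swapping in JMH's stack 
sampling and perf-counter profilers (LinuxPerfNormProfiler assumes a Linux box 
with perf installed; pattern and iteration count are placeholders):

  import org.openjdk.jmh.profile.LinuxPerfNormProfiler;
  import org.openjdk.jmh.profile.StackProfiler;
  import org.openjdk.jmh.runner.Runner;
  import org.openjdk.jmh.runner.RunnerException;
  import org.openjdk.jmh.runner.options.OptionsBuilder;

  public class TuningRun {
    public static void main(String[] args) throws RunnerException {
      new Runner(new OptionsBuilder()
          .include("ZipfianCacheBenchmark")           // placeholder pattern
          .measurementIterations(2)                   // accuracy matters less here
          .addProfiler(StackProfiler.class)           // where the time goes
          .addProfiler(LinuxPerfNormProfiler.class)   // normalized CPU counters
          .build()).run();
    }
  }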

My focus in the article, and at the moment, is on the evaluation part. In the 
end it boils down to how many bucks we need to spend on power, CPUs and 
memory. That said, the allocation rate is already an "irrelevant" metric for 
evaluation, but important for deeper analysis.

Also, when it comes to GC-related tuning and analysis, I don't have much 
motivation yet. cache2k, which I am the author of, has the lowest allocation 
rate of all ;) 

However, your feedback is highly appreciated. I want to look deeper into GC 
effects in the future.

Cheers,

Jens

-- 
"Everything superfluous is wrong!"

   // Jens Wilke - headissue GmbH - Germany
 \//  https://headissue.com

