Tracking Memory Consumption Metrics With JMH What&How
Kirk Pepperdine
kirk at kodewerk.com
Wed Apr 5 11:56:24 UTC 2017
Hi Jens,
I will humbly disagree with your dropping run times to 20 seconds and restarting JVMs often. HotSpot warmup can take several minutes, even for the simplest of benchmarks.
Kind regards,
Kirk
> After some dead ends, I went in the other direction. A bit of further
> explanation:
>
> To get a practical memory consumption metric we need to look at the committed
> memory or what the OS is telling us.
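>
> As a minimal sketch (my own illustration, not code from the article), the
> committed sizes as seen by the JVM can be read via the standard MemoryMXBean;
> OS-level numbers such as RSS would need to come from the OS itself, e.g.
> /proc/self/status on Linux, which is not shown here:
>
>   import java.lang.management.ManagementFactory;
>   import java.lang.management.MemoryMXBean;
>
>   public class CommittedMemory {
>       public static void main(String[] args) {
>           MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
>           // committed = memory the JVM has actually claimed, not just the used heap
>           long heapCommitted = mem.getHeapMemoryUsage().getCommitted();
>           long nonHeapCommitted = mem.getNonHeapMemoryUsage().getCommitted();
>           System.out.printf("heap committed: %d MB, non-heap committed: %d MB%n",
>               heapCommitted >> 20, nonHeapCommitted >> 20);
>       }
>   }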
>
> In contrast to whole applications or complex scenarios, which may have
> different phases, a micro benchmark repeats exactly the same operations,
> thus modeling a single application phase. For example, the benchmark used
> here with the Zipfian access pattern yields exactly the same cache hit rate
> for a given implementation. Citing from Dr. Cliff's JavaOne presentation:
> "Steady-state throughput after warmup is key." With a well-designed
> benchmark the runtime is less critical.
>
> Given that the total benchmarking time is limited, my experiments showed that
> lowering the benchmark runtime (iteration time in JMH lingo) improved
> accuracy. Instead of doing long benchmark runs, it is better to restart the
> JVM more often. The reason is that you may hit the tipping point of a heap
> expansion by the GC at random and end up with a different peak memory
> consumption for a JVM lifetime. To keep outliers in check, five JVM restarts
> gave good results. At the moment I ended up with an iteration time of
> 20 seconds. The confidence interval shows that this setup works quite
> accurately. G1 is smoother in this regard.
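>
> For illustration, a JMH setup roughly matching this description could look as
> follows; the benchmark class and the iteration counts are made up, while the
> fork count and iteration time reflect the numbers above:
>
>   import java.util.concurrent.TimeUnit;
>   import org.openjdk.jmh.annotations.*;
>
>   @State(Scope.Benchmark)
>   @BenchmarkMode(Mode.Throughput)
>   @Fork(5)   // five JVM restarts to keep outliers in check
>   @Warmup(iterations = 2, time = 20, timeUnit = TimeUnit.SECONDS)
>   @Measurement(iterations = 3, time = 20, timeUnit = TimeUnit.SECONDS)
>   public class CacheReadBenchmark {
>       @Benchmark
>       public Object read() {
>           // cache access with a Zipfian key distribution would go here
>           return Long.valueOf(System.nanoTime());
>       }
>   }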
>
> Actually, that is an interesting result, which I didn't put prominently in the
> conclusions. Updated.
>
> Side note: It proved to be a good decision to consistently add the confidence
> interval to the graph instead of just looking at the numbers occasionally.
> Ugly-looking graphs force immediate action to improve the accuracy.
>
>> We also
>> look at about a half a dozen additional metrics but you can’t get to them
>> directly from SA counters. The one metric that is missing and can have an
>> impact on GC performance is the rate of mutations.
>
> Good point. JMH is able to record relevant garbage collector statistics via
> the HotspotMemoryMBean. The data is there. Digging this deep is not my major
> objective right now; see below. If someone would like to team up and use my
> benchmark scenarios to analyze GC effects, that's very welcome. I can share
> the JSON data with the GC statistics of my benchmark runs.
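>
> As an illustration (not my exact runner code), a JMH run can attach the
> standard GC profiler, a related option to the Hotspot memory profiler
> mentioned above, and write the results to JSON; the benchmark include
> pattern below is made up:
>
>   import org.openjdk.jmh.profile.GCProfiler;
>   import org.openjdk.jmh.results.format.ResultFormatType;
>   import org.openjdk.jmh.runner.Runner;
>   import org.openjdk.jmh.runner.options.Options;
>   import org.openjdk.jmh.runner.options.OptionsBuilder;
>
>   public class RunWithGcStats {
>       public static void main(String[] args) throws Exception {
>           Options opt = new OptionsBuilder()
>               .include("CacheReadBenchmark")   // hypothetical benchmark name
>               .addProfiler(GCProfiler.class)   // allocation rate, GC counts and times
>               .resultFormat(ResultFormatType.JSON)
>               .result("benchmark-result.json")
>               .build();
>           new Runner(opt).run();
>       }
>   }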
>
>> You should be able to
>> get a handle on this value by looking at the refinement queue in G1 or card
>> table updates with the generational collectors. This also will not be
>> accurate but it is a measure. Unfortunately you will have to instrument the
>> JVM to get these values but for benching it’s an acceptable option. Another
>> cheap way to estimate allocation is to count the number of collections and
>> multiply that by the size of Eden. Yes, it doesn't take waste or allocations
>> directly in tenured into consideration, but the waste should be 2% or less,
>> and in a bench you should be able to control or at least understand the
>> number of allocations in tenured. You can contact me offline if you'd like
>> a copy of our GC log analysis tooling.
>
> I already presented the allocation rate. The article was missing the reference
> on how it is extracted. Updated. The metrics memUsed and memTotal_max are
> actually calculated from GC event notifications, which carry the same
> information you get in the GC logs.
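>
> A minimal sketch of that mechanism (not the actual extraction code) looks
> like this; it tracks the peak used and committed sizes after each collection
> via the GC notification API:
>
>   import java.lang.management.GarbageCollectorMXBean;
>   import java.lang.management.ManagementFactory;
>   import java.lang.management.MemoryUsage;
>   import javax.management.NotificationEmitter;
>   import javax.management.openmbean.CompositeData;
>   import com.sun.management.GarbageCollectionNotificationInfo;
>
>   public class GcNotificationProbe {
>
>       static volatile long memUsedMax, memTotalMax;
>
>       static void install() {
>           for (GarbageCollectorMXBean gc :
>                   ManagementFactory.getGarbageCollectorMXBeans()) {
>               ((NotificationEmitter) gc).addNotificationListener((n, handback) -> {
>                   if (!GarbageCollectionNotificationInfo
>                           .GARBAGE_COLLECTION_NOTIFICATION.equals(n.getType())) {
>                       return;
>                   }
>                   GarbageCollectionNotificationInfo info =
>                       GarbageCollectionNotificationInfo.from(
>                           (CompositeData) n.getUserData());
>                   long used = 0, committed = 0;
>                   // sum over all memory pools right after this collection
>                   for (MemoryUsage u : info.getGcInfo().getMemoryUsageAfterGc().values()) {
>                       used += u.getUsed();
>                       committed += u.getCommitted();
>                   }
>                   memUsedMax = Math.max(memUsedMax, used);
>                   memTotalMax = Math.max(memTotalMax, committed);
>               }, null, null);
>           }
>       }
>   }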
>
> In general, there are two different directions in (micro-)benchmarking:
>
> Evaluation: Get a few scalar values about throughput, efficiency and resource
> usage in a reproducible way. For example, for a compression codec you will
> look at data throughput at maximum CPU utilization, compression efficiency and
> memory consumption.
>
> Tuning: Get insights and find possible tuning options. You will collect lots of
> performance counters, do profiling, etc.
>
> Tuning the GC, or the "GC friendliness" of code, is one tuning opportunity of
> many. When looking for tuning opportunities I typically switch on profiling,
> record CPU performance counters as well, and try to look at the whole picture.
> This means that when I am on the "tuning track", the performance data is not
> representative, since the profiling introduces bias. Typically I also go with
> fewer iterations, since I don't need the accuracy.
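>
> For illustration, such a tuning run could use JMH's built-in profilers from
> the command line; the fork and iteration counts here are just examples, and
> perfnorm requires Linux perf to be installed:
>
>   java -jar target/benchmarks.jar CacheReadBenchmark \
>       -f 1 -wi 1 -i 2 -prof gc -prof perfnorm -prof stack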
>
> My focus in the article, and at the moment, is on the evaluation part. In the
> end it boils down to how many bucks we need to spend on power, CPUs and
> memory. In that sense, the allocation rate is already an "irrelevant" metric
> for evaluation, but important for deeper analysis.
>
> Also, when it comes to GC-related tuning and analysis, I don't have much
> motivation yet. cache2k, which I am the author of, has the lowest allocation
> rate of all ;)
>
> However, your feedback is highly appreciated. I want to look deeper into GC
> effects in the future.
>
> Cheers,
>
> Jens
>
> --
> "Everything superfluous is wrong!"
>
> // Jens Wilke - headissue GmbH - Germany
> \// https://headissue.com