Proposal for two new samples

Sergey Kuksenko sergey.kuksenko at oracle.com
Tue Aug 11 09:06:30 UTC 2015


Maybe this is the wrong mailing list to discuss that, but I just want to 
add my two cents.

1. Branch prediction.
You are using Arrays.sort for that. I think it makes sense to add a 
note that if the Arrays.sort implementation changes (which happens from 
time to time) you could get different results, or maybe it is worth 
implementing the sorting algorithm in the benchmark itself.
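
To make the idea concrete, here is a rough sketch of what such a 
self-contained sample could look like, with a hand-rolled insertion sort 
instead of Arrays.sort. The class name, array size and threshold below 
are just placeholders of mine, not the proposed JMHSample_36 code:

import java.util.Random;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Benchmark)
public class BranchPredictionSketch {

    private static final int SIZE = 1 << 14;

    private int[] sorted;
    private int[] unsorted;

    @Setup
    public void setup() {
        Random random = new Random(42);
        unsorted = new int[SIZE];
        for (int i = 0; i < SIZE; i++) {
            unsorted[i] = random.nextInt(256);
        }

        // Hand-rolled insertion sort, so the data layout does not depend
        // on the JDK's Arrays.sort implementation.
        sorted = unsorted.clone();
        for (int i = 1; i < sorted.length; i++) {
            int key = sorted[i];
            int j = i - 1;
            while (j >= 0 && sorted[j] > key) {
                sorted[j + 1] = sorted[j];
                j--;
            }
            sorted[j + 1] = key;
        }
    }

    @Benchmark
    public void sortedArray(Blackhole bh) {
        consumeHighValues(sorted, bh);
    }

    @Benchmark
    public void unsortedArray(Blackhole bh) {
        consumeHighValues(unsorted, bh);
    }

    // The branch is taken for roughly half of the elements; on sorted data
    // its outcomes are clustered and easy to predict, on unsorted data
    // they are effectively random.
    private static void consumeHighValues(int[] array, Blackhole bh) {
        for (int v : array) {
            if (v >= 128) {
                bh.consume(v);
            }
        }
    }
}

Rolling the sort by hand keeps the branch layout of the benchmarked data 
independent of any future changes to Arrays.sort.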

2. If you need a matrix cache misses example, I could suggest this:

http://www.slideshare.net/SergeyKuksenko/java-performance-speedup-your-application-with-hardware-counters

The first example.
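
For that case, a rough sketch of a traversal-order benchmark in the same 
spirit could look like this (this is not the code from the slides; the 
class name, matrix size and method names are placeholders of mine):

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
public class MatrixTraversalSketch {

    private static final int N = 2048;

    private int[][] src;
    private int[][] dst;

    @Setup
    public void setup() {
        src = new int[N][N];
        dst = new int[N][N];
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++) {
                src[i][j] = i * N + j;
            }
        }
    }

    // Inner loop walks along a row: consecutive elements of the same int[]
    // share cache lines, so most accesses are cache hits.
    @Benchmark
    public int[][] copyRowByRow() {
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++) {
                dst[i][j] = src[i][j];
            }
        }
        return dst;
    }

    // Inner loop walks down a column: every access touches a different row
    // array, so cache lines are evicted and refetched far more often.
    @Benchmark
    public int[][] copyColByCol() {
        for (int j = 0; j < N; j++) {
            for (int i = 0; i < N; i++) {
                dst[i][j] = src[i][j];
            }
        }
        return dst;
    }
}

On Linux, both effects should also be visible in the hardware counters 
directly from JMH via its perf profilers (-prof perf or -prof perfnorm), 
which matches the perf suggestion in the quoted mail below.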

On 07/28/2015 12:22 AM, Michael Mirwaldt wrote:
> Hi,
> may I introduce myself: I am Michael Mirwaldt from Germany. I studied
> Computer Science with a diploma (which is roughly equivalent to a
> master's degree).
> I have programmed in Java for nearly ten years now.
>
> I would like to add two samples that I find missing in the jmh repository.
> They could help jmh users experience these effects and demonstrate them
> in their lectures.
>
> 1) Branch prediction
> - it demonstrates how branch prediction/misprediction can lead to
> better/worse performance.
> - it "loops" through a sorted and an unsorted array and "consumes" only
> high values
> - I got the idea from
> http://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-an-unsorted-array
> where that phenomenon was discussed
> - I could not check yet whether the branch miss ratio increases.
> If you are interested, I will do so; I should be able to observe that
> with the command line tool perf on a Linux machine.
> - my sample gives reliable results, e.g.:
> Benchmark                                                       Mode  Cnt   Score   Error  Units
> JMHSample_36_BranchPrediction.benchmark_sortedArray             avgt   25  12,741 ± 0,151  ns/op
> JMHSample_36_BranchPrediction.benchmark_sortedArray:counter     avgt   25  12,602 ± 0,143  ns/op
> JMHSample_36_BranchPrediction.benchmark_unsortedArray           avgt   25  19,710 ± 0,986  ns/op
> JMHSample_36_BranchPrediction.benchmark_unsortedArray:counter   avgt   25  19,524 ± 0,935  ns/op
>
> 2) Matrix copy
> - it demonstrates that it matters how you iterate through a
> two-dimensional array (when you copy a matrix)
> - iterating "row by row" is often faster than "column by column" because
> it leads to fewer cache misses
> - I was inspired by a lecture at a conference where somebody mentioned this
> - I could not yet observe whether the cache hit ratio drops when
> running the sample on my Windows machine.
> If you are interested, I will do so; I should be able to observe that
> with the command line tool perf on a Linux machine.
> - my sample gives reliable results, e.g.:
> Benchmark                                                     Mode  Cnt   Score   Error  Units
> JMHSample_37_MatrixCopy.benchmark_transposeColByCol           avgt   25  41,073 ± 2,023  ns/op
> JMHSample_37_MatrixCopy.benchmark_transposeColByCol:counter   avgt   25  40,344 ± 2,089  ns/op
> JMHSample_37_MatrixCopy.benchmark_transposeRowByRow           avgt   25  28,393 ± 0,571  ns/op
> JMHSample_37_MatrixCopy.benchmark_transposeRowByRow:counter   avgt   25  28,148 ± 0,600  ns/op
>
> What do you think of these two simple samples?
> Are the results significant enough for you?
> How can I submit my sample code so that you can try it out and review it?
>
> I would really appreciate your feedback.
>
> Brgds,
> Michael
>
>

-- 
Best regards,
Sergey Kuksenko

