Proposal for two new samples
Sergey Kuksenko
sergey.kuksenko at oracle.com
Tue Aug 11 09:06:30 UTC 2015
This may be the wrong mailing list to discuss this, but I just want to
add my two cents.
1. Branch prediction.
You are using Arrays.sort for that. I think it makes sense to add a
note that if the Arrays.sort implementation changes (which happens from
time to time), you could get different results.
Or maybe it is worth implementing the sorting algorithm inside the benchmark.
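One way to follow that suggestion, as a sketch only (the class and method
names are made up for illustration), is to keep a fixed, simple sorting
algorithm such as insertion sort inside the benchmark source, so the
prepared data does not depend on the JDK's Arrays.sort implementation:

```java
import java.util.Arrays;

// Sketch: a fixed insertion sort kept inside the benchmark source, so a
// future change to the JDK's Arrays.sort cannot shift the measured setup.
// Class and method names are illustrative, not part of the jmh samples.
public class PinnedSort {
    static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 1, 4, 2, 3};
        insertionSort(data);
        System.out.println(Arrays.toString(data)); // prints [1, 2, 3, 4, 5]
    }
}
```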
2. If you need a matrix cache-misses example, I could suggest the first
example from
http://www.slideshare.net/SergeyKuksenko/java-performance-speedup-your-application-with-hardware-counters
On 07/28/2015 12:22 AM, Michael Mirwaldt wrote:
> Hi,
> may I introduce myself: I am Michael Mirwaldt from Germany.
> I studied Computer Science with a diploma (roughly a master's degree).
> I have programmed in Java for nearly ten years now.
>
> I would like to add two samples that I find missing in the jmh repository.
> They could help jmh users experience these effects and demonstrate them
> in their lectures.
>
> 1) Branch prediction
> - it demonstrates how branch prediction/misprediction can lead to
> better/worse performance.
> - it "loops" through a sorted and an unsorted array and "consumes" only
> high values
> - I got the idea from
> http://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-an-unsorted-array
> where this phenomenon was discussed
> - I could not check yet whether the branch-miss ratio increases.
> If you are interested, I will verify that with the command line tool
> perf on a Linux machine.
> - my sample gives reliable results, e.g.:
>
> Benchmark                                                      Mode  Cnt   Score   Error  Units
> JMHSample_36_BranchPrediction.benchmark_sortedArray            avgt   25  12,741 ± 0,151  ns/op
> JMHSample_36_BranchPrediction.benchmark_sortedArray:counter    avgt   25  12,602 ± 0,143  ns/op
> JMHSample_36_BranchPrediction.benchmark_unsortedArray          avgt   25  19,710 ± 0,986  ns/op
> JMHSample_36_BranchPrediction.benchmark_unsortedArray:counter  avgt   25  19,524 ± 0,935  ns/op
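For readers who want to see the shape of such a benchmark without the
harness, the measured kernel can be sketched in plain Java. To be clear,
this is not the actual JMHSample_36 code, just an illustration of the
data-dependent branch that makes the sorted variant faster:

```java
import java.util.Arrays;
import java.util.Random;

// Sketch only (not the actual jmh sample): the kernel sums only "high"
// elements, so every iteration takes a data-dependent branch. On a sorted
// array the branch pattern is trivially predictable; on a shuffled array
// it is essentially random, which is what a branch predictor punishes.
public class BranchPredictionSketch {
    static final int SIZE = 16384;

    static int[] makeData(long seed) {
        Random r = new Random(seed);
        int[] a = new int[SIZE];
        for (int i = 0; i < SIZE; i++) a[i] = r.nextInt(256);
        return a;
    }

    // The branchy kernel both benchmark variants would share.
    static long sumHigh(int[] a) {
        long sum = 0;
        for (int v : a) {
            if (v >= 128) sum += v; // predictable on sorted input, not on unsorted
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] unsorted = makeData(42);
        int[] sorted = unsorted.clone();
        Arrays.sort(sorted);
        // Same elements, same result -- only the branch pattern differs.
        System.out.println(sumHigh(sorted) == sumHigh(unsorted)); // prints true
    }
}
```

Under JMH, timing sumHigh over the sorted and unsorted arrays separately
is what exposes the misprediction cost.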
>
> 2) Matrix copy
> - it demonstrates that it matters how you iterate through a two
> dimensional array (when you copy a matrix)
> - iterating "row by row" is often faster than "column by column" because
> it leads to fewer cache misses.
> - I was inspired by a talk at a conference where somebody mentioned this
> - I could not yet observe whether the cache-hit ratio drops when
> running the sample on my Windows machine.
> If you are interested, I will verify that with the command line tool
> perf on a Linux machine.
> - my sample gives reliable results, e.g.:
>
> Benchmark                                                     Mode  Cnt   Score   Error  Units
> JMHSample_37_MatrixCopy.benchmark_transposeColByCol           avgt   25  41,073 ± 2,023  ns/op
> JMHSample_37_MatrixCopy.benchmark_transposeColByCol:counter   avgt   25  40,344 ± 2,089  ns/op
> JMHSample_37_MatrixCopy.benchmark_transposeRowByRow           avgt   25  28,393 ± 0,571  ns/op
> JMHSample_37_MatrixCopy.benchmark_transposeRowByRow:counter   avgt   25  28,148 ± 0,600  ns/op
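The two traversal orders being compared can be sketched like this (again,
not the actual JMHSample_37 code; class and method names are illustrative).
Java stores a 2-D array row by row, so the outer-row loop walks memory
sequentially, while the outer-column loop strides across rows and touches a
different cache line on almost every access:

```java
import java.util.Arrays;

// Sketch only: the same copy performed in two traversal orders.
// Both produce identical results; only the memory access pattern differs.
public class MatrixCopySketch {
    static int[][] copyRowByRow(int[][] src) {
        int n = src.length, m = src[0].length;
        int[][] dst = new int[n][m];
        for (int i = 0; i < n; i++)       // row-major order: cache friendly
            for (int j = 0; j < m; j++)
                dst[i][j] = src[i][j];
        return dst;
    }

    static int[][] copyColByCol(int[][] src) {
        int n = src.length, m = src[0].length;
        int[][] dst = new int[n][m];
        for (int j = 0; j < m; j++)       // column-major order: strided, more misses
            for (int i = 0; i < n; i++)
                dst[i][j] = src[i][j];
        return dst;
    }

    public static void main(String[] args) {
        int[][] src = {{1, 2, 3}, {4, 5, 6}};
        System.out.println(
            Arrays.deepEquals(copyRowByRow(src), copyColByCol(src))); // prints true
    }
}
```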
>
> What do you think of these two simple samples?
> Do the results look statistically significant to you?
> How can I submit my sample code so that you can try it out and review it?
>
> I would really appreciate your feedback.
>
> Brgds,
> Michael
>
>
--
Best regards,
Sergey Kuksenko