Benchmarking sort algorithms on arrays

Mon Apr 20 18:22:51 UTC 2015

Hi,

Recently, we've started to develop a set of benchmarks to measure the
performance of TimSort in comparison with a few tweaks.

The general structure of every benchmark method is:

1. Prepare an array with different styles.
Styles include deterministic random numbers, all equal arrays, partially
already-sorted arrays and a few more.

2. Copy the raw data array into the benchmark array.
We use System.arrayCopy for this.

3. Sort the array using the original TimSort or the tweaked one.

The complete source code can be found at:
https://github.com/abstools/timsort-benchmark

However, we've been tackling with a couple of questions in this regards
and we'd appreciate it to have your feedback on the following:

Q1. As described above, we use System.arrayCopy in the @Benchmark method
itself. We are relying on the assumption that System.arrayCopy takes the
same
amount of time roughly(?!) and that's why the benchmark measurements are
all
off by the same constant amount. How reliable is this assumption in the
context of using JMH for such purpose?

Q2. Continuing with Q1, another approach is to use @Setup(Level.Invocation)
and then prepare the data at this level. However, the comments on
Level.Invocation somehow made us hesitate on using it. So, which way is the
better one to go with?

Q3. There's a concern that System.arrayCopy for large data might actually
overshadow the actually time for sorting. For example if total time of one
method execution is T_0 = T_1 + T_2 in which T_1 is the time for
System.arrayCopy and T_2 is for calling sort on the data, and then if we
have
roughly T_1 ~= T_2, then the measurement is influenced by roughly 50%
because
of data preparation. What's the best practice to address this?

Q4. Although the Java documentation of JMH source code is very helpful to
understand how to use it, would you please provide a more precise
comparative
or relative definitions of "benchmark" and "iteration"? How @Measurement
and
its parameters come into the picture? Is there a reference for the basic
definitions?

Thanks in advance for any comments,
Behrooz