Benchmarking sort algorithms on arrays

Aleksey Shipilev aleksey.shipilev at oracle.com
Thu Apr 23 11:57:13 UTC 2015


On 04/20/2015 09:22 PM, Behrooz Nobakht wrote:
> Q1. As described above, we use System.arraycopy in the @Benchmark
> method itself. We are relying on the assumption that System.arraycopy
> takes roughly(?!) the same amount of time, and that's why the
> benchmark measurements are all off by the same constant amount. How
> reliable is this assumption in the context of using JMH for such a
> purpose?

One can somewhat hope that System.arraycopy depends only on the size
and element type of the copied data. But, as always, you had better
wind up baseline tests that *only* copy the arrays without actually
sorting, to make sure that assumption holds true in every setup.
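
A rough sketch of what I mean (class and field names are arbitrary,
harness boilerplate trimmed to the essentials): the baseline method does
only the copy, the real method does the copy plus the sort, and both
return the array so the work is not dead-code-eliminated.

  import java.util.Arrays;
  import java.util.Random;
  import org.openjdk.jmh.annotations.*;

  @State(Scope.Thread)
  public class SortBench {

      @Param({"1000", "1000000"})
      int size;

      int[] source;   // unsorted reference data, filled once per trial
      int[] scratch;  // working copy, overwritten on every invocation

      @Setup(Level.Trial)
      public void setup() {
          source = new Random(42).ints(size).toArray();
          scratch = new int[size];
      }

      @Benchmark
      public int[] baselineCopy() {
          // baseline: only the preparatory work
          System.arraycopy(source, 0, scratch, 0, size);
          return scratch;
      }

      @Benchmark
      public int[] copyAndSort() {
          // actual test: the same preparatory work, plus the sort
          System.arraycopy(source, 0, scratch, 0, size);
          Arrays.sort(scratch);
          return scratch;
      }
  }

If baselineCopy does not scale linearly with size, or jumps around
between setups, that is exactly what you want to discover before
trusting the sorting numbers.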


> Q2. Continuing with Q1, another approach is to use @Setup(Level.Invocation)
> and then prepare the data at that level. However, the comments on
> Level.Invocation somehow made us hesitate to use it. So, which way is the
> better one to go with?

I would say copying the data is better, given the Level.Invocation
caveats. arraycopy is O(N) with a small constant, thanks to
vectorization on most platforms; (comparison) sorting is O(N log N)
with a "hopefully" larger constant.
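
For contrast, the Level.Invocation variant would look roughly like this
(again, the names are mine). JMH keeps the per-invocation setup out of
the measured time, but the caveats mentioned above still apply,
especially when a single invocation finishes in microseconds:

  import java.util.Arrays;
  import java.util.Random;
  import org.openjdk.jmh.annotations.*;

  @State(Scope.Thread)
  public class SortBenchInvocation {

      @Param({"1000000"})
      int size;

      int[] source;
      int[] work;

      @Setup(Level.Trial)
      public void setupTrial() {
          source = new Random(42).ints(size).toArray();
      }

      @Setup(Level.Invocation)
      public void setupInvocation() {
          // runs before every benchmark invocation, outside the measured time
          work = source.clone();
      }

      @Benchmark
      public int[] sortOnly() {
          Arrays.sort(work);
          return work;
      }
  }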

> Q3. There's a concern that System.arraycopy for large data might
> actually overshadow the actual time for sorting. For example, if the
> total time of one method execution is T_0 = T_1 + T_2, in which T_1 is
> the time for System.arraycopy and T_2 is the time for calling sort on
> the data, and if we have roughly T_1 ~= T_2, then the measurement is
> influenced by roughly 50% because of data preparation. What's the
> best practice to address this?

That's another reason to have the baseline tests that do only the
preparatory work.
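
For illustration, with made-up numbers: if copy+sort scores 1200 us/op
and the copy-only baseline scores 200 us/op, then T_1 ~= 200 us,
T_2 ~= 1200 - 200 = 1000 us, and the preparation inflates the raw score
by about 20%; the difference of the two scores estimates the sort
alone. Once the baseline creeps towards half of the total, the
preparation indeed overshadows the sort, and you need to rethink the
setup.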

All in all, there is no ultimate answer to any of your questions. The
experimental setup should be verified on a case-by-case basis, and
adjusted once you discover something does not fit the setup well.
Contrary to what most software engineers believe, there is *NO* way to
be sure something works unless you actually observe that it works.


> Q4. Although the Java documentation of the JMH source code is very
> helpful to understand how to use it, would you please provide more
> precise comparative or relative definitions of "benchmark" and
> "iteration"? How do @Measurement and its parameters come into the
> picture? Is there a reference for the basic definitions?

Javadoc for Level has some basic definitions. Improvements are welcome.
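
Very roughly, and with arbitrary numbers: a run consists of several
forks (fresh JVMs); each fork first runs warmup iterations, which are
executed but not recorded, and then measurement iterations, which
produce the reported scores; within one iteration the @Benchmark method
is invoked back-to-back for the configured time. Something like:

  import java.util.concurrent.TimeUnit;
  import org.openjdk.jmh.annotations.*;
  import org.openjdk.jmh.infra.Blackhole;

  @Fork(3)  // 3 fresh JVMs, to average out run-to-run variance
  @Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
  @Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
  @BenchmarkMode(Mode.AverageTime)
  @OutputTimeUnit(TimeUnit.MICROSECONDS)
  @State(Scope.Thread)
  public class ConfiguredBench {

      @Benchmark
      public void work() {
          // stand-in payload; burns a fixed amount of CPU
          Blackhole.consumeCPU(100);
      }
  }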

Thanks,
-Aleksey.


