[foreign] some JMH benchmarks
Samuel Audet
samuel.audet at gmail.com
Tue Sep 18 04:48:58 UTC 2018
Anyway, I've put online an updated version of my benchmark files here:
https://gist.github.com/saudet/1bf14a000e64c245675cf5d4e9ad6e69
Just run "git clone" on the URL and run "mvn package" on the pom.xml.
With the 2 virtual cores of an Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
running Ubuntu 14.04 on the cloud with GCC 4.9 and OpenJDK 8, I get
these numbers:
Benchmark                         Mode  Cnt          Score         Error  Units
NativeBenchmark.expBenchmark     thrpt   25   37460540.440 ±  393299.974  ops/s
NativeBenchmark.getpidBenchmark  thrpt   25  100323188.451 ± 1254197.449  ops/s
While on my laptop, an Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz running
Fedora 27, GCC 7.3, and OpenJDK 9, I get the following:
Benchmark                         Mode  Cnt         Score        Error  Units
NativeBenchmark.expBenchmark     thrpt   25  50047147.099 ± 924366.937  ops/s
NativeBenchmark.getpidBenchmark  thrpt   25   4825508.193 ±  21662.633  ops/s
Now, it looks like getpid() is really slow on Fedora 27 for some reason,
but as Linus puts it, we should not be using that for benchmarking:
https://yarchive.net/comp/linux/getpid_caching.html
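
To put the two tables side by side, here is a minimal sketch that computes the throughput ratios between the two machines. The class and method names are hypothetical and not part of the gist; the scores are copied from the tables above.

```java
// Sketch only: compares the JMH scores quoted above between the two machines.
// ThroughputRatio and ratio() are hypothetical names, not part of the gist.
final class ThroughputRatio {

    /** Returns how many times larger throughput a is than throughput b (both in ops/s). */
    static double ratio(double a, double b) {
        if (b <= 0.0) {
            throw new IllegalArgumentException("baseline throughput must be positive");
        }
        return a / b;
    }

    public static void main(String[] args) {
        // Scores in ops/s, copied from the two tables above.
        double xeonExp = 37460540.440;
        double xeonGetpid = 100323188.451;
        double laptopExp = 50047147.099;
        double laptopGetpid = 4825508.193;

        System.out.printf("exp:    laptop / Xeon = %.2fx%n", ratio(laptopExp, xeonExp));
        System.out.printf("getpid: Xeon / laptop = %.2fx%n", ratio(xeonGetpid, laptopGetpid));
    }
}
```

With these scores, exp comes out about 1.3x faster on the laptop, while getpid comes out about 20x slower there.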
What do you get on your machines?
Samuel
On 09/18/2018 12:58 AM, Maurizio Cimadamore wrote:
> For the record, here's what I get for all three benchmarks if I
> compile the JNI code with -O3:
>
> Benchmark Mode Cnt Score Error Units
> PanamaBenchmark.testJNIExp thrpt 5 28575269.294 ± 1907726.710 ops/s
> PanamaBenchmark.testJNIJavaQsort thrpt 5 372148.433 ± 27178.529 ops/s
> PanamaBenchmark.testJNIPid thrpt 5 59240069.011 ± 403881.697 ops/s
>
> The first and second benchmarks get faster and very close to the
> 'direct' optimization numbers in [1]. Surprisingly, the last benchmark
> (getpid) is quite a bit slower. I've been able to reproduce this across
> multiple runs; for that benchmark, omitting O3 seems to achieve the best
> results, not sure why. It starts off faster (in the first couple
> of warmup iterations), but then it gets slower in all the other runs -
> presumably it interacts badly with the C2 generated code. For instance,
> this is a run with O3 enabled:
>
> # Run progress: 66.67% complete, ETA 00:01:40
> # Fork: 1 of 1
> # Warmup Iteration 1: 65182202.653 ops/s
> # Warmup Iteration 2: 64900639.094 ops/s
> # Warmup Iteration 3: 59314945.437 ops/s <---------------------------------
> # Warmup Iteration 4: 59269007.877 ops/s
> # Warmup Iteration 5: 59239905.163 ops/s
> Iteration 1: 59300748.074 ops/s
> Iteration 2: 59249666.044 ops/s
> Iteration 3: 59268597.051 ops/s
> Iteration 4: 59322074.572 ops/s
> Iteration 5: 59059259.317 ops/s
>
> And this is a run with O3 disabled:
>
> # Run progress: 0.00% complete, ETA 00:01:40
> # Fork: 1 of 1
> # Warmup Iteration 1: 55882128.787 ops/s
> # Warmup Iteration 2: 53102361.751 ops/s
> # Warmup Iteration 3: 66964755.699 ops/s <---------------------------------
> # Warmup Iteration 4: 66414428.355 ops/s
> # Warmup Iteration 5: 65328475.276 ops/s
> Iteration 1: 64229192.993 ops/s
> Iteration 2: 65191719.319 ops/s
> Iteration 3: 65352022.471 ops/s
> Iteration 4: 65152090.426 ops/s
> Iteration 5: 65320545.712 ops/s
>
>
> In both cases, the 3rd warmup execution sees a performance jump - with
> O3, the jump is backwards, w/o O3 the jump is forward, which is quite
> typical for a JMH benchmark as C2 optimization will start to kick in.
>
> For these reasons, I'm reluctant to update my benchmark numbers to
> reflect the O3 behavior (although I agree that, since the Hotspot code
> is compiled with that optimization it would make more sense to use that
> as a reference).
>
> Maurizio
>
> [1] - http://cr.openjdk.java.net/~mcimadamore/panama/foreign-jmh.txt
>
>
>
> On 17/09/18 16:18, Maurizio Cimadamore wrote:
>>
>>
>> On 17/09/18 15:08, Samuel Audet wrote:
>>> Yes, the blackhole or the random number doesn't make any difference,
>>> but not calling gcc with -O3 does. Running the compiler with
>>> optimizations on is pretty common, but they are not enabled by default.
>> A bit better
>>
>> PanamaBenchmark.testMethod thrpt 5 28018170.076 ± 8491668.248 ops/s
>>
>> But not much of a difference (I did not expect much, as the body of
>> the native method is extremely simple).
>>
>> Maurizio
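
To quantify the gap between the two getpid runs quoted above, here is a minimal sketch that averages the five measurement iterations of each run and reports the relative change. The class and method names are hypothetical; the samples are copied from the quoted JMH output, with warmup iterations left out.

```java
import java.util.Arrays;

// Sketch only: summarizes the two getpid runs quoted above (with and without -O3).
// O3Comparison, mean() and percentChange() are hypothetical names.
final class O3Comparison {

    // Measurement iterations in ops/s, copied from the quoted JMH output.
    static final double[] WITH_O3 = {
        59300748.074, 59249666.044, 59268597.051, 59322074.572, 59059259.317
    };
    static final double[] WITHOUT_O3 = {
        64229192.993, 65191719.319, 65352022.471, 65152090.426, 65320545.712
    };

    /** Arithmetic mean of the samples; rejects an empty array. */
    static double mean(double[] samples) {
        return Arrays.stream(samples)
                .average()
                .orElseThrow(() -> new IllegalArgumentException("no samples"));
    }

    /** Relative change from one value to another, in percent (negative means slower). */
    static double percentChange(double from, double to) {
        return (to - from) / from * 100.0;
    }

    public static void main(String[] args) {
        double withO3 = mean(WITH_O3);
        double withoutO3 = mean(WITHOUT_O3);
        System.out.printf("mean with -O3:    %.3f ops/s%n", withO3);
        System.out.printf("mean without -O3: %.3f ops/s%n", withoutO3);
        System.out.printf("change with -O3:  %.2f%%%n", percentChange(withoutO3, withO3));
    }
}
```

The mean of the -O3 iterations agrees with the testJNIPid score reported in the quoted table, and the -O3 run comes out roughly 9% slower than the run without it.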