[foreign] some JMH benchmarks
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Thu Sep 20 18:14:25 UTC 2018
Sorry for the delay in getting back to you. There's indeed something
fishy going on here, and I have spotted a regression in JNI performance
since JDK 11. This could be caused by an update in the compiler
toolchain introduced in that same version, but I have filed an issue
for our hotspot team to investigate:
https://bugs.openjdk.java.net/browse/JDK-8210975
In the context of this discussion, it's likely that the regression is
affecting the numbers of both Panama (which is built on top of JNI at
the moment) and the JNI benchmarks.
Thanks
Maurizio
On 19/09/18 01:13, Samuel Audet wrote:
> Thanks! You haven't mentioned the version of the JDK you're using
> though. I'm starting to get the impression that JNI in newer versions
> of OpenJDK will be slower... ?
>
> On 09/18/2018 07:03 PM, Maurizio Cimadamore wrote:
>> These are the numbers I get
>>
>> Benchmark                         Mode  Cnt         Score       Error  Units
>> NativeBenchmark.expBenchmark     thrpt    5  30542590.094 ± 44126.434  ops/s
>> NativeBenchmark.getpidBenchmark  thrpt    5  61764677.092 ± 21102.236  ops/s
>>
>> They are in the same ballpark, but exp() is a bit faster; btw, I
>> tried to repeat my benchmark with JNI exp() _and_ -O3 and I got
>> very similar numbers (yesterday I did a very quick test and there was
>> probably some other job running on the machine, bringing down the
>> figures a bit).
>>
>> But overall, the results in your bench seem to match what I got: exp
>> is faster, getpid is slower, and the difference is mostly caused by
>> -O3. If -O3 is not used, then the numbers should match the ones I
>> posted earlier (and getpid should be a bit faster).
>>
>> Maurizio
>>
>>
>> On 18/09/18 05:48, Samuel Audet wrote:
>>> Anyway, I've put online an updated version of my benchmark files here:
>>> https://gist.github.com/saudet/1bf14a000e64c245675cf5d4e9ad6e69
>>> Just run "git clone" on the URL and run "mvn package" on the pom.xml.
>>>
>>> With the 2 virtual cores of an Intel(R) Xeon(R) CPU E5-2673 v4 @
>>> 2.30GHz running Ubuntu 14.04 on the cloud with GCC 4.9 and OpenJDK
>>> 8, I get these numbers:
>>>
>>> Benchmark                         Mode  Cnt          Score        Error  Units
>>> NativeBenchmark.expBenchmark     thrpt   25   37460540.440 ±  393299.974  ops/s
>>> NativeBenchmark.getpidBenchmark  thrpt   25  100323188.451 ± 1254197.449  ops/s
>>>
>>> While on my laptop, an Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
>>> running Fedora 27, GCC 7.3, and OpenJDK 9, I get the following:
>>>
>>> Benchmark                         Mode  Cnt         Score       Error  Units
>>> NativeBenchmark.expBenchmark     thrpt   25  50047147.099 ± 924366.937  ops/s
>>> NativeBenchmark.getpidBenchmark  thrpt   25   4825508.193 ±  21662.633  ops/s
>>>
>>> Now, it looks like getpid() is really slow on Fedora 27 for some
>>> reason, but as Linus puts it, we should not be using that for
>>> benchmarking:
>>> https://yarchive.net/comp/linux/getpid_caching.html
>>>
>>> What do you get on your machines?
>>>
>>> Samuel
>>>
>>>
>>> On 09/18/2018 12:58 AM, Maurizio Cimadamore wrote:
>>>> For the records, here's what I get for all the three benchmarks if
>>>> I compile the JNI code with -O3:
>>>>
>>>> Benchmark                          Mode  Cnt         Score         Error  Units
>>>> PanamaBenchmark.testJNIExp        thrpt    5  28575269.294 ± 1907726.710  ops/s
>>>> PanamaBenchmark.testJNIJavaQsort  thrpt    5    372148.433 ±   27178.529  ops/s
>>>> PanamaBenchmark.testJNIPid        thrpt    5  59240069.011 ±  403881.697  ops/s
>>>>
>>>> The first and second benchmarks get faster and very close to the
>>>> 'direct' optimization numbers in [1]. Surprisingly, the last
>>>> benchmark (getpid) is quite a bit slower. I've been able to
>>>> reproduce this across multiple runs; for that benchmark, omitting
>>>> -O3 seems to achieve the best results, not sure why. It starts off
>>>> faster in the first couple of warmup iterations (around 65M ops/s),
>>>> but then it slows down in all the other runs - presumably it
>>>> interacts badly with the C2-generated code. For instance, this is a
>>>> run with -O3 enabled:
>>>>
>>>> # Run progress: 66.67% complete, ETA 00:01:40
>>>> # Fork: 1 of 1
>>>> # Warmup Iteration 1: 65182202.653 ops/s
>>>> # Warmup Iteration 2: 64900639.094 ops/s
>>>> # Warmup Iteration 3: 59314945.437 ops/s <---------------------------------
>>>> # Warmup Iteration 4: 59269007.877 ops/s
>>>> # Warmup Iteration 5: 59239905.163 ops/s
>>>> Iteration 1: 59300748.074 ops/s
>>>> Iteration 2: 59249666.044 ops/s
>>>> Iteration 3: 59268597.051 ops/s
>>>> Iteration 4: 59322074.572 ops/s
>>>> Iteration 5: 59059259.317 ops/s
>>>>
>>>> And this is a run with O3 disabled:
>>>>
>>>> # Run progress: 0.00% complete, ETA 00:01:40
>>>> # Fork: 1 of 1
>>>> # Warmup Iteration 1: 55882128.787 ops/s
>>>> # Warmup Iteration 2: 53102361.751 ops/s
>>>> # Warmup Iteration 3: 66964755.699 ops/s <---------------------------------
>>>> # Warmup Iteration 4: 66414428.355 ops/s
>>>> # Warmup Iteration 5: 65328475.276 ops/s
>>>> Iteration 1: 64229192.993 ops/s
>>>> Iteration 2: 65191719.319 ops/s
>>>> Iteration 3: 65352022.471 ops/s
>>>> Iteration 4: 65152090.426 ops/s
>>>> Iteration 5: 65320545.712 ops/s
>>>>
>>>>
>>>> In both cases, the 3rd warmup iteration sees a performance jump -
>>>> with -O3 the jump is backwards, without -O3 the jump is forward,
>>>> which is the typical point in a JMH run where the C2 optimizations
>>>> start to kick in.
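>>>>
>>>> For context, the runs above use a single fork with 5 warmup and 5
>>>> measurement iterations; the corresponding JMH configuration is
>>>> roughly as follows (a sketch - iteration times are left at their
>>>> defaults, and the placeholder body stands in for the real JNI call):
>>>>
>>>> import org.openjdk.jmh.annotations.*;
>>>>
>>>> @Fork(1)                     // "# Fork: 1 of 1" in the logs above
>>>> @Warmup(iterations = 5)      // the five "# Warmup Iteration" lines
>>>> @Measurement(iterations = 5) // the five measured lines (Cnt = 5)
>>>> @BenchmarkMode(Mode.Throughput)
>>>> public class PanamaBenchmark {
>>>>     @Benchmark
>>>>     public long testJNIPid() {
>>>>         // placeholder standing in for the JNI getpid() wrapper
>>>>         return ProcessHandle.current().pid();
>>>>     }
>>>> }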
>>>>
>>>> For these reasons, I'm reluctant to update my benchmark numbers to
>>>> reflect the -O3 behavior (although I agree that, since the HotSpot
>>>> code is itself compiled with that optimization, it would make more
>>>> sense to use it as a reference).
>>>>
>>>> Maurizio
>>>>
>>>> [1] - http://cr.openjdk.java.net/~mcimadamore/panama/foreign-jmh.txt
>>>>
>>>>
>>>>
>>>> On 17/09/18 16:18, Maurizio Cimadamore wrote:
>>>>>
>>>>>
>>>>> On 17/09/18 15:08, Samuel Audet wrote:
>>>>>> Yes, the blackhole or the random number doesn't make any
>>>>>> difference, but not calling gcc with -O3 does. Running the
>>>>>> compiler with optimizations on is pretty common, but they are not
>>>>>> enabled by default.
>>>>> A bit better:
>>>>>
>>>>> PanamaBenchmark.testMethod  thrpt  5  28018170.076 ± 8491668.248  ops/s
>>>>>
>>>>> But not much of a difference (I did not expect much, as the body
>>>>> of the native method is extremely simple).
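>>>>>
>>>>> For the record, the two ways of keeping the result alive that were
>>>>> mentioned (returning it vs. feeding it to a blackhole) are
>>>>> equivalent as far as JMH is concerned; a minimal sketch, using
>>>>> Math.exp as a stand-in for the native call and a made-up class name:
>>>>>
>>>>> import java.util.concurrent.ThreadLocalRandom;
>>>>> import org.openjdk.jmh.annotations.*;
>>>>> import org.openjdk.jmh.infra.Blackhole;
>>>>>
>>>>> @State(Scope.Thread)
>>>>> public class DeadCodeSketch {
>>>>>     // a non-constant input guards against constant folding of the argument
>>>>>     double x = ThreadLocalRandom.current().nextDouble();
>>>>>
>>>>>     // returning the result lets JMH consume it implicitly...
>>>>>     @Benchmark
>>>>>     public double returningTheValue() {
>>>>>         return Math.exp(x);
>>>>>     }
>>>>>
>>>>>     // ...which is equivalent to consuming it explicitly via a Blackhole
>>>>>     @Benchmark
>>>>>     public void consumingViaBlackhole(Blackhole bh) {
>>>>>         bh.consume(Math.exp(x));
>>>>>     }
>>>>> }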
>>>>>
>>>>> Maurizio
>>>
>>
>