[foreign] some JMH benchmarks
Samuel Audet
samuel.audet at gmail.com
Fri Sep 21 00:51:13 UTC 2018
Sounds good, thanks for testing this and for filing the bug report!
Samuel
On 09/21/2018 03:14 AM, Maurizio Cimadamore wrote:
> Sorry for the delay in getting back to you. There's indeed something
> fishy going on here, and I have spotted a regression in JNI perf since
> JDK 11. This could be caused by an update to the compiler toolchain
> introduced in the same version, but I have filed an issue for our
> hotspot team to investigate:
>
> https://bugs.openjdk.java.net/browse/JDK-8210975
>
> In the context of this discussion, it's likely that the regression is
> affecting the numbers of both Panama (which is built on top of JNI at
> the moment) and the JNI benchmarks.
>
> Thanks
> Maurizio
>
>
> On 19/09/18 01:13, Samuel Audet wrote:
>> Thanks! You haven't mentioned the version of the JDK you're using
>> though. I'm starting to get the impression that JNI in newer versions
>> of OpenJDK will be slower... ?
>>
>> On 09/18/2018 07:03 PM, Maurizio Cimadamore wrote:
>>> These are the numbers I get
>>>
>>> Benchmark                         Mode  Cnt         Score        Error  Units
>>> NativeBenchmark.expBenchmark     thrpt    5  30542590.094 ±  44126.434  ops/s
>>> NativeBenchmark.getpidBenchmark  thrpt    5  61764677.092 ±  21102.236  ops/s
>>>
>>> They are in the same ballpark, but exp() is a bit faster; btw, I
>>> tried to repeat my benchmark with JNI exp() _and_ O3 and I've got
>>> very similar numbers (yesterday I did a very quick test and there was
>>> probably some other job running on the machine and bringing down the
>>> figures a bit).
>>>
>>> But overall, the results in your bench seem to match what I got: exp
>>> is faster, pid is slower, the difference is mostly caused by O3. If
>>> no O3 is used, then the numbers should match what I included in my
>>> numbers (and getpid should be a bit faster).
>>>
>>> Maurizio
>>>
>>>
>>> On 18/09/18 05:48, Samuel Audet wrote:
>>>> Anyway, I've put online an updated version of my benchmark files here:
>>>> https://gist.github.com/saudet/1bf14a000e64c245675cf5d4e9ad6e69
>>>> Just run "git clone" on the URL and run "mvn package" on the pom.xml.
>>>>
>>>> With the 2 virtual cores of an Intel(R) Xeon(R) CPU E5-2673 v4 @
>>>> 2.30GHz running Ubuntu 14.04 on the cloud with GCC 4.9 and OpenJDK
>>>> 8, I get these numbers:
>>>>
>>>> Benchmark                         Mode  Cnt          Score         Error  Units
>>>> NativeBenchmark.expBenchmark     thrpt   25   37460540.440 ±  393299.974  ops/s
>>>> NativeBenchmark.getpidBenchmark  thrpt   25  100323188.451 ± 1254197.449  ops/s
>>>>
>>>> While on my laptop, an Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
>>>> running Fedora 27, GCC 7.3, and OpenJDK 9, I get the following:
>>>>
>>>> Benchmark                         Mode  Cnt         Score        Error  Units
>>>> NativeBenchmark.expBenchmark     thrpt   25  50047147.099 ± 924366.937  ops/s
>>>> NativeBenchmark.getpidBenchmark  thrpt   25   4825508.193 ±  21662.633  ops/s
>>>>
>>>> Now, it looks like getpid() is really slow on Fedora 27 for some
>>>> reason, but as Linus puts it, we should not be using that for
>>>> benchmarking:
>>>> https://yarchive.net/comp/linux/getpid_caching.html
>>>>
>>>> What do you get on your machines?
>>>>
>>>> Samuel
>>>>
>>>>
>>>> On 09/18/2018 12:58 AM, Maurizio Cimadamore wrote:
>>>>> For the record, here's what I get for all three benchmarks if
>>>>> I compile the JNI code with -O3:
>>>>>
>>>>> Benchmark                          Mode  Cnt         Score         Error  Units
>>>>> PanamaBenchmark.testJNIExp        thrpt    5  28575269.294 ± 1907726.710  ops/s
>>>>> PanamaBenchmark.testJNIJavaQsort  thrpt    5    372148.433 ±   27178.529  ops/s
>>>>> PanamaBenchmark.testJNIPid        thrpt    5  59240069.011 ±  403881.697  ops/s
>>>>>
>>>>> The first and second benchmarks get faster and very close to the
>>>>> 'direct' optimization numbers in [1]. Surprisingly, the last
>>>>> benchmark (getpid) is quite a bit slower. I've been able to reproduce
>>>>> this across multiple runs; for that benchmark, omitting O3 seems to
>>>>> achieve the best results, not sure why. It starts off faster in the
>>>>> first couple of warmup iterations, but then it gets slower in all
>>>>> the other runs - presumably it interacts badly with the C2
>>>>> generated code. For instance, this is a run with O3 enabled:
>>>>>
>>>>> # Run progress: 66.67% complete, ETA 00:01:40
>>>>> # Fork: 1 of 1
>>>>> # Warmup Iteration 1: 65182202.653 ops/s
>>>>> # Warmup Iteration 2: 64900639.094 ops/s
>>>>> # Warmup Iteration 3: 59314945.437 ops/s <----------
>>>>> # Warmup Iteration 4: 59269007.877 ops/s
>>>>> # Warmup Iteration 5: 59239905.163 ops/s
>>>>> Iteration 1: 59300748.074 ops/s
>>>>> Iteration 2: 59249666.044 ops/s
>>>>> Iteration 3: 59268597.051 ops/s
>>>>> Iteration 4: 59322074.572 ops/s
>>>>> Iteration 5: 59059259.317 ops/s
>>>>>
>>>>> And this is a run with O3 disabled:
>>>>>
>>>>> # Run progress: 0.00% complete, ETA 00:01:40
>>>>> # Fork: 1 of 1
>>>>> # Warmup Iteration 1: 55882128.787 ops/s
>>>>> # Warmup Iteration 2: 53102361.751 ops/s
>>>>> # Warmup Iteration 3: 66964755.699 ops/s <----------
>>>>> # Warmup Iteration 4: 66414428.355 ops/s
>>>>> # Warmup Iteration 5: 65328475.276 ops/s
>>>>> Iteration 1: 64229192.993 ops/s
>>>>> Iteration 2: 65191719.319 ops/s
>>>>> Iteration 3: 65352022.471 ops/s
>>>>> Iteration 4: 65152090.426 ops/s
>>>>> Iteration 5: 65320545.712 ops/s
>>>>>
>>>>>
>>>>> In both cases, the 3rd warmup execution sees a performance jump -
>>>>> with O3, the jump is backwards, w/o O3 the jump is forward, which
>>>>> is quite typical for a JMH benchmark as C2 optimization will start
>>>>> to kick in.
>>>>>
>>>>> For these reasons, I'm reluctant to update my benchmark numbers to
>>>>> reflect the O3 behavior (although I agree that, since the Hotspot
>>>>> code is compiled with that optimization it would make more sense to
>>>>> use that as a reference).
>>>>>
>>>>> Maurizio
>>>>>
>>>>> [1] - http://cr.openjdk.java.net/~mcimadamore/panama/foreign-jmh.txt
>>>>>
>>>>>
>>>>>
>>>>> On 17/09/18 16:18, Maurizio Cimadamore wrote:
>>>>>>
>>>>>>
>>>>>> On 17/09/18 15:08, Samuel Audet wrote:
>>>>>>> Yes, the blackhole or the random number doesn't make any
>>>>>>> difference, but not calling gcc with -O3 does. Running the
>>>>>>> compiler with optimizations on is pretty common, but they are not
>>>>>>> enabled by default.
>>>>>> A bit better
>>>>>>
>>>>>> PanamaBenchmark.testMethod thrpt 5 28018170.076 ± 8491668.248 ops/s
>>>>>>
>>>>>> But not much of a difference (I did not expect much, as the body
>>>>>> of the native method is extremely simple).
>>>>>>
>>>>>> Maurizio
>>>>
>>>
>>
>
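
For anyone who wants to put a number on the O3 vs. non-O3 getpid gap quoted
above, here is a minimal sketch that averages the five measurement
iterations from each of the two runs and prints their ratio. The class and
method names (GetpidRunComparison, mean) are made up for illustration and
are not part of either benchmark project; the figures are copied from the
quoted JMH output.

```java
public class GetpidRunComparison {
    // Measurement iterations (ops/s) copied from the quoted run with -O3.
    static final double[] WITH_O3 = {
        59300748.074, 59249666.044, 59268597.051, 59322074.572, 59059259.317
    };

    // Measurement iterations (ops/s) copied from the quoted run without -O3.
    static final double[] WITHOUT_O3 = {
        64229192.993, 65191719.319, 65352022.471, 65152090.426, 65320545.712
    };

    // Plain arithmetic mean; JMH reports this as the Score for thrpt mode.
    static double mean(double[] samples) {
        double sum = 0.0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    public static void main(String[] args) {
        double o3 = mean(WITH_O3);
        double noO3 = mean(WITHOUT_O3);
        System.out.printf("mean with -O3:    %.3f ops/s%n", o3);
        System.out.printf("mean without -O3: %.3f ops/s%n", noO3);
        System.out.printf("O3 / no-O3 ratio: %.4f%n", o3 / noO3);
    }
}
```

The -O3 mean agrees, to within rounding, with the testJNIPid score in the
table above, and the ratio works out to roughly 0.91, i.e. the -O3 build is
about 9% slower on getpid in these two runs. With a single fork of five
iterations each, this is only a rough indication.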