[foreign] some JMH benchmarks

Samuel Audet samuel.audet at gmail.com
Wed Sep 19 00:13:27 UTC 2018


Thanks! You haven't mentioned the version of the JDK you're using 
though. I'm starting to get the impression that JNI in newer versions of 
OpenJDK will be slower... ?

On 09/18/2018 07:03 PM, Maurizio Cimadamore wrote:
> These are the numbers I get
> 
> Benchmark                         Mode  Cnt         Score       Error  Units
> NativeBenchmark.expBenchmark     thrpt    5  30542590.094 ± 44126.434  ops/s
> NativeBenchmark.getpidBenchmark  thrpt    5  61764677.092 ± 21102.236  ops/s
> 
> They are in the same ballpark, but exp() is a bit faster; btw, I tried 
> to repeat my benchmark with JNI exp() _and_ O3 and I got very similar 
> numbers (yesterday I did a very quick test, and there was probably some 
> other job running on the machine bringing down the figures a bit).
> 
> But overall, the results from your bench seem to match what I got: exp 
> is faster, getpid is slower, and the difference is mostly caused by O3. 
> If O3 is not used, then the numbers should match the ones I included 
> (and getpid should be a bit faster).
> 
> Maurizio
> 
> 
> On 18/09/18 05:48, Samuel Audet wrote:
>> Anyway, I've put online an updated version of my benchmark files here:
>> https://gist.github.com/saudet/1bf14a000e64c245675cf5d4e9ad6e69
>> Just run "git clone" on the URL and run "mvn package" on the pom.xml.
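As a rough, pure-Java stand-in for what those two benchmark methods measure: the gist's harness is JMH and the real calls go through JNI, so the sketch below is only an illustration; the class name, loop counts, and the use of Math.exp and ProcessHandle.current().pid() as substitutes for the native calls are all assumptions made for this sketch.

```java
// Naive throughput sketch of the two benchmarks; not a replacement for JMH.
public class NaiveBench {
    static double measure(String name, java.util.function.LongToDoubleFunction op) {
        final long iterations = 5_000_000;
        double sink = 0;                        // consume results so the loop cannot be eliminated
        long start = System.nanoTime();
        for (long i = 0; i < iterations; i++) {
            sink += op.applyAsDouble(i);
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        double opsPerSec = iterations / seconds;
        System.out.printf("%s: %.0f ops/s (sink=%.3f)%n", name, opsPerSec, sink);
        return opsPerSec;
    }

    public static void main(String[] args) {
        measure("exp", i -> Math.exp((i % 100) / 100.0));
        // closest pure-Java analogue of getpid(); the JNI version calls it natively
        measure("getpid", i -> (double) ProcessHandle.current().pid());
    }
}
```

A naive loop like this suffers from exactly the pitfalls JMH exists to avoid (dead-code elimination, warmup, constant folding), which is why the numbers in this thread come from JMH instead.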
>>
>> With the 2 virtual cores of an Intel(R) Xeon(R) CPU E5-2673 v4 @ 
>> 2.30GHz running Ubuntu 14.04 on the cloud with GCC 4.9 and OpenJDK 8, 
>> I get these numbers:
>>
>> Benchmark                         Mode  Cnt          Score        Error  Units
>> NativeBenchmark.expBenchmark     thrpt   25   37460540.440 ±   393299.974  ops/s
>> NativeBenchmark.getpidBenchmark  thrpt   25  100323188.451 ±  1254197.449  ops/s
>>
>> While on my laptop, an Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz 
>> running Fedora 27, GCC 7.3, and OpenJDK 9, I get the following:
>>
>> Benchmark                         Mode  Cnt         Score       Error  Units
>> NativeBenchmark.expBenchmark     thrpt   25  50047147.099 ± 924366.937  ops/s
>> NativeBenchmark.getpidBenchmark  thrpt   25   4825508.193 ±  21662.633  ops/s
>>
>> Now, it looks like getpid() is really slow on Fedora 27 for some 
>> reason, but as Linus puts it, we should not be using that for 
>> benchmarking:
>> https://yarchive.net/comp/linux/getpid_caching.html
>>
>> What do you get on your machines?
>>
>> Samuel
>>
>>
>> On 09/18/2018 12:58 AM, Maurizio Cimadamore wrote:
>>> For the records, here's what I get for all the three benchmarks if I 
>>> compile the JNI code with -O3:
>>>
>>> Benchmark                          Mode  Cnt         Score        Error  Units
>>> PanamaBenchmark.testJNIExp        thrpt    5  28575269.294 ± 1907726.710  ops/s
>>> PanamaBenchmark.testJNIJavaQsort  thrpt    5    372148.433 ±   27178.529  ops/s
>>> PanamaBenchmark.testJNIPid        thrpt    5  59240069.011 ±   403881.697  ops/s
>>>
>>> The first and second benchmarks get faster and very close to the 
>>> 'direct' optimization numbers in [1]. Surprisingly, the last 
>>> benchmark (getpid) is quite a bit slower. I've been able to reproduce 
>>> this across multiple runs; for that benchmark, omitting O3 seems to 
>>> achieve the best results, and I'm not sure why. It starts off faster 
>>> (around 65M ops/s) in the first couple of warmup iterations, but then 
>>> it slows down in all the other runs - presumably it interacts badly 
>>> with the C2-generated code. For instance, this is a run with O3 enabled:
>>>
>>> # Run progress: 66.67% complete, ETA 00:01:40
>>> # Fork: 1 of 1
>>> # Warmup Iteration   1: 65182202.653 ops/s
>>> # Warmup Iteration   2: 64900639.094 ops/s
>>> # Warmup Iteration   3: 59314945.437 ops/s  <---------------------------------
>>> # Warmup Iteration   4: 59269007.877 ops/s
>>> # Warmup Iteration   5: 59239905.163 ops/s
>>> Iteration   1: 59300748.074 ops/s
>>> Iteration   2: 59249666.044 ops/s
>>> Iteration   3: 59268597.051 ops/s
>>> Iteration   4: 59322074.572 ops/s
>>> Iteration   5: 59059259.317 ops/s
>>>
>>> And this is a run with O3 disabled:
>>>
>>> # Run progress: 0.00% complete, ETA 00:01:40
>>> # Fork: 1 of 1
>>> # Warmup Iteration   1: 55882128.787 ops/s
>>> # Warmup Iteration   2: 53102361.751 ops/s
>>> # Warmup Iteration   3: 66964755.699 ops/s  <---------------------------------
>>> # Warmup Iteration   4: 66414428.355 ops/s
>>> # Warmup Iteration   5: 65328475.276 ops/s
>>> Iteration   1: 64229192.993 ops/s
>>> Iteration   2: 65191719.319 ops/s
>>> Iteration   3: 65352022.471 ops/s
>>> Iteration   4: 65152090.426 ops/s
>>> Iteration   5: 65320545.712 ops/s
>>>
>>>
>>> In both cases, the 3rd warmup iteration sees a performance jump - 
>>> with O3 the jump is backwards, while without O3 it is forward, which 
>>> is quite typical for a JMH benchmark, as the C2 optimizations start 
>>> to kick in.
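The warmup effect described above can be seen even with a hand-rolled loop: timing the same work for several "iterations" the way JMH prints warmup iterations usually shows later iterations getting faster once the JIT (C2) compiles the hot loop. This is only a sketch, not JMH; the class name, loop size, and the exact iteration where the jump happens are machine-dependent assumptions.

```java
// Tiny warmup demo: repeat the same timed workload and watch per-iteration
// times drop as the JIT compiles the hot loop.
public class WarmupDemo {
    static double work() {
        double sink = 0;                         // accumulated so the loop is not dead code
        for (int i = 0; i < 2_000_000; i++) {
            sink += Math.exp((i % 100) / 100.0);
        }
        return sink;
    }

    public static void main(String[] args) {
        for (int it = 1; it <= 5; it++) {
            long t0 = System.nanoTime();
            double sink = work();
            long micros = (System.nanoTime() - t0) / 1_000;
            System.out.printf("Iteration %d: %d us (sink=%.3f)%n", it, micros, sink);
        }
    }
}
```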
>>>
>>> For these reasons, I'm reluctant to update my benchmark numbers to 
>>> reflect the O3 behavior (although I agree that, since the HotSpot 
>>> code is compiled with that optimization, it would make more sense to 
>>> use it as a reference).
>>>
>>> Maurizio
>>>
>>> [1] - http://cr.openjdk.java.net/~mcimadamore/panama/foreign-jmh.txt
>>>
>>>
>>>
>>> On 17/09/18 16:18, Maurizio Cimadamore wrote:
>>>>
>>>>
>>>> On 17/09/18 15:08, Samuel Audet wrote:
>>>>> Yes, neither the blackhole nor the random number makes any 
>>>>> difference, but not calling gcc with -O3 does. Running the compiler 
>>>>> with optimizations on is pretty common, but they are not enabled by 
>>>>> default.
>>>> A bit better:
>>>>
>>>> PanamaBenchmark.testMethod  thrpt    5  28018170.076 ± 8491668.248  ops/s
>>>>
>>>> But not much of a difference (I did not expect much, as the body 
>>>> of the native method is extremely simple).
>>>>
>>>> Maurizio 
>>
> 



More information about the panama-dev mailing list