[foreign] Poor performance?

Fri May 17 15:27:55 UTC 2019

Sorry, forgot to include the CallOnly results (seems to have been 
omitted for some reason), which look much better:

Benchmark                                   Mode  Cnt     Score     Error  Units
JmhCallOnly.jni_javacpp                     avgt   50    64.958 ±   3.608  ns/op
JmhCallOnly.panama                          avgt   50    39.231 ±   1.951  ns/op
JmhGetSystemTimeSeconds.jni_javacpp         avgt   50   295.754 ±  13.541  ns/op
JmhGetSystemTimeSeconds.panama_prelayout    avgt   50   610.027 ±  30.592  ns/op
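
As a quick sanity check on the scores above, the relative differences can
be computed directly from the reported averages. This is only a rough
sketch: the class and method names are made up for illustration, the
scores are copied from this thread, and the error margins are ignored.

```java
public class ScoreRatios {
    // Ratio of two average times in ns/op; a result above 1.0 means the
    // first argument is the slower of the two.
    static double ratio(double slowerNsPerOp, double fasterNsPerOp) {
        return slowerNsPerOp / fasterNsPerOp;
    }

    public static void main(String[] args) {
        // Scores copied from the JMH output in this thread (avgt, ns/op).
        System.out.printf("CallOnly: jni_javacpp / panama = %.2f%n",
                ratio(64.958, 39.231));
        System.out.printf("GetSystemTimeSeconds: panama_prelayout / jni_javacpp = %.2f%n",
                ratio(610.027, 295.754));
    }
}
```

So in the call-only case JNI takes roughly 1.7x as long as Panama, while
in the GetSystemTimeSeconds case Panama takes roughly 2.1x as long as JNI.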

Obviously, this deserves some more investigation either way :)

Jorn

Jorn Vernee wrote on 2019-05-17 17:14:
> FWIW, I ran the benchmarks with the linkToNative back-end (using
> -Djdk.internal.foreign.NativeInvoker.FASTPATH=direct), but it's still
> 2x slower than JNI:
> 
> Benchmark                                   Mode  Cnt     Score     Error  Units
> JmhGetSystemTimeSeconds.jni_javacpp         avgt   50   298.046 ±  15.744  ns/op
> JmhGetSystemTimeSeconds.panama_prelayout    avgt   50   596.567 ±  20.570  ns/op
> 
> Of course, like Aleksey says: "The numbers [above] are just data. To
> gain reusable insights, you need to follow up on why the numbers are
> the way they are." Unfortunately, I'm having some trouble getting the
> project to work with the Windows profiler :/ I'm currently looking
> into that.
> 
> Cheers,
> Jorn
> 
> Maurizio Cimadamore wrote on 2019-05-17 16:51:
>> On 17/05/2019 11:26, Maurizio Cimadamore wrote:
>>> Thank you for bringing this up, I saw this benchmark a few days ago 
>>> and I took a look at it. That benchmark is unfortunately hitting a 
>>> couple of (transitory!) pain points: (1) it is running on Windows, 
>>> which lacks the optimizations available for MacOS and Linux 
>>> (directInvoker). When the linkToNative effort is completed, this 
>>> discrepancy between platforms will go away. The second problem (2) is 
>>> that the call is passing a big struct (i.e. bigger than 64 bits). 
>>> Even on Linux and Mac, such a call would be unable to take advantage 
>>> of the optimized invoker and would fall back to the so-called 
>>> 'universal invoker', which is slow.
>> 
>> Actually, my bad, the bench is passing pointers to structs, not structs
>> by value - which I think should mean the 'foreign+linkToNative'
>> experimental branch should be able to handle this. Would be nice to
>> get some confirmation that this is indeed the case.
>> 
>> Maurizio