[foreign] Poor performance?

Fri May 17 15:54:40 UTC 2019

Ok - that is what I was expecting, thanks!

I think the problem is now limited to the fact that Scope::allocate is
slow, but note that similar problems have also been found elsewhere:

https://github.com/bytedeco/javacpp/issues/299

My point here is that, while we should remove accidental performance
degradation (the fact that we keep reparsing the layout annotation seems
to be an offender here, which the benchmark writer had to work around),
it is not, I think, 100% fair to compare a higher-level allocation
solution to a raw malloc.
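
To make the reparsing point concrete, here is a minimal sketch of caching
a parsed layout so the parse happens once per descriptor instead of on
every call. Everything in it is an assumption for illustration: LayoutCache,
parse and the "u16,u16,u32" descriptor format are hypothetical stand-ins,
not the actual Panama layout parser or annotation syntax.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a layout parser; this is NOT the Panama API.
final class LayoutCache {
    private static final Map<String, int[]> CACHE = new ConcurrentHashMap<>();

    // Counts how often the (expensive) parse actually runs.
    static final AtomicInteger PARSE_COUNT = new AtomicInteger();

    // Toy parse: turns a descriptor such as "u16,u16,u32" into field sizes in bits.
    static int[] parse(String descriptor) {
        PARSE_COUNT.incrementAndGet();
        String[] parts = descriptor.split(",");
        int[] bits = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            // Drop the one-letter type prefix and read the bit width.
            bits[i] = Integer.parseInt(parts[i].trim().substring(1));
        }
        return bits;
    }

    // Parses a descriptor the first time it is seen, then reuses the result.
    static int[] layoutOf(String descriptor) {
        return CACHE.computeIfAbsent(descriptor, LayoutCache::parse);
    }
}
```

Calling LayoutCache.layoutOf with the same descriptor twice returns the
same array and runs the parse only once, which is roughly what the
benchmark writer's "prelayout" workaround achieves by hand.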

For instance, the benchmark is reusing the same scope over and over,
which means you keep allocating in the same scope and create bigger and
bigger lists of allocation units (which at some point will have to be
resized). As a result, it will use more heap memory than the JNI
counterpart. As for native memory usage, I don't know: in the JNI bench
I don't see a 'free', but maybe JavaCPP is cleaning that up automagically
(with a Cleaner?).
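
The scope-reuse effect can be sketched with a toy model. ToyScope below is
hypothetical and only mimics the bookkeeping idea (a list that records
every allocation until the scope is closed); it is not java.foreign.Scope
and it allocates Java arrays rather than native memory.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a scope that tracks every allocation until it is closed.
// Hypothetical: an illustration only, not the Panama Scope implementation.
final class ToyScope implements AutoCloseable {
    private final List<byte[]> allocations = new ArrayList<>();

    byte[] allocate(int size) {
        byte[] block = new byte[size];
        allocations.add(block); // bookkeeping grows with every call
        return block;
    }

    int liveAllocations() {
        return allocations.size();
    }

    @Override
    public void close() {
        allocations.clear(); // releasing the scope drops all bookkeeping
    }

    // One scope shared by all iterations: the list keeps growing.
    static int sharedScopePeak(int iterations) {
        try (ToyScope scope = new ToyScope()) {
            for (int i = 0; i < iterations; i++) {
                scope.allocate(16);
            }
            return scope.liveAllocations();
        }
    }

    // A fresh scope per iteration: the list never holds more than one entry.
    static int perIterationPeak(int iterations) {
        int peak = 0;
        for (int i = 0; i < iterations; i++) {
            try (ToyScope scope = new ToyScope()) {
                scope.allocate(16);
                peak = Math.max(peak, scope.liveAllocations());
            }
        }
        return peak;
    }
}
```

With a shared scope the number of tracked allocations equals the number
of iterations, while a scope per iteration keeps it at one. A raw malloc
has no such bookkeeping at all.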

Those are important behavioral differences which should be taken into 
account when looking at the numbers.

Maurizio

On 17/05/2019 16:27, Jorn Vernee wrote:
> Sorry, forgot to include the CallOnly results (seems to have been 
> omitted for some reason), which look much better:
>
> Benchmark                                   Mode  Cnt    Score     Error  Units
> JmhCallOnly.jni_javacpp                     avgt   50   64.958 ±   3.608  ns/op
> JmhCallOnly.panama                          avgt   50   39.231 ±   1.951  ns/op
> JmhGetSystemTimeSeconds.jni_javacpp         avgt   50  295.754 ±  13.541  ns/op
> JmhGetSystemTimeSeconds.panama_prelayout    avgt   50  610.027 ±  30.592  ns/op
>
> Obviously, this deserves some more investigation either way :)
>
> Jorn
>
> Jorn Vernee schreef op 2019-05-17 17:14:
>> FWIW, I ran the benchmarks with the linkToNative back-end (using
>> -Djdk.internal.foreign.NativeInvoker.FASTPATH=direct), but it's still
>> 2x slower than JNI:
>>
>> Benchmark                                   Mode  Cnt    Score     Error  Units
>> JmhGetSystemTimeSeconds.jni_javacpp         avgt   50  298.046 ±  15.744  ns/op
>> JmhGetSystemTimeSeconds.panama_prelayout    avgt   50  596.567 ±  20.570  ns/op
>>
>> Of course, like Aleksey says: "The numbers [above] are just data. To
>> gain reusable insights, you need to follow up on why the numbers are
>> the way they are." Unfortunately, I'm having some trouble getting the
>> project to work with the Windows profiler :/ I'm currently looking
>> into that.
>>
>> Cheers,
>> Jorn
>>
>> Maurizio Cimadamore schreef op 2019-05-17 16:51:
>>> On 17/05/2019 11:26, Maurizio Cimadamore wrote:
>>>> Thank you for bringing this up. I saw this benchmark a few days
>>>> ago and took a look at it. That benchmark is unfortunately hitting
>>>> a couple of (transitory!) pain points: (1) it is running on
>>>> Windows, which lacks the optimizations available for MacOS and
>>>> Linux (directInvoker). When the linkToNative effort is
>>>> completed, this discrepancy between platforms will go away. The
>>>> second problem (2) is that the call is passing a big struct (e.g.
>>>> bigger than 64 bits). Even on Linux and Mac, such a call would be
>>>> unable to take advantage of the optimized invoker and would fall
>>>> back to the so-called 'universal invoker', which is slow.
>>>
>>> Actually, my bad, the bench is passing pointers to structs, not structs
>>> by value, which I think should mean the 'foreign+linkToNative'
>>> experimental branch should be able to handle this. It would be nice to
>>> get some confirmation that this is indeed the case.
>>>
>>> Maurizio