[foreign] Poor performance?

Fri May 17 15:33:39 UTC 2019

Thanks Jorn,
I'd be more interested in knowing the raw native call numbers: does it 
get any better with linkToNative? Here I'd expect performance 
identical to JNI (since the binder should lower the Pointer to a long, 
which linkToNative would then pass in a register).
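
To make the lowering argument concrete, here is a minimal sketch in plain java.lang.invoke. It assumes nothing about the actual Panama binder: Ptr, nativeTarget, lower and call are hypothetical names, and nativeTarget is an ordinary Java method standing in for the native function. MethodHandles.filterArguments adapts a (long)long target into a (Ptr)long handle, so the only value that reaches the target is a primitive long, which is the shape a register-passing linkToNative call would want.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class LoweringSketch {
    // Hypothetical stand-in for a foreign Pointer: it only wraps a raw address.
    record Ptr(long address) {}

    // Hypothetical stand-in for the native function. It sees only a primitive
    // long, i.e. the kind of value a linkToNative-style call could pass in a register.
    static long nativeTarget(long addr) {
        return addr + 1;
    }

    // The lowering step: Ptr -> long.
    static long lower(Ptr p) {
        return p.address();
    }

    // Built once; a static final handle can be treated as a constant by the JIT.
    private static final MethodHandle DOWNCALL = build();

    // Produces a (Ptr)long handle that lowers its argument before calling the target.
    private static MethodHandle build() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle target = lookup.findStatic(LoweringSketch.class, "nativeTarget",
                    MethodType.methodType(long.class, long.class));
            MethodHandle lowerer = lookup.findStatic(LoweringSketch.class, "lower",
                    MethodType.methodType(long.class, Ptr.class));
            return MethodHandles.filterArguments(target, 0, lowerer);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Calls through the adapted handle; the call-site type must match (Ptr)long exactly.
    static long call(Ptr p) {
        try {
            return (long) DOWNCALL.invokeExact(p);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(call(new Ptr(0x1000L))); // prints 4097
    }
}
```

Building the handle once and caching it in a static final field is the usual way to let HotSpot treat a method-handle chain as a constant at the call site.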

As for the fuller benchmark, note that you are also measuring the 
performance of Scope::allocate, which internally uses some maps. 
JNR/JNI do not perform the same liveness checks that we do, so the full 
benchmark is not entirely fair. But the raw performance of the downcall 
should be an apples-to-apples comparison, and it shouldn't be 8x slower 
as it is now (at least not with linkToNative).
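
As a quick arithmetic check of the "still 2x slower" figure from Jorn's linkToNative run (quoted below), this sketch divides the two Score values and attaches a rough worst-case bound by summing the relative Error values. That is only a first-order approximation: JMH's Error column is a confidence-interval half-width, so the bound is indicative rather than rigorous. The class and method names are illustrative.

```java
import java.util.Locale;

public class ScoreRatio {
    // Score and Error columns (ns/op) copied from the linkToNative JMH run.
    static final double JNI_SCORE = 298.046, JNI_ERR = 15.744;
    static final double PANAMA_SCORE = 596.567, PANAMA_ERR = 20.570;

    // How many times slower the Panama downcall is than the JNI one.
    static double ratio() {
        return PANAMA_SCORE / JNI_SCORE;
    }

    // Rough worst-case bound: relative errors of a quotient add to first order.
    static double ratioErrorBound() {
        double rel = JNI_ERR / JNI_SCORE + PANAMA_ERR / PANAMA_SCORE;
        return ratio() * rel;
    }

    public static void main(String[] args) {
        // Locale.ROOT keeps '.' as the decimal separator regardless of system locale.
        System.out.printf(Locale.ROOT, "panama/jni = %.2f +/- %.2f%n",
                ratio(), ratioErrorBound());
        // prints: panama/jni = 2.00 +/- 0.17
    }
}
```

Even with this pessimistic bound the gap stays well clear of 1x, so the 2x result is not an artifact of run-to-run noise.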

Maurizio

On 17/05/2019 16:14, Jorn Vernee wrote:

> FWIW, I ran the benchmarks with the linkToNative back-end (using 
> -Djdk.internal.foreign.NativeInvoker.FASTPATH=direct), but it's still 
> 2x slower than JNI:
>
> Benchmark                                 Mode  Cnt    Score    Error  Units
> JmhGetSystemTimeSeconds.jni_javacpp       avgt   50  298.046 ± 15.744  ns/op
> JmhGetSystemTimeSeconds.panama_prelayout  avgt   50  596.567 ± 20.570  ns/op
>
> Of course, as Aleksey says: "The numbers [above] are just data. To 
> gain reusable insights, you need to follow up on why the numbers are 
> the way they are." Unfortunately, I'm having some trouble getting the 
> project to work with the Windows profiler :/ I'm currently looking 
> into that.
>
> Cheers,
> Jorn
>
> Maurizio Cimadamore wrote on 2019-05-17 16:51:
>> On 17/05/2019 11:26, Maurizio Cimadamore wrote:
>>> Thank you for bringing this up. I saw this benchmark a few days ago 
>>> and took a look at it. Unfortunately, that benchmark hits a couple 
>>> of (transitory!) pain points: (1) it is running on Windows, which 
>>> lacks the optimizations available on macOS and Linux 
>>> (directInvoker). Once the linkToNative effort is completed, this 
>>> discrepancy between platforms will go away. (2) The call passes a 
>>> big struct (i.e. bigger than 64 bits). Even on Linux and macOS, 
>>> such a call would be unable to take advantage of the optimized 
>>> invoker and would fall back to the so-called 'universal invoker', 
>>> which is slow.
>>
>> Actually, my bad: the bench is passing pointers to structs, not structs
>> by value, which I think means the 'foreign+linkToNative'
>> experimental branch should be able to handle this. It would be nice to
>> get confirmation that this is indeed the case.
>>
>> Maurizio
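
The two pain points in the original reply, and the correction that the bench passes pointers rather than structs by value, can be modeled as a toy invoker-selection rule. This is a hypothetical sketch, not Panama code: the Arg, PointerArg, StructByValue and Invoker types, the select method and the directInvokerAvailable flag are all illustrative. The rule only encodes what the thread states: no direct invoker on Windows at the time, and any struct passed by value wider than 64 bits falls back to the universal invoker.

```java
import java.util.List;

public class InvokerSelectionSketch {
    // Hypothetical argument classification: a pointer travels as a raw address,
    // while a struct passed by value carries its full size.
    sealed interface Arg permits PointerArg, StructByValue {}
    record PointerArg() implements Arg {}
    record StructByValue(int sizeInBits) implements Arg {}

    enum Invoker { DIRECT, UNIVERSAL }

    // Toy rule encoding the two pain points from the thread:
    // (1) no direct invoker on this platform -> universal invoker;
    // (2) any struct by value wider than 64 bits -> universal invoker.
    static Invoker select(List<Arg> args, boolean directInvokerAvailable) {
        if (!directInvokerAvailable) {
            return Invoker.UNIVERSAL;
        }
        for (Arg a : args) {
            if (a instanceof StructByValue s && s.sizeInBits() > 64) {
                return Invoker.UNIVERSAL;
            }
        }
        return Invoker.DIRECT;
    }

    public static void main(String[] args) {
        // Pointer to a struct, direct invoker available: stays on the fast path.
        System.out.println(select(List.of(new PointerArg()), true));        // DIRECT
        // 128-bit struct by value: falls back even where the direct invoker exists.
        System.out.println(select(List.of(new StructByValue(128)), true));  // UNIVERSAL
        // No direct invoker (Windows at the time of the thread).
        System.out.println(select(List.of(new PointerArg()), false));       // UNIVERSAL
    }
}
```

Under this model, the benchmark's pointer-to-struct arguments only hit the slow path because of the platform, which is why the linkToNative branch is expected to help.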