[foreign] Poor performance?
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Fri May 17 15:54:40 UTC 2019
Ok - that is what I was expecting, thanks!
I think now, the problem is limited to the fact that Scope::allocate is
slow - but note that similar problems have also been found elsewhere:
https://github.com/bytedeco/javacpp/issues/299
My point here is that, while we should remove accidental performance
degradation (the fact that we keep reparsing the layout anno seems to be
an offender here, which the benchmark writer had to work around), on the
other hand it is not, I think, 100% fair to compare a higher-level
allocation solution to a raw malloc.
For instance, the benchmark is reusing the same scope over and over,
which means you keep allocating in the same scope and creating bigger
and bigger lists of allocation units (which at some point will have to
be resized). As a result, it will use more heap memory than the JNI
counterpart. As for native memory usage I don't know - in the JNI bench
I don't see a 'free' but maybe JavaCPP is cleaning that up automagically
(with a Cleaner?).
Those are important behavioral differences which should be taken into
account when looking at the numbers.
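
To illustrate the scope-reuse point, here is a minimal sketch. ToyScope is
a hypothetical stand-in written for this example (it is not the Panama
Scope API), and it assumes only that a scope tracks each allocation unit
in a growable list until the scope is closed:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a scope; NOT the Panama Scope API.
// Assumption: every allocation is recorded in a list until close().
final class ToyScope implements AutoCloseable {
    private final List<ByteBuffer> units = new ArrayList<>();

    ByteBuffer allocate(int bytes) {
        ByteBuffer unit = ByteBuffer.allocateDirect(bytes);
        units.add(unit); // the backing array is resized as the list grows
        return unit;
    }

    int liveUnits() {
        return units.size();
    }

    @Override
    public void close() {
        units.clear(); // drop the bookkeeping for this scope
    }
}

class ScopeReuseSketch {
    // One long-lived scope: the tracking list grows with every call.
    static int reuseOneScope(int iterations) {
        ToyScope scope = new ToyScope();
        for (int i = 0; i < iterations; i++) {
            scope.allocate(16);
        }
        return scope.liveUnits();
    }

    // A fresh scope per iteration: at most one tracked unit at a time.
    static int scopePerIteration(int iterations) {
        int maxLive = 0;
        for (int i = 0; i < iterations; i++) {
            try (ToyScope scope = new ToyScope()) {
                scope.allocate(16);
                maxLive = Math.max(maxLive, scope.liveUnits());
            }
        }
        return maxLive;
    }

    public static void main(String[] args) {
        System.out.println("reused scope, tracked units: " + reuseOneScope(1000));
        System.out.println("scope per call, max tracked units: " + scopePerIteration(1000));
    }
}
```

With a reused scope the bookkeeping grows linearly with the number of
benchmark iterations, whereas a malloc-style call keeps no such list on
the Java heap, so the two variants are not measuring the same work.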
Maurizio
On 17/05/2019 16:27, Jorn Vernee wrote:
> Sorry, forgot to include the CallOnly results (seems to have been
> omitted for some reason), which look much better:
>
> Benchmark                                 Mode  Cnt    Score    Error  Units
> JmhCallOnly.jni_javacpp                   avgt   50   64.958 ±  3.608  ns/op
> JmhCallOnly.panama                        avgt   50   39.231 ±  1.951  ns/op
> JmhGetSystemTimeSeconds.jni_javacpp       avgt   50  295.754 ± 13.541  ns/op
> JmhGetSystemTimeSeconds.panama_prelayout  avgt   50  610.027 ± 30.592  ns/op
>
> Obviously, this deserves some more investigation either way :)
>
> Jorn
>
> Jorn Vernee wrote on 2019-05-17 17:14:
>> FWIW, I ran the benchmarks with the linkToNative back-end (using
>> -Djdk.internal.foreign.NativeInvoker.FASTPATH=direct), but it's still
>> 2x slower than JNI:
>>
>> Benchmark                                 Mode  Cnt    Score    Error  Units
>> JmhGetSystemTimeSeconds.jni_javacpp       avgt   50  298.046 ± 15.744  ns/op
>> JmhGetSystemTimeSeconds.panama_prelayout  avgt   50  596.567 ± 20.570  ns/op
>>
>> Of course, like Aleksey says: "The numbers [above] are just data. To
>> gain reusable insights, you need to follow up on why the numbers are
>> the way they are.". Unfortunately, I'm having some trouble getting the
>> project to work with the Windows profiler :/ Was currently looking
>> into that.
>>
>> Cheers,
>> Jorn
>>
>> Maurizio Cimadamore wrote on 2019-05-17 16:51:
>>> On 17/05/2019 11:26, Maurizio Cimadamore wrote:
>>>> thank you for bringing this up, I saw this benchmark a few days ago
>>>> and I took a look at it. That benchmark is unfortunately hitting
>>>> a couple of (transitory!) pain points: (1) it is running on
>>>> Windows, which lacks the optimizations available for MacOS and
>>>> Linux (directInvoker). When the linkToNative effort is
>>>> completed, this discrepancy between platforms will go away. The
>>>> second problem (2) is that the call is passing a big struct (e.g.
>>>> bigger than 64 bits). Even on Linux and Mac, such a call would be
>>>> unable to take advantage of the optimized invoker and would fall
>>>> back to the so called 'universal invoker' which is slow.
>>>
>>> Actually, my bad, the bench is passing pointers to structs, not structs
>>> by value - which I think should mean the 'foreign+linkToNative'
>>> experimental branch should be able to handle this. Would be nice to
>>> get some confirmation that this is indeed the case.
>>>
>>> Maurizio