Understanding the performance of my FFI-based API
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Fri Mar 10 18:05:39 UTC 2023
Hi Alan,
I ran some more experiments on your repository.
First of all, I fixed the keySegment benchmark utility function to do this:
    private MemorySegment getKeySegment() {
        final int MAX_LEN = 9; // key100000
        final int keyIdx = next();
        final String keyStr = "key" + keyIdx;
        for (int i = 0; i < keyStr.length(); ++i) {
            keySegment.set(ValueLayout.JAVA_BYTE, i, (byte)keyStr.charAt(i));
        }
        for (int i = keyStr.length(); i < MAX_LEN; ++i) {
            keySegment.set(ValueLayout.JAVA_BYTE, i, (byte) 0x30);
        }
        return keySegment;
    }
I.e. this brings it in sync with the buffer version.
Then I made, as suggested yesterday, all MethodHandles in FFIMethod
static AND final.
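
To illustrate the pattern, here is a minimal sketch. It is not the actual FFIMethod code: the class name and the target method are made up, and it uses a plain java.lang.invoke handle instead of a downcall handle so that it runs without the FFI API. The point is only that a handle held in a static final field can be treated as a constant by the JIT, which lets it inline through invokeExact.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class StaticFinalHandle {
    // static AND final: the JIT can constant-fold the handle and inline the target.
    // (Hypothetical stand-in for a downcall handle; Math.addExact is just a convenient target.)
    private static final MethodHandle ADD;

    static {
        try {
            ADD = MethodHandles.lookup().findStatic(
                    Math.class, "addExact",
                    MethodType.methodType(int.class, int.class, int.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static int add(int a, int b) {
        try {
            // invokeExact requires the call site to match (int,int)int exactly,
            // so the result is cast to the primitive int, not Integer.
            return (int) ADD.invokeExact(a, b);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3));
    }
}
```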
Also, in FFILayout, I added a call to |.withInvokeExactBehavior()| to
each var handle creation. This is helpful to detect inexact calls. I
found a few inexact calls:
* one in FFIDB.java - the result of the foreign call inside
getPinnableSlice() is cast to |Long| instead of |long|
* one in FFIPinnableSlice.java - the |isPinned()| method also casts to
|Boolean|, not |boolean|
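
A small sketch of how this kind of mistake shows up, assuming Java 16 or later (where withInvokeExactBehavior() is available). The class and method names are hypothetical, and it uses an array element var handle rather than the layout-derived handles in FFILayout. With invoke-exact behavior, the boxed cast is rejected with a WrongMethodTypeException instead of being silently adapted:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.lang.invoke.WrongMethodTypeException;

public class ExactVarHandle {
    // Element accessor for long[]; the exact type of its "get" is (long[], int)long.
    private static final VarHandle LONGS =
            MethodHandles.arrayElementVarHandle(long[].class).withInvokeExactBehavior();

    // Exact call: the cast to primitive long matches the access mode type.
    static long readExact(long[] arr, int i) {
        return (long) LONGS.get(arr, i);
    }

    // Inexact call: casting to Long makes the call site (long[], int)Long,
    // which an invoke-exact var handle refuses.
    static boolean boxedCastIsRejected(long[] arr, int i) {
        try {
            Long boxed = (Long) LONGS.get(arr, i);
            return false;
        } catch (WrongMethodTypeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        long[] data = {42L};
        System.out.println(readExact(data, 0));
        System.out.println(boxedCastIsRejected(data, 0));
    }
}
```

Without withInvokeExactBehavior() the boxed variant still works, but goes through a slower adaptation path, which is why these calls are easy to miss.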
Then, the |fromPinnable| factory contains some dubious code which is
creating a buffer from a segment, just to do a copy. I replaced with this:
    MemorySegment.copy(pinnableSlice.data(), ValueLayout.JAVA_BYTE, 0, value, 0, (int)size);
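
For reference, a self-contained sketch of that segment-to-array copy. It assumes a JDK where the API is final (Java 22 or later; on Java 20 it is a preview API and needs --enable-preview). The class name, helper names and the 4-byte test segment are made up for illustration, standing in for the pinnable slice data:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class SegmentCopy {
    // Copies `size` bytes from a native segment straight into a new byte[],
    // with no intermediate ByteBuffer.
    static byte[] copyOut(MemorySegment src, int size) {
        byte[] value = new byte[size];
        MemorySegment.copy(src, ValueLayout.JAVA_BYTE, 0, value, 0, size);
        return value;
    }

    // Fills a small native segment with 1,2,3,4 and copies it out.
    static byte[] demo() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment data = arena.allocate(4);
            for (int i = 0; i < 4; i++) {
                data.set(ValueLayout.JAVA_BYTE, i, (byte) (i + 1));
            }
            return copyOut(data, 4);
        }
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(demo()));
    }
}
```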
I’ve also updated the code to use the Java 20 API, to make sure I ran
with a reasonably up-to-date JVM.
Before these changes, I could see a difference between FFI and JNI,
especially in the preallocated benchmark variants. With the changes
above, it looks like this here:
Benchmark                                      (columnFamilyTestType)  (keyCount)  (keySize)  (valueSize)   Mode  Cnt      Score     Error   Units
GetBenchmarks.ffiGet                                 no_column_family        1000        128         4096  thrpt   30    596.591 ±   6.448  ops/ms
GetBenchmarks.ffiGet                                 no_column_family        1000        128        65536  thrpt   30     60.277 ±   0.547  ops/ms
GetBenchmarks.ffiGetPinnableSlice                    no_column_family        1000        128         4096  thrpt   30    771.631 ±  13.835  ops/ms
GetBenchmarks.ffiGetPinnableSlice                    no_column_family        1000        128        65536  thrpt   30    111.709 ±   1.306  ops/ms
GetBenchmarks.ffiGetRandom                           no_column_family        1000        128         4096  thrpt   30    591.891 ±   7.353  ops/ms
GetBenchmarks.ffiGetRandom                           no_column_family        1000        128        65536  thrpt   30     68.197 ±   0.600  ops/ms
GetBenchmarks.ffiIdentity                            no_column_family        1000        128         4096  thrpt   30  58709.753 ± 712.660  ops/ms
GetBenchmarks.ffiIdentity                            no_column_family        1000        128        65536  thrpt   30  59265.794 ± 834.989  ops/ms
GetBenchmarks.ffiPreallocatedGet                     no_column_family        1000        128         4096  thrpt   30    736.686 ±   8.370  ops/ms
GetBenchmarks.ffiPreallocatedGet                     no_column_family        1000        128        65536  thrpt   30    101.211 ±   0.347  ops/ms
GetBenchmarks.ffiPreallocatedGetRandom               no_column_family        1000        128         4096  thrpt   30    598.381 ±   6.252  ops/ms
GetBenchmarks.ffiPreallocatedGetRandom               no_column_family        1000        128        65536  thrpt   30     68.037 ±   0.632  ops/ms
GetBenchmarks.get                                    no_column_family        1000        128         4096  thrpt   30    559.800 ±   3.369  ops/ms
GetBenchmarks.get                                    no_column_family        1000        128        65536  thrpt   30     60.567 ±   0.380  ops/ms
GetBenchmarks.preallocatedByteBufferGet              no_column_family        1000        128         4096  thrpt   30    758.639 ±  13.025  ops/ms
GetBenchmarks.preallocatedByteBufferGet              no_column_family        1000        128        65536  thrpt   30    103.190 ±   1.219  ops/ms
GetBenchmarks.preallocatedByteBufferGetRandom        no_column_family        1000        128         4096  thrpt   30    753.189 ±  12.498  ops/ms
GetBenchmarks.preallocatedByteBufferGetRandom        no_column_family        1000        128        65536  thrpt   30    103.644 ±   3.625  ops/ms
GetBenchmarks.preallocatedGet                        no_column_family        1000        128         4096  thrpt   30    707.330 ±  10.811  ops/ms
GetBenchmarks.preallocatedGet                        no_column_family        1000        128        65536  thrpt   30     96.743 ±   1.609  ops/ms
It seems most of the numbers are roughly the same.
I’m not sure how much the update to Java 20 matters - maybe try to fix all
of the other stuff first, and see what happens (inexact var handle calls
can be quite slow compared to Unsafe memory access).
Cheers
Maurizio
On 09/03/2023 18:13, Alan Paxton wrote:
> Hi Maurizio,
>
> Thanks for the quick and detailed response. I think our goals coincide
> as it would make life easier for rocksjava to successfully implement
> an FFI API.
>
> A couple of quick initial reruns shows me your suggestions both
> contribute a small amount of improvement, but probably do not account
> for all the performance I am missing. I shall rerun the full benchmark
> for confirmation.
>
> And since both suggestions give me a clearer idea what might be
> performance issues, I will take another pass over my code and see if I
> can spot any other potential problems in how it's implemented, or
> anything else that isn't truly like-for-like with the JNI version.
>
> --Alan
>
> On Thu, Mar 9, 2023 at 12:08 PM Maurizio Cimadamore
> <maurizio.cimadamore at oracle.com> wrote:
>
> Also, zooming into the benchmark, something funny seems to be
> going on with "getKeySegment". This seems different from the
> "getKeyArr" counterpart, but also has a new issue: I believe that,
> in JNI, you just passed the Java array "as is" - but in Panama you
> can't (as the array is on-heap), so there is some double-copying
> involved there (e.g. you create an on-heap array, which then is
> moved off-heap).
>
> If I'm not mistaken, this method is executed on every benchmark
> iteration, so the comparison doesn't just measure the cost of the
> native call, but also the cost it takes to marshal data from Java
> heap to native.
>
> For instance, the byte buffer versions ("keyBuf") seem to avoid
> this problem by copying the data directly off-heap (by using a
> direct buffer). I think the benchmark should use a native segment,
> and avoid the copy so that at least we avoid that source of noise
> in the numbers.
>
> Cheers
> Maurizio
>
> On 09/03/2023 11:29, Maurizio Cimadamore wrote:
>> Hi Alan,
>> first of all, I'd like to thank you for taking the time to share
>> your experience and to write it all up in a document. Stuff like
>> that is very valuable to us, especially at this stage in the
>> project.
>>
>> One quick suggestion when eyeballing your code: your method
>> handles are "static", but not "final". I suggest you try to
>> sprinkle "final" in, and see whether that does the trick. If not,
>> we'd have to look deeper.
>>
>> Cheers
>> Maurizio
>>
>> On 09/03/2023 11:15, Alan Paxton wrote:
>>> Hi,
>>>
>>> I hope this is an appropriate list for this question.
>>>
>>> I have been prototyping an FFI-based version of the RocksDB Java
>>> API, which is currently implemented in JNI. RocksDB is a C++
>>> based key-value store with a Java API layered on top. I have
>>> done some benchmarking of the FFI implementation, versus the JNI
>>> version and I find it performs consistently slightly slower than
>>> the current API.
>>>
>>> I would like to understand if this is to be expected, e.g. does
>>> FFI do more safety checking under the covers when calling a
>>> native method?
>>> Or is the performance likely to improve between the preview in
>>> Java 19 and release in Java 21?
>>> If there are resources or suggestions that would help me dig
>>> into the performance I'd be very grateful to be pointed to them.
>>>
>>> For the use case I'm measuring, data is transferred in native
>>> memory originally allocated by RocksDB in C++ which I wrap as a
>>> MemorySegment; I do allocate native memory for the request
>>> structure.
>>>
>>> These are links to the PR and some documentation of the work:
>>>
>>> https://github.com/facebook/rocksdb/pull/11095
>>>
>>> https://github.com/alanpaxton/rocksdb/blob/eb-1680-panama-ffi/java/JavaFFI.md
>>>
>>>
>>> Many thanks,
>>> Alan Paxton
>>>