Understanding the performance of my FFI-based API
Alan Paxton
alan.paxton at gmail.com
Tue Mar 14 11:06:46 UTC 2023
Hi Maurizio,
Thanks very much for taking the time to work through all that. I have now
made the changes you suggested and, like you, I am seeing results that are
comparable between JNI and FFI. I will update my code/document to reflect
this asap.
A few thoughts on what I have learned:
1. The importance of making VarHandle calls exact in FFI code, and the
usefulness of VarHandle.withInvokeExactBehavior() for tracking down the
inexact ones.
2. Good old "make it final if you possibly can".
3. I had missed MemorySegment.copy(...) as the efficient way to do the
memcpy out of a native segment, hence my ugly and inefficient attempt to
wrap it in a ByteBuffer.
4. Not allocating objects is always the most efficient thing to do.
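
For anyone following along, points 1 and 2 can be sketched roughly as below. This is not the RocksDB code: the class and method names are made up for illustration, and it uses a ByteBuffer view VarHandle (JDK 16 or later) instead of a memory-segment handle so that it runs without the foreign API. The assumption is that the exact-invocation rule behaves the same way for both kinds of handle.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.lang.invoke.WrongMethodTypeException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class, not part of any library.
final class ExactVarHandleDemo {
    // static final, so the JIT can treat the handle as a constant.
    // withInvokeExactBehavior() makes any call whose static types do not
    // match (ByteBuffer, int) -> long fail instead of being silently adapted.
    private static final VarHandle LONG_AT = MethodHandles
            .byteBufferViewVarHandle(long[].class, ByteOrder.nativeOrder())
            .withInvokeExactBehavior();

    // Exact call: argument types are (ByteBuffer, int) and the result is
    // cast to long, so the call-site descriptor matches the handle's type.
    static long readExact(ByteBuffer buf, int offset) {
        return (long) LONG_AT.get(buf, offset);
    }

    // Inexact call: without the (long) cast the call site asks for an
    // Object result. An ordinary handle would box the value; the exact
    // handle throws WrongMethodTypeException, which is how it flags the
    // slow call.
    static boolean inexactCallRejected(ByteBuffer buf, int offset) {
        try {
            Object boxed = LONG_AT.get(buf, offset);
            return false;
        } catch (WrongMethodTypeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16).order(ByteOrder.nativeOrder());
        buf.putLong(8, 42L);
        System.out.println(readExact(buf, 8));
        System.out.println(inexactCallRejected(buf, 8));
    }
}
```

Turning on exact behaviour while developing and fixing every WrongMethodTypeException it raises is one way to find the inexact call sites; whether to leave it enabled afterwards is a judgment call.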
You might be able to point me at something that explains what goes on under
the covers of invocation, and why exactness matters? My overall takeaway is
that there are a number of rules of thumb for making FFI fast; if you follow
them you get performance equivalent to JNI, with safety for free.
--Alan
"Benchmark","Mode","Threads","Samples","Score","Score Error (99.9%)","Unit","Param: columnFamilyTestType","Param: keyCount","Param: keySize","Param: valueSize"
"org.rocksdb.jmh.GetBenchmarks.ffiGet","thrpt",1,5,18964.735543,853.481052,"ops/s",no_column_family,100000,128,65536
"org.rocksdb.jmh.GetBenchmarks.ffiGetOutputSlice","thrpt",1,5,25157.246026,69.076723,"ops/s",no_column_family,100000,128,65536
"org.rocksdb.jmh.GetBenchmarks.ffiGetPinnableSlice","thrpt",1,5,28124.236270,1087.581497,"ops/s",no_column_family,100000,128,65536
"org.rocksdb.jmh.GetBenchmarks.ffiGetRandom","thrpt",1,5,18130.128894,1212.912411,"ops/s",no_column_family,100000,128,65536
"org.rocksdb.jmh.GetBenchmarks.ffiIdentity","thrpt",1,5,35359992.737450,294600.268777,"ops/s",no_column_family,100000,128,65536
"org.rocksdb.jmh.GetBenchmarks.ffiPreallocatedGet","thrpt",1,5,24029.388397,937.110620,"ops/s",no_column_family,100000,128,65536
"org.rocksdb.jmh.GetBenchmarks.ffiPreallocatedGetRandom","thrpt",1,5,23228.230564,1926.594037,"ops/s",no_column_family,100000,128,65536
"org.rocksdb.jmh.GetBenchmarks.get","thrpt",1,5,19458.822466,755.447304,"ops/s",no_column_family,100000,128,65536
"org.rocksdb.jmh.GetBenchmarks.preallocatedByteBufferGet","thrpt",1,5,25178.037840,310.913780,"ops/s",no_column_family,100000,128,65536
"org.rocksdb.jmh.GetBenchmarks.preallocatedByteBufferGetRandom","thrpt",1,5,24022.235825,622.782684,"ops/s",no_column_family,100000,128,65536
"org.rocksdb.jmh.GetBenchmarks.preallocatedGet","thrpt",1,5,25117.231538,1259.112187,"ops/s",no_column_family,100000,128,65536
On Fri, Mar 10, 2023 at 6:46 PM Maurizio Cimadamore <maurizio.cimadamore at oracle.com> wrote:
>
> On 10/03/2023 18:05, Maurizio Cimadamore wrote:
>
> I’m not sure how much the update to 20 matters - maybe try to fix all of
> the other stuff first, and see what happens (inexact var handle calls can
> be quite slow compared to Unsafe memory access).
>
> I reverted the Java 20 changes. Numbers still looking good:
>
> ```
> Benchmark                                      (columnFamilyTestType)  (keyCount)  (keySize)  (valueSize)   Mode  Cnt      Score       Error   Units
> GetBenchmarks.ffiGet                                 no_column_family        1000        128         4096  thrpt   30    596.329 ±   9.452  ops/ms
> GetBenchmarks.ffiGet                                 no_column_family        1000        128        65536  thrpt   30     60.368 ±   0.842  ops/ms
> GetBenchmarks.ffiGetPinnableSlice                    no_column_family        1000        128         4096  thrpt   30    752.036 ±   5.655  ops/ms
> GetBenchmarks.ffiGetPinnableSlice                    no_column_family        1000        128        65536  thrpt   30    111.105 ±   2.304  ops/ms
> GetBenchmarks.ffiGetRandom                           no_column_family        1000        128         4096  thrpt   30    582.699 ±   3.379  ops/ms
> GetBenchmarks.ffiGetRandom                           no_column_family        1000        128        65536  thrpt   30     64.546 ±   1.829  ops/ms
> GetBenchmarks.ffiIdentity                            no_column_family        1000        128         4096  thrpt   30  57239.625 ± 674.849  ops/ms
> GetBenchmarks.ffiIdentity                            no_column_family        1000        128        65536  thrpt   30  57802.683 ± 589.983  ops/ms
> GetBenchmarks.ffiPreallocatedGet                     no_column_family        1000        128         4096  thrpt   30    717.237 ±   8.434  ops/ms
> GetBenchmarks.ffiPreallocatedGet                     no_column_family        1000        128        65536  thrpt   30     96.223 ±   1.143  ops/ms
> GetBenchmarks.ffiPreallocatedGetRandom               no_column_family        1000        128         4096  thrpt   30    585.284 ±   5.415  ops/ms
> GetBenchmarks.ffiPreallocatedGetRandom               no_column_family        1000        128        65536  thrpt   30     66.568 ±   0.843  ops/ms
> GetBenchmarks.get                                    no_column_family        1000        128         4096  thrpt   30    553.515 ±   6.278  ops/ms
> GetBenchmarks.get                                    no_column_family        1000        128        65536  thrpt   30     59.999 ±   0.935  ops/ms
> GetBenchmarks.preallocatedByteBufferGet              no_column_family        1000        128         4096  thrpt   30    738.077 ±   8.767  ops/ms
> GetBenchmarks.preallocatedByteBufferGet              no_column_family        1000        128        65536  thrpt   30     99.239 ±   1.398  ops/ms
> GetBenchmarks.preallocatedByteBufferGetRandom        no_column_family        1000        128         4096  thrpt   30    722.680 ±  11.499  ops/ms
> GetBenchmarks.preallocatedByteBufferGetRandom        no_column_family        1000        128        65536  thrpt   30    110.411 ±   1.117  ops/ms
> GetBenchmarks.preallocatedGet                        no_column_family        1000        128         4096  thrpt   30    700.405 ±   8.534  ops/ms
> GetBenchmarks.preallocatedGet                        no_column_family        1000        128        65536  thrpt   30     99.694 ±   2.122  ops/ms
> ```
>
> Maurizio
>