<div dir="ltr">Hi Maurizio,<div><br></div><div>Thanks very much for taking the time to work through all that. I have now made the changes you suggested and, like you, I am seeing results that are comparable between JNI and FFI. I will update my code/document to reflect this as soon as possible.</div><div><br></div><div>A few thoughts on what I have learned:</div><div>1. The importance of exact invocation of VarHandles in FFI code, and the usefulness of <span style="background-color:rgb(248,248,248);font-family:Consolas,Inconsolata,Courier,monospace;font-size:11.05px;white-space:pre-wrap">.withInvokeExactBehavior() </span>for tracking down inexact calls.</div><div>2. Good old "make it final if you possibly can."</div><div>3. I had missed MemorySegment.copy(...) as the way to do an efficient memcpy out of a native segment, hence my ugly and inefficient attempt to wrap it in a ByteBuffer.</div><div>4. Not allocating objects is always the most efficient thing to do.</div><div><br></div><div>You might be able to point me at something that explains what goes on under the covers during invocation, and why exactness matters? 
My overall takeaway is that there are a number of rules of thumb for making FFI fast; if you follow them, you get performance equivalent to JNI, with safety for free.</div><div><br></div><div>--Alan</div><div><br></div><div>"Benchmark","Mode","Threads","Samples","Score","Score Error (99.9%)","Unit","Param: columnFamilyTestType","Param: keyCount","Param: keySize","Param: valueSize"<br>"org.rocksdb.jmh.GetBenchmarks.ffiGet","thrpt",1,5,18964.735543,853.481052,"ops/s",no_column_family,100000,128,65536<br>"org.rocksdb.jmh.GetBenchmarks.ffiGetOutputSlice","thrpt",1,5,25157.246026,69.076723,"ops/s",no_column_family,100000,128,65536<br>"org.rocksdb.jmh.GetBenchmarks.ffiGetPinnableSlice","thrpt",1,5,28124.236270,1087.581497,"ops/s",no_column_family,100000,128,65536<br>"org.rocksdb.jmh.GetBenchmarks.ffiGetRandom","thrpt",1,5,18130.128894,1212.912411,"ops/s",no_column_family,100000,128,65536<br>"org.rocksdb.jmh.GetBenchmarks.ffiIdentity","thrpt",1,5,35359992.737450,294600.268777,"ops/s",no_column_family,100000,128,65536<br>"org.rocksdb.jmh.GetBenchmarks.ffiPreallocatedGet","thrpt",1,5,24029.388397,937.110620,"ops/s",no_column_family,100000,128,65536<br>"org.rocksdb.jmh.GetBenchmarks.ffiPreallocatedGetRandom","thrpt",1,5,23228.230564,1926.594037,"ops/s",no_column_family,100000,128,65536<br>"org.rocksdb.jmh.GetBenchmarks.get","thrpt",1,5,19458.822466,755.447304,"ops/s",no_column_family,100000,128,65536<br>"org.rocksdb.jmh.GetBenchmarks.preallocatedByteBufferGet","thrpt",1,5,25178.037840,310.913780,"ops/s",no_column_family,100000,128,65536<br>"org.rocksdb.jmh.GetBenchmarks.preallocatedByteBufferGetRandom","thrpt",1,5,24022.235825,622.782684,"ops/s",no_column_family,100000,128,65536<br>"org.rocksdb.jmh.GetBenchmarks.preallocatedGet","thrpt",1,5,25117.231538,1259.112187,"ops/s",no_column_family,100000,128,65536<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Mar 10, 2023 at 6:46 PM Maurizio Cimadamore <<a 
href="mailto:maurizio.cimadamore@oracle.com">maurizio.cimadamore@oracle.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p><br>
</p>
<div>On 10/03/2023 18:05, Maurizio
Cimadamore wrote:<br>
</div>
<blockquote type="cite">
<p style="margin:0px 0px 1.2em">I’m not sure how much
the update to 20 matters - maybe try to fix all of the other
stuff first, and see what happens (inexact var handle calls can
be quite slow compared to Unsafe memory access).</p>
</blockquote>
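<p>To make the remark about inexact var handle calls concrete, here is a minimal sketch of how an inexact call can be surfaced. It is not taken from the RocksDB code: the class name <code>ExactVarHandleDemo</code> and its methods are hypothetical, and it uses a byte-array view var handle rather than a memory segment handle so that it runs without the foreign API. It assumes Java 16 or later, where <code>VarHandle.withInvokeExactBehavior()</code> is available.</p>

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.lang.invoke.WrongMethodTypeException;
import java.nio.ByteOrder;

// Hypothetical demo class, not part of the benchmark code discussed in this thread.
final class ExactVarHandleDemo {

    // static final so the JIT can treat the handle as a constant;
    // withInvokeExactBehavior() makes inexact calls fail instead of adapting silently.
    static final VarHandle LONG_VIEW = MethodHandles
            .byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN)
            .withInvokeExactBehavior();

    // Returns true if a call whose argument types do not match exactly is rejected.
    static boolean inexactCallRejected() {
        byte[] buf = new byte[16];
        try {
            // The literal 42 is an int, but the handle expects (byte[], int, long).
            // Without exact behavior this would be widened; with it, the call throws.
            LONG_VIEW.set(buf, 0, 42);
            return false;
        } catch (WrongMethodTypeException e) {
            return true;
        }
    }

    // Writes and reads back a long using calls that match the handle's type exactly.
    static long exactRoundTrip() {
        byte[] buf = new byte[16];
        LONG_VIEW.set(buf, 8, 42L);           // value passed as a long
        return (long) LONG_VIEW.get(buf, 8);  // result cast to long, not left as Object
    }
}
```

<p>Turning on exact behavior while developing is one way to find call sites that would otherwise go through argument adaptation; whether to leave it enabled afterwards is a separate choice.</p>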
<p>I reverted the Java 20 changes. Numbers still looking good:<br>
<br>
```<br>
Benchmark (columnFamilyTestType) (keyCount) (keySize) (valueSize) Mode Cnt Score Error Units<br>
GetBenchmarks.ffiGet no_column_family 1000 128 4096 thrpt 30 596.329 ± 9.452 ops/ms<br>
GetBenchmarks.ffiGet no_column_family 1000 128 65536 thrpt 30 60.368 ± 0.842 ops/ms<br>
GetBenchmarks.ffiGetPinnableSlice no_column_family 1000 128 4096 thrpt 30 752.036 ± 5.655 ops/ms<br>
GetBenchmarks.ffiGetPinnableSlice no_column_family 1000 128 65536 thrpt 30 111.105 ± 2.304 ops/ms<br>
GetBenchmarks.ffiGetRandom no_column_family 1000 128 4096 thrpt 30 582.699 ± 3.379 ops/ms<br>
GetBenchmarks.ffiGetRandom no_column_family 1000 128 65536 thrpt 30 64.546 ± 1.829 ops/ms<br>
GetBenchmarks.ffiIdentity no_column_family 1000 128 4096 thrpt 30 57239.625 ± 674.849 ops/ms<br>
GetBenchmarks.ffiIdentity no_column_family 1000 128 65536 thrpt 30 57802.683 ± 589.983 ops/ms<br>
GetBenchmarks.ffiPreallocatedGet no_column_family 1000 128 4096 thrpt 30 717.237 ± 8.434 ops/ms<br>
GetBenchmarks.ffiPreallocatedGet no_column_family 1000 128 65536 thrpt 30 96.223 ± 1.143 ops/ms<br>
GetBenchmarks.ffiPreallocatedGetRandom no_column_family 1000 128 4096 thrpt 30 585.284 ± 5.415 ops/ms<br>
GetBenchmarks.ffiPreallocatedGetRandom no_column_family 1000 128 65536 thrpt 30 66.568 ± 0.843 ops/ms<br>
GetBenchmarks.get no_column_family 1000 128 4096 thrpt 30 553.515 ± 6.278 ops/ms<br>
GetBenchmarks.get no_column_family 1000 128 65536 thrpt 30 59.999 ± 0.935 ops/ms<br>
GetBenchmarks.preallocatedByteBufferGet no_column_family 1000 128 4096 thrpt 30 738.077 ± 8.767 ops/ms<br>
GetBenchmarks.preallocatedByteBufferGet no_column_family 1000 128 65536 thrpt 30 99.239 ± 1.398 ops/ms<br>
GetBenchmarks.preallocatedByteBufferGetRandom no_column_family 1000 128 4096 thrpt 30 722.680 ± 11.499 ops/ms<br>
GetBenchmarks.preallocatedByteBufferGetRandom no_column_family 1000 128 65536 thrpt 30 110.411 ± 1.117 ops/ms<br>
GetBenchmarks.preallocatedGet no_column_family 1000 128 4096 thrpt 30 700.405 ± 8.534 ops/ms<br>
GetBenchmarks.preallocatedGet no_column_family 1000 128 65536 thrpt 30 99.694 ± 2.122 ops/ms<br>
```<br>
</p>
<p>Maurizio<br>
</p>
</div>
</blockquote></div>