<div dir="ltr">Hi Maurizio,<div><br></div><div>Thanks very much for taking the time to work through all that. I have now made the changes you suggested and, like you, I am seeing comparable results between JNI and FFI. I will update my code and document to reflect this as soon as possible.</div><div><br></div><div>A few thoughts on what I have learned:</div><div>1. The importance of making exact calls on VarHandles in FFI code, and the usefulness of <span style="background-color:rgb(248,248,248);font-family:Consolas,Inconsolata,Courier,monospace;font-size:11.05px;white-space:pre-wrap">.withInvokeExactBehavior()</span> for tracking down the inexact ones.</div><div>2. Good old "make it final if you possibly can".</div><div>3. I had missed MemorySegment.copy(...) as the way to do an efficient memcpy out of a native segment, hence my ugly and inefficient attempt to wrap it in a ByteBuffer.</div><div>4. Not allocating objects is always the most efficient thing to do.</div><div><br></div><div>Could you point me at something that explains what goes on under the covers during invocation, and why exactness matters? 
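<div>A minimal sketch of point 1, not taken from the benchmark code. It assumes Java 16 or later and uses a plain long[] element VarHandle as a stand-in for the memory-access VarHandles used with FFI; the class and method names are hypothetical. The lenient handle silently adapts an int argument to long, while the handle returned by withInvokeExactBehavior() rejects the same call shape with a WrongMethodTypeException, which is what makes inexact call sites easy to find.</div>
<pre>
```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.lang.invoke.WrongMethodTypeException;

// Hypothetical demo class: illustrates exact vs. inexact VarHandle calls.
public class ExactVarHandleDemo {
    // Static final handles, in the spirit of "make it final if you possibly can".
    static final VarHandle LENIENT = MethodHandles.arrayElementVarHandle(long[].class);
    static final VarHandle EXACT = LENIENT.withInvokeExactBehavior();

    // Inexact call: the int literal is adapted to long behind the scenes.
    static long lenientSet() {
        long[] cells = new long[1];
        LENIENT.set(cells, 0, 42);
        return cells[0];
    }

    // The same call shape on the exact handle fails fast instead of adapting.
    static boolean exactRejectsIntArgument() {
        long[] cells = new long[1];
        try {
            EXACT.set(cells, 0, 42);
            return false;
        } catch (WrongMethodTypeException e) {
            return true;
        }
    }

    // Exact call: the argument types match the handle's type exactly.
    static long exactSet() {
        long[] cells = new long[1];
        EXACT.set(cells, 0, 42L);
        return cells[0];
    }

    public static void main(String[] args) {
        System.out.println(lenientSet());
        System.out.println(exactRejectsIntArgument());
        System.out.println(exactSet());
    }
}
```
</pre>
<div>One way to use this is to switch handles to exact behavior while testing, fix each call site that throws, and then keep or drop the exact variant as preferred.</div>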
My overall takeaway is that there are a number of rules of thumb for making use of FFI fast; if you follow them, you get performance equivalent to JNI, with safety for free.</div><div><br></div><div>--Alan</div><div><br></div><div>"Benchmark","Mode","Threads","Samples","Score","Score Error (99.9%)","Unit","Param: columnFamilyTestType","Param: keyCount","Param: keySize","Param: valueSize"<br>"org.rocksdb.jmh.GetBenchmarks.ffiGet","thrpt",1,5,18964.735543,853.481052,"ops/s",no_column_family,100000,128,65536<br>"org.rocksdb.jmh.GetBenchmarks.ffiGetOutputSlice","thrpt",1,5,25157.246026,69.076723,"ops/s",no_column_family,100000,128,65536<br>"org.rocksdb.jmh.GetBenchmarks.ffiGetPinnableSlice","thrpt",1,5,28124.236270,1087.581497,"ops/s",no_column_family,100000,128,65536<br>"org.rocksdb.jmh.GetBenchmarks.ffiGetRandom","thrpt",1,5,18130.128894,1212.912411,"ops/s",no_column_family,100000,128,65536<br>"org.rocksdb.jmh.GetBenchmarks.ffiIdentity","thrpt",1,5,35359992.737450,294600.268777,"ops/s",no_column_family,100000,128,65536<br>"org.rocksdb.jmh.GetBenchmarks.ffiPreallocatedGet","thrpt",1,5,24029.388397,937.110620,"ops/s",no_column_family,100000,128,65536<br>"org.rocksdb.jmh.GetBenchmarks.ffiPreallocatedGetRandom","thrpt",1,5,23228.230564,1926.594037,"ops/s",no_column_family,100000,128,65536<br>"org.rocksdb.jmh.GetBenchmarks.get","thrpt",1,5,19458.822466,755.447304,"ops/s",no_column_family,100000,128,65536<br>"org.rocksdb.jmh.GetBenchmarks.preallocatedByteBufferGet","thrpt",1,5,25178.037840,310.913780,"ops/s",no_column_family,100000,128,65536<br>"org.rocksdb.jmh.GetBenchmarks.preallocatedByteBufferGetRandom","thrpt",1,5,24022.235825,622.782684,"ops/s",no_column_family,100000,128,65536<br>"org.rocksdb.jmh.GetBenchmarks.preallocatedGet","thrpt",1,5,25117.231538,1259.112187,"ops/s",no_column_family,100000,128,65536<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Mar 10, 2023 at 6:46 PM Maurizio Cimadamore &lt;<a 
href="mailto:maurizio.cimadamore@oracle.com">maurizio.cimadamore@oracle.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

  
  <div>
    <p><br>
    </p>
    <div>On 10/03/2023 18:05, Maurizio
      Cimadamore wrote:<br>
    </div>
    <blockquote type="cite">
      <p style="margin:0px 0px 1.2em">I’m not sure how much
        the update to Java 20 matters. Maybe try to fix all of the other
        stuff first and see what happens (inexact var handle calls can
        be quite slow compared to Unsafe memory access).</p>
    </blockquote>
    <p>I reverted the Java 20 changes. Numbers still looking good:<br>
      <br>
      ```<br>
      Benchmark                                      (columnFamilyTestType)  (keyCount)  (keySize)  (valueSize)   Mode  Cnt      Score      Error   Units<br>
      GetBenchmarks.ffiGet                                 no_column_family        1000        128         4096  thrpt   30    596.329 ±    9.452  ops/ms<br>
      GetBenchmarks.ffiGet                                 no_column_family        1000        128        65536  thrpt   30     60.368 ±    0.842  ops/ms<br>
      GetBenchmarks.ffiGetPinnableSlice                    no_column_family        1000        128         4096  thrpt   30    752.036 ±    5.655  ops/ms<br>
      GetBenchmarks.ffiGetPinnableSlice                    no_column_family        1000        128        65536  thrpt   30    111.105 ±    2.304  ops/ms<br>
      GetBenchmarks.ffiGetRandom                           no_column_family        1000        128         4096  thrpt   30    582.699 ±    3.379  ops/ms<br>
      GetBenchmarks.ffiGetRandom                           no_column_family        1000        128        65536  thrpt   30     64.546 ±    1.829  ops/ms<br>
      GetBenchmarks.ffiIdentity                            no_column_family        1000        128         4096  thrpt   30  57239.625 ±  674.849  ops/ms<br>
      GetBenchmarks.ffiIdentity                            no_column_family        1000        128        65536  thrpt   30  57802.683 ±  589.983  ops/ms<br>
      GetBenchmarks.ffiPreallocatedGet                     no_column_family        1000        128         4096  thrpt   30    717.237 ±    8.434  ops/ms<br>
      GetBenchmarks.ffiPreallocatedGet                     no_column_family        1000        128        65536  thrpt   30     96.223 ±    1.143  ops/ms<br>
      GetBenchmarks.ffiPreallocatedGetRandom               no_column_family        1000        128         4096  thrpt   30    585.284 ±    5.415  ops/ms<br>
      GetBenchmarks.ffiPreallocatedGetRandom               no_column_family        1000        128        65536  thrpt   30     66.568 ±    0.843  ops/ms<br>
      GetBenchmarks.get                                    no_column_family        1000        128         4096  thrpt   30    553.515 ±    6.278  ops/ms<br>
      GetBenchmarks.get                                    no_column_family        1000        128        65536  thrpt   30     59.999 ±    0.935  ops/ms<br>
      GetBenchmarks.preallocatedByteBufferGet              no_column_family        1000        128         4096  thrpt   30    738.077 ±    8.767  ops/ms<br>
      GetBenchmarks.preallocatedByteBufferGet              no_column_family        1000        128        65536  thrpt   30     99.239 ±    1.398  ops/ms<br>
      GetBenchmarks.preallocatedByteBufferGetRandom        no_column_family        1000        128         4096  thrpt   30    722.680 ±   11.499  ops/ms<br>
      GetBenchmarks.preallocatedByteBufferGetRandom        no_column_family        1000        128        65536  thrpt   30    110.411 ±    1.117  ops/ms<br>
      GetBenchmarks.preallocatedGet                        no_column_family        1000        128         4096  thrpt   30    700.405 ±    8.534  ops/ms<br>
      GetBenchmarks.preallocatedGet                        no_column_family        1000        128        65536  thrpt   30     99.694 ±    2.122  ops/ms<br>
      ```<br>
    </p>
    <p>Maurizio<br>
    </p>
  </div>

</blockquote></div>