<div dir="ltr">Thanks Yifan, much appreciated!<div><br></div><div>It seems like a compelling implementation for read-heavy workloads.</div><div>I too would be curious about the reasons for the performance difference from the default HashMap for the other operations.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Jan 7, 2023 at 2:09 PM Zhu, Yifan <<a href="mailto:yzhu104@ur.rochester.edu">yzhu104@ur.rochester.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="msg-2536823736642205518">
<div dir="ltr">
<div><span style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
<p style="box-sizing:border-box;margin:0.8em 0px;color:rgb(51,51,51);font-family:"Open Sans","Clear Sans","Helvetica Neue",Helvetica,Arial,"Segoe UI Emoji",sans-serif">
<span style="box-sizing:border-box"><span style="box-sizing:border-box"><span style="box-sizing:border-box">Highlights of the dropped message:</span></span></span></p>
<p style="box-sizing:border-box;margin:0.8em 0px;color:rgb(51,51,51);font-family:"Open Sans","Clear Sans","Helvetica Neue",Helvetica,Arial,"Segoe UI Emoji",sans-serif">
</p>
<ol>
<li><span style="box-sizing:border-box"><span style="box-sizing:border-box"><span style="box-sizing:border-box">Using ByteVector instead of Vector<Byte> gives better performance.</span></span></span></li><li><span style="box-sizing:border-box"><span style="box-sizing:border-box"><span style="box-sizing:border-box">ByteVector::lt(byte) is not optimized for loop-invariant values and generates a broadcast
on each iteration, so I cache the value in a final variable for the loop, which improves performance.</span></span></span></li><li><span style="box-sizing:border-box"><span style="box-sizing:border-box"><span style="box-sizing:border-box">For SwissTable on x86, AVX2/AVX512 has a slightly lower peak throughput due to higher latency
(though the parallelism of the wider registers is better).</span></span></span></li></ol>
<p></p>
<p style="box-sizing:border-box;margin:0.8em 0px;color:rgb(51,51,51);font-family:"Open Sans","Clear Sans","Helvetica Neue",Helvetica,Arial,"Segoe UI Emoji",sans-serif">
<span style="box-sizing:border-box"><span style="box-sizing:border-box"><span style="box-sizing:border-box">--------------------------------------------------------------------------------------</span></span></span></p>
<p style="box-sizing:border-box;margin:0.8em 0px;color:rgb(51,51,51);font-family:"Open Sans","Clear Sans","Helvetica Neue",Helvetica,Arial,"Segoe UI Emoji",sans-serif">
<span style="box-sizing:border-box"><span style="box-sizing:border-box"><span style="box-sizing:border-box">I finally got time to run the full benchmark. Here are the current results:<br>
</span></span></span></p>
<p style="box-sizing:border-box;margin:0.8em 0px;color:rgb(51,51,51);font-family:"Open Sans","Clear Sans","Helvetica Neue",Helvetica,Arial,"Segoe UI Emoji",sans-serif">
<span style="box-sizing:border-box"><strong style="box-sizing:border-box"><span style="box-sizing:border-box">Update:</span></strong></span><span style="box-sizing:border-box"> As
suggested, I added a generic implementation without the Vector API for comparison.</span></p>
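<p>One common way to write a generic (non-vector) group match is SWAR: pack eight control bytes into a single long and compare them all at once with bit tricks. The sketch below illustrates that technique only. The names SwarGroup, pack, matchByte, toBitmask and firstMatch are hypothetical and are not taken from the benchmarked code, whose generic implementation may work differently.</p>

```java
// Hypothetical sketch: SWAR matching of 8 control bytes held in one long.
final class SwarGroup {
    private static final long LSB = 0x0101010101010101L;
    private static final long LOW7 = 0x7F7F7F7F7F7F7F7FL;

    /** Packs the first 8 control bytes little-endian: byte i lands in bits 8i..8i+7. */
    static long pack(byte[] ctrl) {
        long g = 0;
        for (int i = 0; i < 8; i++) {
            g |= (ctrl[i] & 0xFFL) << (8 * i);
        }
        return g;
    }

    /** Returns a word with 0x80 set in every byte of group that equals b (exact, no false positives). */
    static long matchByte(long group, byte b) {
        long x = group ^ (LSB * (b & 0xFFL)); // bytes equal to b become 0x00
        long t = (x & LOW7) + LOW7;           // bit 7 set when the low 7 bits are non-zero
        return ~(t | x | LOW7);               // 0x80 only where the whole byte was zero
    }

    /** Compresses the 0x80 markers into an 8-bit mask, one bit per slot. */
    static int toBitmask(long matches) {
        int mask = 0;
        for (int i = 0; i < 8; i++) {
            if (((matches >>> (8 * i + 7)) & 1L) != 0) {
                mask |= 1 << i;
            }
        }
        return mask;
    }

    /** Index of the first matching slot, or 8 when there is no match. */
    static int firstMatch(long matches) {
        return Long.numberOfTrailingZeros(matches) >>> 3;
    }
}
```

<p>A group of eight slots is handled with a handful of integer operations, so no vector registers are involved. The trade-off is a narrower group than a 16- or 32-byte vector.</p>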
<p style="box-sizing:border-box;margin:0.8em 0px;color:rgb(51,51,51);font-family:"Open Sans","Clear Sans","Helvetica Neue",Helvetica,Arial,"Segoe UI Emoji",sans-serif">
<span style="box-sizing:border-box">JVM info</span></p>
<pre style="box-sizing:border-box;font-size:0.9em;display:block;break-inside:avoid;text-align:left;background-color:rgb(248,248,248);border:1px solid rgb(231,234,237);border-radius:3px;padding:8px 4px 6px;margin-bottom:15px;margin-top:15px;color:rgb(51,51,51)" lang=""><span style="box-sizing:border-box;padding-right:0.1px"># VM version: JDK 19.0.1, Java HotSpot(TM) 64-Bit Server VM, 19.0.1+10-21</span></pre>
<p style="box-sizing:border-box;margin:0.8em 0px;color:rgb(51,51,51);font-family:"Open Sans","Clear Sans","Helvetica Neue",Helvetica,Arial,"Segoe UI Emoji",sans-serif">
<span style="box-sizing:border-box">Server Info</span></p>
<pre style="box-sizing:border-box;font-size:0.9em;display:block;break-inside:avoid;text-align:left;background-color:rgb(248,248,248);border:1px solid rgb(231,234,237);border-radius:3px;padding:8px 4px 6px;margin-bottom:15px;margin-top:15px;color:rgb(51,51,51)" lang=""><span style="box-sizing:border-box;padding-right:0.1px">Architecture: x86_64</span><br><span style="box-sizing:border-box;padding-right:0.1px"> CPU op-mode(s): 32-bit, 64-bit</span><br><span style="box-sizing:border-box;padding-right:0.1px"> Address sizes: 48 bits physical, 48 bits virtual</span><br><span style="box-sizing:border-box;padding-right:0.1px"> Byte Order: Little Endian</span><br><span style="box-sizing:border-box;padding-right:0.1px">CPU(s): 128</span><br><span style="box-sizing:border-box;padding-right:0.1px"> On-line CPU(s) list: 0-127</span><br><span style="box-sizing:border-box;padding-right:0.1px">Vendor ID: AuthenticAMD</span><br><span style="box-sizing:border-box;padding-right:0.1px"> Model name: AMD EPYC 7773X 64-Core Processor</span><br><span style="box-sizing:border-box;padding-right:0.1px"> CPU family: 25</span><br><span style="box-sizing:border-box;padding-right:0.1px"> Model: 1</span><br><span style="box-sizing:border-box;padding-right:0.1px"> Thread(s) per core: 2</span><br><span style="box-sizing:border-box;padding-right:0.1px"> Core(s) per socket: 64</span><br><span style="box-sizing:border-box;padding-right:0.1px"> Socket(s): 1</span><br><span style="box-sizing:border-box;padding-right:0.1px"> Stepping: 2</span><br><span style="box-sizing:border-box;padding-right:0.1px"> Frequency boost: enabled</span><br><span style="box-sizing:border-box;padding-right:0.1px"> CPU(s) scaling MHz: 64%</span><br><span style="box-sizing:border-box;padding-right:0.1px"> CPU max MHz: 3527.7339</span><br><span style="box-sizing:border-box;padding-right:0.1px"> CPU min MHz: 1500.0000</span><br><span style="box-sizing:border-box;padding-right:0.1px"> BogoMIPS: 4399.93</span><br><span 
style="box-sizing:border-box;padding-right:0.1px"> Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid </span><br><span style="box-sizing:border-box;padding-right:0.1px"> aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skini</span><br><span style="box-sizing:border-box;padding-right:0.1px"> t wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid cqm rdt_a rdseed adx sm</span><br><span style="box-sizing:border-box;padding-right:0.1px"> ap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin brs arat npt lbrv svm_lock nrip_save tsc_scale </span><br><span style="box-sizing:border-box;padding-right:0.1px"> vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca</span><br><span style="box-sizing:border-box;padding-right:0.1px">Virtualization features: </span><br><span style="box-sizing:border-box;padding-right:0.1px"> Virtualization: AMD-V</span><br><span style="box-sizing:border-box;padding-right:0.1px">Caches (sum of all): </span><br><span style="box-sizing:border-box;padding-right:0.1px"> L1d: 2 MiB (64 instances)</span><br><span style="box-sizing:border-box;padding-right:0.1px"> L1i: 2 MiB (64 instances)</span><br><span style="box-sizing:border-box;padding-right:0.1px"> L2: 32 MiB (64 instances)</span><br><span style="box-sizing:border-box;padding-right:0.1px"> L3: 768 MiB (8 instances)</span><br><span 
style="box-sizing:border-box;padding-right:0.1px">NUMA: </span><br><span style="box-sizing:border-box;padding-right:0.1px"> NUMA node(s): 1</span><br><span style="box-sizing:border-box;padding-right:0.1px"> NUMA node0 CPU(s): 0-127</span><br><span style="box-sizing:border-box;padding-right:0.1px">Vulnerabilities: </span><br><span style="box-sizing:border-box;padding-right:0.1px"> Itlb multihit: Not affected</span><br><span style="box-sizing:border-box;padding-right:0.1px"> L1tf: Not affected</span><br><span style="box-sizing:border-box;padding-right:0.1px"> Mds: Not affected</span><br><span style="box-sizing:border-box;padding-right:0.1px"> Meltdown: Not affected</span><br><span style="box-sizing:border-box;padding-right:0.1px"> Mmio stale data: Not affected</span><br><span style="box-sizing:border-box;padding-right:0.1px"> Retbleed: Not affected</span><br><span style="box-sizing:border-box;padding-right:0.1px"> Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl</span><br><span style="box-sizing:border-box;padding-right:0.1px"> Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization</span><br><span style="box-sizing:border-box;padding-right:0.1px"> Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected</span><br><span style="box-sizing:border-box;padding-right:0.1px"> Srbds: Not affected</span><br><span style="box-sizing:border-box;padding-right:0.1px"> Tsx async abort: Not affected</span></pre>
<p style="box-sizing:border-box;margin:0.8em 0px;color:rgb(51,51,51);font-family:"Open Sans","Clear Sans","Helvetica Neue",Helvetica,Arial,"Segoe UI Emoji",sans-serif">
<span style="box-sizing:border-box">Random Operation (Three hashmap implementations execute the same randomly generated sequence of insert/find/remove operations. The sequence length is 100000):</span></p>
<pre style="box-sizing:border-box;font-size:0.9em;display:block;break-inside:avoid;text-align:left;background-color:rgb(248,248,248);border:1px solid rgb(231,234,237);border-radius:3px;padding:8px 4px 6px;margin-bottom:15px;margin-top:15px;color:rgb(51,51,51)" lang=""><span style="box-sizing:border-box;padding-right:0.1px">Benchmark Mode Cnt Score Error Units</span><br><span style="box-sizing:border-box;padding-right:0.1px">RandomOperations.genericSwissTableRandomOperation thrpt 5 264.657 ± 2.645 ops/s</span><br><span style="box-sizing:border-box;padding-right:0.1px">RandomOperations.hashMapRandomOperation thrpt 5 280.280 ± 11.416 ops/s</span><br><span style="box-sizing:border-box;padding-right:0.1px">RandomOperations.swissTableRandomOperation thrpt 5 304.007 ± 6.635 ops/s</span></pre>
<p style="box-sizing:border-box;margin:0.8em 0px;color:rgb(51,51,51);font-family:"Open Sans","Clear Sans","Helvetica Neue",Helvetica,Arial,"Segoe UI Emoji",sans-serif">
<span style="box-sizing:border-box">Find Operation (Three hashmap implementations look up the same generated data sequence. The string keys are around 190 bytes. The SwissTables use WyHash as the hasher. The sequence length is 100000)</span></p>
<pre style="box-sizing:border-box;font-size:0.9em;display:block;break-inside:avoid;text-align:left;background-color:rgb(248,248,248);border:1px solid rgb(231,234,237);border-radius:3px;padding:8px 4px 6px;margin-bottom:15px;margin-top:15px;color:rgb(51,51,51)" lang=""><span style="box-sizing:border-box;padding-right:0.1px">Benchmark Mode Cnt Score Error Units</span><br><span style="box-sizing:border-box;padding-right:0.1px">FindBenchmark.genericSwissTableFindExistingLong thrpt 5 440.841 ± 2.525 ops/s</span><br><span style="box-sizing:border-box;padding-right:0.1px">FindBenchmark.genericSwissTableFindExistingString thrpt 5 500.632 ± 9.925 ops/s</span><br><span style="box-sizing:border-box;padding-right:0.1px">FindBenchmark.genericSwissTableFindMissingLong thrpt 5 852.220 ± 3.159 ops/s</span><br><span style="box-sizing:border-box;padding-right:0.1px">FindBenchmark.genericSwissTableFindMissingString thrpt 5 1024.492 ± 6.017 ops/s</span><br><span style="box-sizing:border-box;padding-right:0.1px">FindBenchmark.hashTableFindExistingLong thrpt 5 848.031 ± 5.488 ops/s</span><br><span style="box-sizing:border-box;padding-right:0.1px">FindBenchmark.hashTableFindExistingString thrpt 5 1247.213 ± 11.066 ops/s</span><br><span style="box-sizing:border-box;padding-right:0.1px">FindBenchmark.hashTableFindMissingLong thrpt 5 884.276 ± 0.983 ops/s</span><br><span style="box-sizing:border-box;padding-right:0.1px">FindBenchmark.hashTableFindMissingString thrpt 5 1386.718 ± 72.998 ops/s</span><br><span style="box-sizing:border-box;padding-right:0.1px">FindBenchmark.swissTableFindExistingLong thrpt 5 717.106 ± 1.653 ops/s</span><br><span style="box-sizing:border-box;padding-right:0.1px">FindBenchmark.swissTableFindExistingString thrpt 5 689.134 ± 11.573 ops/s</span><br><span style="box-sizing:border-box;padding-right:0.1px">FindBenchmark.swissTableFindMissingLong thrpt 5 1143.098 ± 5.033 ops/s</span><br><span 
style="box-sizing:border-box;padding-right:0.1px">FindBenchmark.swissTableFindMissingString thrpt 5 1562.995 ± 893.966 ops/s</span></pre>
<p style="box-sizing:border-box;margin:0.8em 0px;color:rgb(51,51,51);font-family:"Open Sans","Clear Sans","Helvetica Neue",Helvetica,Arial,"Segoe UI Emoji",sans-serif">
<span style="box-sizing:border-box"><strong style="box-sizing:border-box"><span style="box-sizing:border-box">Notice: I am not sure why java.util.HashMap performs better when finding existing keys. Is there
any specialization when the JVM sees that the lookup sequence is the same as the insertion sequence?</span></strong></span></p>
<p style="box-sizing:border-box;margin:0.8em 0px;color:rgb(51,51,51);font-family:"Open Sans","Clear Sans","Helvetica Neue",Helvetica,Arial,"Segoe UI Emoji",sans-serif">
<span style="box-sizing:border-box">Insert Operation (Three hashmap implementations insert the same generated data sequence. The string keys are around 190 bytes. For strings, all three tables use a cached WyHash value (via an overloaded
method). For long integers, the identity hash is used. The sequence length is 100000)</span></p>
<pre style="box-sizing:border-box;font-size:0.9em;display:block;break-inside:avoid;text-align:left;background-color:rgb(248,248,248);border:1px solid rgb(231,234,237);border-radius:3px;padding:8px 4px 6px;margin-bottom:15px;margin-top:15px;color:rgb(51,51,51)" lang=""><span style="box-sizing:border-box;padding-right:0.1px">Benchmark Mode Cnt Score Error Units</span><br><span style="box-sizing:border-box;padding-right:0.1px">InsertionBenchmark.genericSwissTableLongInsertion thrpt 5 251.994 ± 6.383 ops/s</span><br><span style="box-sizing:border-box;padding-right:0.1px">InsertionBenchmark.genericSwissTableStringInsertion thrpt 5 242.455 ± 8.091 ops/s</span><br><span style="box-sizing:border-box;padding-right:0.1px">InsertionBenchmark.hashMapLongInsertion thrpt 5 157.068 ± 10.528 ops/s</span><br><span style="box-sizing:border-box;padding-right:0.1px">InsertionBenchmark.hashMapStringInsertion thrpt 5 157.860 ± 5.385 ops/s</span><br><span style="box-sizing:border-box;padding-right:0.1px">InsertionBenchmark.swissTableLongInsertion thrpt 5 281.914 ± 3.051 ops/s</span><br><span style="box-sizing:border-box;padding-right:0.1px">InsertionBenchmark.swissTableStringInsertion thrpt 5 265.384 ± 2.884 ops/s</span></pre>
<p style="box-sizing:border-box;margin:0.8em 0px;color:rgb(51,51,51);font-family:"Open Sans","Clear Sans","Helvetica Neue",Helvetica,Arial,"Segoe UI Emoji",sans-serif">
<span style="box-sizing:border-box">Iteration (Iterate through the whole map using an iterator)</span></p>
<pre style="box-sizing:border-box;font-size:0.9em;display:block;break-inside:avoid;text-align:left;background-color:rgb(248,248,248);border:1px solid rgb(231,234,237);border-radius:3px;padding:8px 4px 6px;margin-bottom:15px;margin-top:15px;color:rgb(51,51,51)" lang=""><span style="box-sizing:border-box;padding-right:0.1px">Benchmark Mode Cnt Score Error Units</span><br><span style="box-sizing:border-box;padding-right:0.1px">Iteration.genericSwissTableIteration thrpt 5 1615.441 ± 44.650 ops/s</span><br><span style="box-sizing:border-box;padding-right:0.1px">Iteration.hashMapIteration thrpt 5 562.280 ± 3.662 ops/s</span><br><span style="box-sizing:border-box;padding-right:0.1px">Iteration.swissTableIteration thrpt 5 2667.606 ± 1454.997 ops/s</span></pre>
<p style="box-sizing:border-box;margin:0.8em 0px;color:rgb(51,51,51);font-family:"Open Sans","Clear Sans","Helvetica Neue",Helvetica,Arial,"Segoe UI Emoji",sans-serif">
</p>
<br>
</span></div>
<div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div id="m_2490727210664948178Signature">
<div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
<img style="width: 119.816px; height: 29px; max-width: initial;" width="119" height="29" src="cid:1859272efe882bd7952"><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
<span style="font-family:"Calibri Light","Helvetica Light",sans-serif">Schrodinger ZHU Yifan, Ph.D. Student</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
<div><span style="font-family:"Calibri Light","Helvetica Light",sans-serif">Computer Science Department, University of Rochester</span></div>
<div><br>
</div>
<div><span style="font-family:"Calibri Light","Helvetica Light",sans-serif;font-size:10pt"><b>Personal Email:</b></span><span style="font-family:"Calibri Light","Helvetica Light",sans-serif;font-size:10pt"> i@zhuyi.fan</span></div>
<div><span style="font-family:"Calibri Light","Helvetica Light",sans-serif;font-size:10pt"><b>Work Email:</b></span><span style="font-family:"Calibri Light","Helvetica Light",sans-serif;font-size:10pt"> <a href="mailto:yifanzhu@rochester.edu" target="_blank">yifanzhu@rochester.edu</a></span></div>
<div><span style="font-family:"Calibri Light","Helvetica Light",sans-serif;font-size:10pt"><b>Website:</b></span><span style="font-family:"Calibri Light","Helvetica Light",sans-serif;font-size:10pt"> <a href="https://www.cs.rochester.edu/~yzhu104/Main.html" target="_blank">https://www.cs.rochester.edu/~yzhu104/Main.html</a></span></div>
<div><span style="font-family:"Calibri Light","Helvetica Light",sans-serif;font-size:10pt"><b>Github:</b></span><span style="font-family:"Calibri Light","Helvetica Light",sans-serif;font-size:10pt"> SchrodingerZhu</span></div>
<div><span style="font-family:"Calibri Light","Helvetica Light",sans-serif;font-size:10pt"><b>GPG Fingerprint:</b></span><span style="font-family:"Calibri Light","Helvetica Light",sans-serif;font-size:10pt"> BA02CBEB8CB5D8181E9368304D2CC545A78DBCC3</span></div>
<div><span style="font-family:"Calibri Light","Helvetica Light",sans-serif"><br>
</span></div>
<img style="width: 139px; height: 29px; max-width: initial;" width="139" height="29" src="cid:1859272efe81afc92873"><br>
</div>
</div>
</div>
</div>
<div id="m_2490727210664948178appendonsend"></div>
<hr style="display:inline-block;width:98%">
<div id="m_2490727210664948178divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Gavin Ray <<a href="mailto:ray.gavin97@gmail.com" target="_blank">ray.gavin97@gmail.com</a>><br>
<b>Sent:</b> January 7, 2023 23:28<br>
<b>To:</b> Paul Sandoz <<a href="mailto:paul.sandoz@oracle.com" target="_blank">paul.sandoz@oracle.com</a>><br>
<b>Cc:</b> Zhu, Yifan <<a href="mailto:yzhu104@UR.Rochester.edu" target="_blank">yzhu104@UR.Rochester.edu</a>>; <a href="mailto:panama-dev@openjdk.org" target="_blank">panama-dev@openjdk.org</a> <<a href="mailto:panama-dev@openjdk.org" target="_blank">panama-dev@openjdk.org</a>><br>
<b>Subject:</b> Re: [EXT] Follow-up results for SwissTable with Vector API</font>
<div> </div>
</div>
<div>
<div dir="ltr">Zhu, I'm very interested in this discussion, in case there were mails that were dropped. FWIW,
<div>A SwissTable implementation based on Vector intrinsics + FFM API would be super useful for a lot of applications.<br>
<div><br>
</div>
<div>This is the history that I see:</div>
<div><br>
</div>
<div><img alt="image.png" width="562" height="201" src="cid:1859272efe8cb971f161"><br>
</div>
</div>
</div>
<br>
<div>
<div dir="ltr">On Fri, Jan 6, 2023 at 11:32 AM Paul Sandoz <<a href="mailto:paul.sandoz@oracle.com" target="_blank">paul.sandoz@oracle.com</a>> wrote:<br>
</div>
<blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
In some further replies I just noticed you dropped the panama-dev email. Resend a summary of the discussion?<br>
<br>
Paul.<br>
<br>
> On Jan 5, 2023, at 2:16 PM, Zhu, Yifan <<a href="mailto:yzhu104@UR.Rochester.edu" target="_blank">yzhu104@UR.Rochester.edu</a>> wrote:<br>
> <br>
> I am confused. It seems that my replies are detached from the mailing list. Is that expected?<br>
> <br>
> <br>
> From: panama-dev <<a href="mailto:panama-dev-retn@openjdk.org" target="_blank">panama-dev-retn@openjdk.org</a>> on behalf of Paul Sandoz <<a href="mailto:paul.sandoz@oracle.com" target="_blank">paul.sandoz@oracle.com</a>><br>
> Sent: January 6, 2023 0:29<br>
> To: Zhu, Yifan <<a href="mailto:yzhu104@UR.Rochester.edu" target="_blank">yzhu104@UR.Rochester.edu</a>><br>
> Cc: <a href="mailto:panama-dev@openjdk.org" target="_blank">panama-dev@openjdk.org</a> <<a href="mailto:panama-dev@openjdk.org" target="_blank">panama-dev@openjdk.org</a>><br>
> Subject: [EXT] Re: Follow-up results for SwissTable with Vector API<br>
> <br>
> Hi,<br>
> <br>
> I saw you sent another email prior to this, but for some reason it got lost by the moderation system. (Since you are not a member of the list the emails need to be moderated and approved.)<br>
> <br>
> <br>
> > On Jan 5, 2023, at 8:09 AM, Zhu, Yifan <<a href="mailto:yzhu104@UR.Rochester.edu" target="_blank">yzhu104@UR.Rochester.edu</a>> wrote:<br>
> > <br>
> > This is the follow-up message for <a href="https://urldefense.com/v3/__https://mail.openjdk.org/pipermail/jdk-dev/2023-January/007288.html__;!!CGUSO5OYRnA7CQ!c6-WVSHfkvXgbKEtNWhxgdZ9EHDDMbmUz9AbxpvbYN54xt_4LzTwYJd4PdHmueDBCmryWsWBXfjE-Jpw_Cfrf6WyAw$" rel="noreferrer" target="_blank">
https://urldefense.com/v3/__https://mail.openjdk.org/pipermail/jdk-dev/2023-January/007288.html__;!!CGUSO5OYRnA7CQ!c6-WVSHfkvXgbKEtNWhxgdZ9EHDDMbmUz9AbxpvbYN54xt_4LzTwYJd4PdHmueDBCmryWsWBXfjE-Jpw_Cfrf6WyAw$</a> .<br>
> > <br>
> > > You do:<br>
> > > converted.intoMemorySegment(MemorySegment.ofArray(control), offset, ByteOrder.nativeOrder());<br>
> > ><br>
> > > Can you just do: <br>
> > ><br>
> > > converted.intoArray(control, offset);<br>
> > <br>
> > <br>
> > I did so because I found that Vector<Byte> actually does not have that method.<br>
> <br>
> Ah, yes. There could be a perf issue with memory segment access, although since you had to wrap the array in a segment there will be some cost to that. It’s likely that if you wrapped the control array in a segment and stored it in a field it would work better.
<br>
> <br>
> <br>
> > After your suggestion, I switched to using ByteVector instead of Vector<Byte>. Surprisingly, this time the hashmap delivers better performance. It is 2~3 times faster during the insertion procedure.
<br>
> <br>
> Good!<br>
> <br>
> <br>
> > However, there is still a performance gap behind the standard HashMap during the find procedure.<br>
> > <br>
> > For the ease of discussion, I attach the relevant code here:<br>
> > <br>
> > private int findWithHash(long hash, K key) {<br>
> > byte h2 = Util.h2(hash); //highest 7 bits<br>
> > int position = Util.h1(hash) & bucketMask; // h1 is just long to int<br>
> > int stride = 0;<br>
> > while (true) {<br>
> > var mask = matchByte(position, h2).toLong(); // match byte is to load a vector of byte and do equality comparison<br>
> > while (MaskIterator.hasNext(mask)) { <br>
> > var bit = MaskIterator.getNext(mask);<br>
> > mask = MaskIterator.moveNext(mask);<br>
> > var index = (position + bit) & bucketMask;<br>
> > if (key.equals(keys[index])) return index;<br>
> > }<br>
> > <br>
> > if (matchEmpty(position).anyTrue()) {<br>
> > return -1;<br>
> > }<br>
> > <br>
> > stride += VECTOR_LENGTH;<br>
> > position = (position + stride) & bucketMask;<br>
> > }<br>
> > }<br>
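<p>The stride update in the loop above (stride grows by VECTOR_LENGTH, then is added to position) gives a triangular probe sequence. When the bucket count is a power of two and a multiple of the group width, such a sequence visits every group exactly once before repeating. The sketch below replays just that arithmetic. ProbeSequence, probe and allDistinct are hypothetical names used for illustration and are not part of the code under discussion.</p>

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: replays the probe arithmetic of findWithHash in isolation.
final class ProbeSequence {
    /**
     * Returns the positions visited by the first (buckets / width) probes.
     * Assumes buckets is a power of two and a multiple of width.
     */
    static int[] probe(int start, int buckets, int width) {
        int bucketMask = buckets - 1;
        int groups = buckets / width;
        int[] visited = new int[groups];
        int position = start & bucketMask;
        int stride = 0;
        for (int i = 0; i < groups; i++) {
            visited[i] = position;
            stride += width;                              // same update as in the loop above
            position = (position + stride) & bucketMask;  // wrap around the table
        }
        return visited;
    }

    /** True when no position occurs twice. */
    static boolean allDistinct(int[] positions) {
        Set<Integer> seen = new HashSet<>();
        for (int p : positions) {
            if (!seen.add(p)) {
                return false;
            }
        }
        return true;
    }
}
```

<p>For example, with 64 buckets, a width of 16 and a start of 5, the visited positions are 5, 21, 53 and 37, which are the four groups of the table shifted by the starting offset.</p>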
> > From IntelliJ IDEA's profiler, it seems that a large portion of time is spent on building the VectorMask. I see there is an underlying bTest operation that converts the results to a boolean array and then produces the mask. Will this be internally optimized to a
single movemask operation by the JVM?<br>
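<p>The body of MaskIterator is not shown in this thread. A plausible way to walk the set bits of the long returned by toLong() is the usual trailing-zeros and clear-lowest-bit idiom, sketched below. MaskIteratorSketch and its lanes helper are hypothetical; only the three method names hasNext, getNext and moveNext mirror the calls in findWithHash, and the real helper may be implemented differently.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a bit-mask iterator over matching lanes.
final class MaskIteratorSketch {
    /** True while at least one lane bit is still set. */
    static boolean hasNext(long mask) {
        return mask != 0;
    }

    /** Index of the lowest set lane bit. */
    static int getNext(long mask) {
        return Long.numberOfTrailingZeros(mask);
    }

    /** Clears the lowest set lane bit. */
    static long moveNext(long mask) {
        return mask & (mask - 1);
    }

    /** Collects all set lane indexes in ascending order. */
    static List<Integer> lanes(long mask) {
        List<Integer> out = new ArrayList<>();
        while (hasNext(mask)) {
            out.add(getNext(mask));
            mask = moveNext(mask);
        }
        return out;
    }
}
```

<p>Each step costs one trailing-zero count and one AND, so the inner loop runs once per candidate lane rather than once per lane in the vector.</p>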
> > <br>
> <br>
> Can you get an inline/compilation trace like you did for insert?<br>
> <br>
> The VectorMask.toLong method is an intrinsic method.<br>
> <br>
> Try:<br>
> <br>
> var vmask = matchByte(position, h2);<br>
> var mask = vmask.toLong();<br>
> <br>
> Probably will not make any difference, but if the findIInsertSlot performed ok operating on the mask returned from matchEmptyOrDelete it points to an issue with VectorMask.toLong.<br>
> <br>
> Paul.<br>
> <br>
> > <br>
> <br>
<br>
</blockquote>
</div>
</div>
</div>
</div></blockquote></div>