<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=gb2312">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div class="elementToProof"><span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);">
<p class="md-end-block md-p FluidPluginCopy" style="box-sizing:border-box;orphans:4;margin:0.8em 0px;color:rgb(51, 51, 51);font-family:"Open Sans", "Clear Sans", "Helvetica Neue", Helvetica, Arial, "Segoe UI Emoji", sans-serif;widows:2">
<span class="md-pair-s md-expand" style="box-sizing:border-box"><span style="box-sizing:border-box"><span class="md-plain ContentPasted1" style="box-sizing:border-box">Highlights of the dropped message:</span></span></span></p>
<p class="md-end-block md-p FluidPluginCopy" style="box-sizing:border-box;orphans:4;margin:0.8em 0px;color:rgb(51, 51, 51);font-family:"Open Sans", "Clear Sans", "Helvetica Neue", Helvetica, Arial, "Segoe UI Emoji", sans-serif;widows:2">
</p>
<ol>
<li><span class="md-pair-s md-expand" style="box-sizing:border-box"><span style="box-sizing:border-box"><span class="md-plain ContentPasted1" style="box-sizing:border-box">Using ByteVector instead of Vector&lt;Byte&gt; gives better performance.</span></span></span></li><li><span class="md-pair-s md-expand" style="box-sizing:border-box"><span style="box-sizing:border-box"><span class="md-plain ContentPasted1" style="box-sizing:border-box">ByteVector::lt(byte) is not optimized for loop-invariant values and generates a broadcast
 on each iteration, so I cache it as a final value for the loop, which improves performance.</span></span></span></li><li><span class="md-pair-s md-expand" style="box-sizing:border-box"><span style="box-sizing:border-box"><span class="md-plain ContentPasted1" style="box-sizing:border-box">For SwissTable on x86, AVX2/AVX512 has a slightly lower peak throughput due to higher latency
 (though the parallelism of wider registers is better).</span></span></span></li></ol>
<p></p>
<p class="md-end-block md-p FluidPluginCopy" style="box-sizing:border-box;orphans:4;margin:0.8em 0px;color:rgb(51, 51, 51);font-family:"Open Sans", "Clear Sans", "Helvetica Neue", Helvetica, Arial, "Segoe UI Emoji", sans-serif;widows:2">
<span class="md-pair-s md-expand" style="box-sizing:border-box"><span style="box-sizing:border-box"><span class="md-plain ContentPasted1" style="box-sizing:border-box">--------------------------------------------------------------------------------------</span></span></span></p>
<p class="md-end-block md-p FluidPluginCopy" style="box-sizing:border-box;orphans:4;margin:0.8em 0px;color:rgb(51, 51, 51);font-family:"Open Sans", "Clear Sans", "Helvetica Neue", Helvetica, Arial, "Segoe UI Emoji", sans-serif;widows:2">
<span class="md-pair-s md-expand" style="box-sizing:border-box"><span style="box-sizing:border-box"><span class="md-plain ContentPasted1" style="box-sizing:border-box">I finally got time to run the full benchmark. Here are the current results:<br>
</span></span></span></p>
<p class="md-end-block md-p FluidPluginCopy" style="box-sizing:border-box;orphans:4;margin:0.8em 0px;color:rgb(51, 51, 51);font-family:"Open Sans", "Clear Sans", "Helvetica Neue", Helvetica, Arial, "Segoe UI Emoji", sans-serif;widows:2">
<span class="md-pair-s md-expand" style="box-sizing:border-box"><strong style="box-sizing:border-box"><span class="md-plain ContentPasted1" style="box-sizing:border-box">Update:</span></strong></span><span class="md-plain ContentPasted1" style="box-sizing:border-box"> As
 suggested, I added a generic implementation that does not use the Vector API, for comparison.</span></p>
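<p>For readers unfamiliar with non-vector SwissTable variants: a generic group match is commonly done with plain 64-bit arithmetic (SWAR), treating a long as eight one-byte control lanes. The sketch below is only an illustration of that general technique under stated assumptions. It is not the benchmarked code, and the class <code>SwarGroup</code> and all of its method names are hypothetical.</p>

```java
// Hypothetical sketch: match one control byte against eight lanes packed in a long.
// Lane 0 is the least significant byte of the word.
final class SwarGroup {
    // 0x7F in every lane; used to test each lane for zero without cross-lane carries.
    private static final long LOW7 = 0x7F7F7F7F7F7F7F7FL;

    /** Replicates one byte into all eight lanes of a long. */
    static long broadcast(byte b) {
        return (b & 0xFFL) * 0x0101010101010101L;
    }

    /** Returns a word with 0x80 set in exactly those lanes of group that equal b. */
    static long matchByte(long group, byte b) {
        long x = group ^ broadcast(b);          // matching lanes become 0x00
        long nonZero = ((x & LOW7) + LOW7) | x; // bit 7 of a lane is set iff that lane is non-zero
        return ~(nonZero | LOW7);               // bit 7 of a lane is set iff that lane was zero
    }

    /** True while at least one matching lane remains in the mask. */
    static boolean hasNext(long mask) {
        return mask != 0;
    }

    /** Index (0..7) of the lowest matching lane; the mask must be non-zero. */
    static int nextLane(long mask) {
        return Long.numberOfTrailingZeros(mask) >>> 3;
    }

    /** Clears the lowest set bit, that is, drops the lane just visited. */
    static long clearLowest(long mask) {
        return mask & (mask - 1);
    }

    public static void main(String[] args) {
        // Lanes 0, 4 and 6 of this group hold 0x42.
        long group = 0x114200427F018042L;
        long mask = matchByte(group, (byte) 0x42);
        while (hasNext(mask)) {
            System.out.println("candidate lane: " + nextLane(mask));
            mask = clearLowest(mask);
        }
    }
}
```

<p>In a real table, each candidate lane would still be confirmed with a full key comparison, since the control byte only holds a few bits of the hash.</p>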
<p class="md-end-block md-p md-focus FluidPluginCopy" style="box-sizing:border-box;orphans:4;margin:0.8em 0px;color:rgb(51, 51, 51);font-family:"Open Sans", "Clear Sans", "Helvetica Neue", Helvetica, Arial, "Segoe UI Emoji", sans-serif;widows:2">
<span class="md-plain md-expand ContentPasted1" style="box-sizing:border-box">JVM info</span></p>
<pre class="md-fences md-end-block ty-contain-cm modeLoaded FluidPluginCopy" spellcheck="false" style="box-sizing:border-box;font-size:0.9em;display:block;break-inside:avoid;text-align:left;background-color:rgb(248, 248, 248);border:1px solid rgb(231, 234, 237);border-radius:3px;padding:8px 4px 6px;margin-bottom:15px;margin-top:15px;color:rgb(51, 51, 51);orphans:2;widows:2" lang=""><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1"># VM version: JDK 19.0.1, Java HotSpot(TM) 64-Bit Server VM, 19.0.1+10-21</span></pre>
<p class="md-end-block md-p FluidPluginCopy" style="box-sizing:border-box;orphans:4;margin:0.8em 0px;color:rgb(51, 51, 51);font-family:"Open Sans", "Clear Sans", "Helvetica Neue", Helvetica, Arial, "Segoe UI Emoji", sans-serif;widows:2">
<span class="md-plain ContentPasted1" style="box-sizing:border-box">Server Info</span></p>
<pre class="md-fences md-end-block ty-contain-cm modeLoaded FluidPluginCopy" spellcheck="false" style="box-sizing:border-box;font-size:0.9em;display:block;break-inside:avoid;text-align:left;background-color:rgb(248, 248, 248);border:1px solid rgb(231, 234, 237);border-radius:3px;padding:8px 4px 6px;margin-bottom:15px;margin-top:15px;color:rgb(51, 51, 51);orphans:2;widows:2" lang=""><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">Architecture:            x86_64</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">  CPU op-mode(s):        32-bit, 64-bit</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">  Address sizes:         48 bits physical, 48 bits virtual</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">  Byte Order:            Little Endian</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">CPU(s):                  128</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">  On-line CPU(s) list:   0-127</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">Vendor ID:               AuthenticAMD</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">  Model name:            AMD EPYC 7773X 64-Core Processor</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">    CPU family:          25</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">    Model:               1</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">    Thread(s) per core:  
2</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">    Core(s) per socket:  64</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">    Socket(s):           1</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">    Stepping:            2</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">    Frequency boost:     enabled</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">    CPU(s) scaling MHz:  64%</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">    CPU max MHz:         3527.7339</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">    CPU min MHz:         1500.0000</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">    BogoMIPS:            4399.93</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid </span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">                         aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skini</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">                         t wdt tce topoext perfctr_core 
perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid cqm rdt_a rdseed adx sm</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">                         ap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin brs arat npt lbrv svm_lock nrip_save tsc_scale </span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">                         vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">Virtualization features: </span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">  Virtualization:        AMD-V</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">Caches (sum of all):     </span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">  L1d:                   2 MiB (64 instances)</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">  L1i:                   2 MiB (64 instances)</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">  L2:                    32 MiB (64 instances)</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">  L3:                    768 MiB (8 instances)</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">NUMA:                    
</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">  NUMA node(s):          1</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">  NUMA node0 CPU(s):     0-127</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">Vulnerabilities:         </span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">  Itlb multihit:         Not affected</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">  L1tf:                  Not affected</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">  Mds:                   Not affected</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">  Meltdown:              Not affected</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">  Mmio stale data:       Not affected</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">  Retbleed:              Not affected</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">  Spectre v2:            Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected</span><br 
class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">  Srbds:                 Not affected</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">  Tsx async abort:       Not affected</span></pre>
<p class="md-end-block md-p FluidPluginCopy" style="box-sizing:border-box;orphans:4;margin:0.8em 0px;color:rgb(51, 51, 51);font-family:"Open Sans", "Clear Sans", "Helvetica Neue", Helvetica, Arial, "Segoe UI Emoji", sans-serif;widows:2">
<span class="md-plain ContentPasted1" style="box-sizing:border-box">Random Operation (Three hashmap implementations execute the same randomly generated sequence of insert/find/remove operations. The sequence length is 100000):</span></p>
<pre class="md-fences md-end-block ty-contain-cm modeLoaded FluidPluginCopy" spellcheck="false" style="box-sizing:border-box;font-size:0.9em;display:block;break-inside:avoid;text-align:left;background-color:rgb(248, 248, 248);border:1px solid rgb(231, 234, 237);border-radius:3px;padding:8px 4px 6px;margin-bottom:15px;margin-top:15px;color:rgb(51, 51, 51);orphans:2;widows:2" lang=""><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">Benchmark                                           Mode  Cnt    Score    Error  Units</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">RandomOperations.genericSwissTableRandomOperation  thrpt    5  264.657 ±  2.645  ops/s</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">RandomOperations.hashMapRandomOperation            thrpt    5  280.280 ± 11.416  ops/s</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">RandomOperations.swissTableRandomOperation         thrpt    5  304.007 ±  6.635  ops/s</span></pre>
<p class="md-end-block md-p FluidPluginCopy" style="box-sizing:border-box;orphans:4;margin:0.8em 0px;color:rgb(51, 51, 51);font-family:"Open Sans", "Clear Sans", "Helvetica Neue", Helvetica, Arial, "Segoe UI Emoji", sans-serif;widows:2">
<span class="md-plain ContentPasted1" style="box-sizing:border-box">Find Operation (Three hashmap implementations look up the same generated data sequence. The string keys are around 190 bytes. The SwissTables use WyHash as the hasher. The sequence length is 100000):</span></p>
<pre class="md-fences md-end-block ty-contain-cm modeLoaded FluidPluginCopy" spellcheck="false" style="box-sizing:border-box;font-size:0.9em;display:block;break-inside:avoid;text-align:left;background-color:rgb(248, 248, 248);border:1px solid rgb(231, 234, 237);border-radius:3px;padding:8px 4px 6px;margin-bottom:15px;margin-top:15px;color:rgb(51, 51, 51);orphans:2;widows:2" lang=""><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">Benchmark                                           Mode  Cnt     Score     Error  Units</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">FindBenchmark.genericSwissTableFindExistingLong    thrpt    5   440.841 ±   2.525  ops/s</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">FindBenchmark.genericSwissTableFindExistingString  thrpt    5   500.632 ±   9.925  ops/s</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">FindBenchmark.genericSwissTableFindMissingLong     thrpt    5   852.220 ±   3.159  ops/s</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">FindBenchmark.genericSwissTableFindMissingString   thrpt    5  1024.492 ±   6.017  ops/s</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">FindBenchmark.hashTableFindExistingLong            thrpt    5   848.031 ±   5.488  ops/s</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">FindBenchmark.hashTableFindExistingString          thrpt    5  1247.213 ±  11.066  ops/s</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">FindBenchmark.hashTableFindMissingLong             thrpt    5   884.276 ±   0.983  ops/s</span><br class="ContentPasted1"><span 
style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">FindBenchmark.hashTableFindMissingString           thrpt    5  1386.718 ±  72.998  ops/s</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">FindBenchmark.swissTableFindExistingLong           thrpt    5   717.106 ±   1.653  ops/s</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">FindBenchmark.swissTableFindExistingString         thrpt    5   689.134 ±  11.573  ops/s</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">FindBenchmark.swissTableFindMissingLong            thrpt    5  1143.098 ±   5.033  ops/s</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">FindBenchmark.swissTableFindMissingString          thrpt    5  1562.995 ± 893.966  ops/s</span></pre>
<p class="md-end-block md-p FluidPluginCopy" style="box-sizing:border-box;orphans:4;margin:0.8em 0px;color:rgb(51, 51, 51);font-family:"Open Sans", "Clear Sans", "Helvetica Neue", Helvetica, Arial, "Segoe UI Emoji", sans-serif;widows:2">
<span class="md-pair-s" style="box-sizing:border-box"><strong style="box-sizing:border-box"><span class="md-plain ContentPasted1" style="box-sizing:border-box">Note: I am not sure why java.util.HashMap performs better when finding existing keys. Is there
 any specialization when the JVM sees that the lookup sequence is the same as the insertion sequence?</span></strong></span></p>
<p class="md-end-block md-p FluidPluginCopy" style="box-sizing:border-box;orphans:4;margin:0.8em 0px;color:rgb(51, 51, 51);font-family:"Open Sans", "Clear Sans", "Helvetica Neue", Helvetica, Arial, "Segoe UI Emoji", sans-serif;widows:2">
<span class="md-plain ContentPasted1" style="box-sizing:border-box">Insert Operation (Three hashmap implementations insert the same generated data sequence. The string keys are around 190 bytes. For strings, all three tables use a cached WyHash value (via an overloaded
 method). For long integers, the identity hash is used. The sequence length is 100000):</span></p>
<pre class="md-fences md-end-block ty-contain-cm modeLoaded FluidPluginCopy" spellcheck="false" style="box-sizing:border-box;font-size:0.9em;display:block;break-inside:avoid;text-align:left;background-color:rgb(248, 248, 248);border:1px solid rgb(231, 234, 237);border-radius:3px;padding:8px 4px 6px;margin-bottom:15px;margin-top:15px;color:rgb(51, 51, 51);orphans:2;widows:2" lang=""><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">Benchmark                                             Mode  Cnt    Score    Error  Units</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">InsertionBenchmark.genericSwissTableLongInsertion    thrpt    5  251.994 ±  6.383  ops/s</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">InsertionBenchmark.genericSwissTableStringInsertion  thrpt    5  242.455 ±  8.091  ops/s</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">InsertionBenchmark.hashMapLongInsertion              thrpt    5  157.068 ± 10.528  ops/s</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">InsertionBenchmark.hashMapStringInsertion            thrpt    5  157.860 ±  5.385  ops/s</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">InsertionBenchmark.swissTableLongInsertion           thrpt    5  281.914 ±  3.051  ops/s</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">InsertionBenchmark.swissTableStringInsertion         thrpt    5  265.384 ±  2.884  ops/s</span></pre>
<p class="md-end-block md-p FluidPluginCopy" style="box-sizing:border-box;orphans:4;margin:0.8em 0px;color:rgb(51, 51, 51);font-family:"Open Sans", "Clear Sans", "Helvetica Neue", Helvetica, Arial, "Segoe UI Emoji", sans-serif;widows:2">
<span class="md-plain ContentPasted1" style="box-sizing:border-box">Iteration (Iterate through the whole map using iterator)</span></p>
<pre class="md-fences md-end-block ty-contain-cm modeLoaded FluidPluginCopy" spellcheck="false" style="box-sizing:border-box;font-size:0.9em;display:block;break-inside:avoid;text-align:left;background-color:rgb(248, 248, 248);border:1px solid rgb(231, 234, 237);border-radius:3px;padding:8px 4px 6px;margin-bottom:15px;margin-top:15px;color:rgb(51, 51, 51);orphans:2;widows:2" lang=""><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">Benchmark                              Mode  Cnt     Score      Error  Units</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">Iteration.genericSwissTableIteration  thrpt    5  1615.441 ±   44.650  ops/s</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">Iteration.hashMapIteration            thrpt    5   562.280 ±    3.662  ops/s</span><br class="ContentPasted1"><span style="box-sizing:border-box;padding-right:0.1px" class="ContentPasted1">Iteration.swissTableIteration         thrpt    5  2667.606 ± 1454.997  ops/s</span></pre>
<p class="md-end-block md-p FluidPluginCopy ContentPasted1" style="box-sizing:border-box;orphans:4;margin:0.8em 0px;color:rgb(51, 51, 51);font-family:"Open Sans", "Clear Sans", "Helvetica Neue", Helvetica, Arial, "Segoe UI Emoji", sans-serif;widows:2">
</p>
<br>
</span></div>
<div class="elementToProof">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div id="Signature">
<div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);" class="ContentPasted0">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);" class="ContentPasted0">
<img style="width: 119.816px; height: 29px; max-width: initial;" width="119" height="29" data-outlook-trace="F:1|T:1" src="cid:178fa4da-cb40-4f72-ada8-2546493562f3"><br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);" class="ContentPasted0">
<span style="font-family: "Calibri Light", "Helvetica Light", sans-serif;">Schrodinger ZHU Yifan, Ph.D. Student</span></div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);" class="ContentPasted0">
<div class="ContentPasted0"><span style="font-family: "Calibri Light", "Helvetica Light", sans-serif;">Computer Science Department, University of Rochester</span></div>
<div class="ContentPasted0"><br>
</div>
<div class="ContentPasted0"><span style="font-family: "Calibri Light", "Helvetica Light", sans-serif; font-size: 10pt;"><b>Personal Email:</b></span><span style="font-family: "Calibri Light", "Helvetica Light", sans-serif; font-size: 10pt;"> i@zhuyi.fan</span></div>
<div class="ContentPasted0"><span style="font-family: "Calibri Light", "Helvetica Light", sans-serif; font-size: 10pt;"><b>Work Email:</b></span><span style="font-family: "Calibri Light", "Helvetica Light", sans-serif; font-size: 10pt;"> yifanzhu@rochester.edu</span></div>
<div class="ContentPasted0"><span style="font-family: "Calibri Light", "Helvetica Light", sans-serif; font-size: 10pt;"><b>Website:</b></span><span style="font-family: "Calibri Light", "Helvetica Light", sans-serif; font-size: 10pt;"> https://www.cs.rochester.edu/~yzhu104/Main.html</span></div>
<div class="ContentPasted0"><span style="font-family: "Calibri Light", "Helvetica Light", sans-serif; font-size: 10pt;"><b>Github:</b></span><span style="font-family: "Calibri Light", "Helvetica Light", sans-serif; font-size: 10pt;"> SchrodingerZhu</span></div>
<div class="ContentPasted0"><span style="font-family: "Calibri Light", "Helvetica Light", sans-serif; font-size: 10pt;"><b>GPG Fingerprint:</b></span><span style="font-family: "Calibri Light", "Helvetica Light", sans-serif; font-size: 10pt;"> BA02CBEB8CB5D8181E9368304D2CC545A78DBCC3</span></div>
<div class="ContentPasted0"><span style="font-family: "Calibri Light", "Helvetica Light", sans-serif;"><br>
</span></div>
<img style="width: 139px; height: 29px; max-width: initial;" width="139" height="29" data-outlook-trace="F:1|T:1" src="cid:b8a5c99e-343f-4275-8e2f-1084525d9b41"><br>
</div>
</div>
</div>
</div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Gavin Ray &lt;ray.gavin97@gmail.com&gt;<br>
<b>Sent:</b> January 7, 2023 23:28<br>
<b>To:</b> Paul Sandoz &lt;paul.sandoz@oracle.com&gt;<br>
<b>Cc:</b> Zhu, Yifan &lt;yzhu104@UR.Rochester.edu&gt;; panama-dev@openjdk.org &lt;panama-dev@openjdk.org&gt;<br>
<b>Subject:</b> Re: [EXT] Follow-up results for SwissTable with Vector API</font>
<div> </div>
</div>
<div>
<div dir="ltr">Zhu, I'm very interested in this discussion, in case any mails were dropped. FWIW:
<div>A SwissTable implementation based on Vector intrinsics + FFM API would be super useful for a lot of applications.<br>
<div><br>
</div>
<div>This is the history that I see:</div>
<div><br>
</div>
<div><img alt="image.png" width="562" height="201" data-outlook-trace="F:1|T:1" src="cid:ii_lcm3o1yd0"><br>
</div>
</div>
</div>
<br>
<div class="x_gmail_quote">
<div dir="ltr" class="x_gmail_attr">On Fri, Jan 6, 2023 at 11:32 AM Paul Sandoz <<a href="mailto:paul.sandoz@oracle.com">paul.sandoz@oracle.com</a>> wrote:<br>
</div>
<blockquote class="x_gmail_quote" style="margin:0px 0px 0px 0.8ex; border-left:1px solid rgb(204,204,204); padding-left:1ex">
In some further replies I just noticed you dropped the panama-dev email. Resend a summary of the discussion?<br>
<br>
Paul.<br>
<br>
> On Jan 5, 2023, at 2:16 PM, Zhu, Yifan <<a href="mailto:yzhu104@UR.Rochester.edu" target="_blank">yzhu104@UR.Rochester.edu</a>> wrote:<br>
> <br>
> I am confused. It seems that my replies are detached from the mailing list. Is that expected?<br>
> <br>
> <br>
> From: panama-dev <<a href="mailto:panama-dev-retn@openjdk.org" target="_blank">panama-dev-retn@openjdk.org</a>> on behalf of Paul Sandoz <<a href="mailto:paul.sandoz@oracle.com" target="_blank">paul.sandoz@oracle.com</a>><br>
> Sent: January 6, 2023 0:29<br>
> To: Zhu, Yifan <<a href="mailto:yzhu104@UR.Rochester.edu" target="_blank">yzhu104@UR.Rochester.edu</a>><br>
> Cc: <a href="mailto:panama-dev@openjdk.org" target="_blank">panama-dev@openjdk.org</a> <<a href="mailto:panama-dev@openjdk.org" target="_blank">panama-dev@openjdk.org</a>><br>
> Subject: [EXT] Re: Follow-up results for SwissTable with Vector API<br>
>  <br>
> Hi,<br>
> <br>
> I saw you sent another email prior to this, but for some reason it got lost by the moderation system. (Since you are not a member of the list the emails need to be moderated and approved.)<br>
>  <br>
> <br>
> > On Jan 5, 2023, at 8:09 AM, Zhu, Yifan <<a href="mailto:yzhu104@UR.Rochester.edu" target="_blank">yzhu104@UR.Rochester.edu</a>> wrote:<br>
> > <br>
> > This is the follow-up message for <a href="https://urldefense.com/v3/__https://mail.openjdk.org/pipermail/jdk-dev/2023-January/007288.html__;!!CGUSO5OYRnA7CQ!c6-WVSHfkvXgbKEtNWhxgdZ9EHDDMbmUz9AbxpvbYN54xt_4LzTwYJd4PdHmueDBCmryWsWBXfjE-Jpw_Cfrf6WyAw$" rel="noreferrer" target="_blank">
https://urldefense.com/v3/__https://mail.openjdk.org/pipermail/jdk-dev/2023-January/007288.html__;!!CGUSO5OYRnA7CQ!c6-WVSHfkvXgbKEtNWhxgdZ9EHDDMbmUz9AbxpvbYN54xt_4LzTwYJd4PdHmueDBCmryWsWBXfjE-Jpw_Cfrf6WyAw$</a> .<br>
> > <br>
> > > You do:<br>
> > >  converted.intoMemorySegment(MemorySegment.ofArray(control), offset, ByteOrder.nativeOrder());<br>
> > ><br>
> > > Can you just do: <br>
> > ><br>
> > >  converted.intoArray(control, offset);<br>
> > <br>
> > <br>
> > I did so because I found that Vector&lt;Byte&gt; actually does not have that method.<br>
> <br>
> Ah, yes. There could be a perf issue with memory segment access; since you had to wrap the array in a segment, there will be some cost to that. It’s likely that if you wrapped the control array in a segment and stored it in a field, it would work better.
<br>
> <br>
> <br>
> > After your suggestion, I switched to using ByteVector instead of Vector&lt;Byte&gt;. Surprisingly, this time the hashmap delivers better performance. It is 2~3 times faster during the insertion procedure.
<br>
> <br>
> Good!<br>
> <br>
> <br>
> > However, there was still a performance gap behind the standard hashmap during the find procedure.<br>
> > <br>
> > For the ease of discussion, I attach the relevant code here:<br>
> > <br>
> > private int findWithHash(long hash, K key) {<br>
> >     byte h2 = Util.h2(hash); //highest 7 bits<br>
> >     int position = Util.h1(hash) & bucketMask; // h1 is just long to int<br>
> >     int stride = 0;<br>
> >     while (true) {<br>
> >         var mask = matchByte(position, h2).toLong(); // match byte is to load a vector of byte and do equality comparison<br>
> >         while (MaskIterator.hasNext(mask)) { <br>
> >             var bit = MaskIterator.getNext(mask);<br>
> >             mask = MaskIterator.moveNext(mask);<br>
> >             var index = (position + bit) & bucketMask;<br>
> >             if (key.equals(keys[index])) return index;<br>
> >         }<br>
> > <br>
> >         if (matchEmpty(position).anyTrue()) {<br>
> >             return -1;<br>
> >         }<br>
> > <br>
> >         stride += VECTOR_LENGTH;<br>
> >         position = (position + stride) & bucketMask;<br>
> >     }<br>
> > }<br>
> > From IntelliJ IDEA's profiler, it seems that a large portion of time is spent on building the VectorMask. I see there is an underlying bTest operation that converts the results to a boolean array and then produces the mask. Will this be internally optimized to a
 single movemask operation by the JVM?<br>
> > <br>
> <br>
> Can you get an inline/compilation trace like you did for insert?<br>
> <br>
> The VectorMask.toLong method is an intrinsic method.<br>
> <br>
> Try:<br>
> <br>
>   var vmask = matchByte(position, h2);<br>
>   var mask = vmask.toLong();<br>
> <br>
> Probably will not make any difference, but if findInsertSlot performed OK operating on the mask returned from matchEmptyOrDelete, it points to an issue with VectorMask.toLong.<br>
> <br>
> Paul.<br>
>  <br>
> > <br>
> <br>
<br>
</blockquote>
</div>
</div>
</body>
</html>