Vector API latency

Paul Sandoz paul.sandoz at oracle.com
Wed Jan 4 21:25:24 UTC 2023


Hi Yifan,

I will move over further replies to the panama-dev list.
 
At the moment we recommend placing the species in a static final field so that the compiler (C2) can treat it as a constant. Can you try that as an experiment?
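
As a rough illustration of the static final suggestion, here is a minimal sketch. The class name, the method name matchTag, and the choice of ByteVector.SPECIES_128 are assumptions made for this example and are not taken from the SwissTable repository. It needs --add-modules jdk.incubator.vector to compile and run.

```java
import jdk.incubator.vector.ByteVector;
import jdk.incubator.vector.VectorSpecies;

public class SpeciesConstant {
    // Assumption: a 128-bit species (16 byte lanes). Keeping it in a
    // static final field lets the JIT see it as a constant.
    private static final VectorSpecies<Byte> SPECIES = ByteVector.SPECIES_128;

    // Hypothetical helper: returns a bit mask in which bit i is set
    // when control[offset + i] == tag, for the 16 bytes at offset.
    static long matchTag(byte[] control, int offset, byte tag) {
        ByteVector group = ByteVector.fromArray(SPECIES, control, offset);
        return group.eq(tag).toLong();
    }
}
```

If the species is instead read from an instance field or passed around as a non-constant value, the compiler may be unable to specialize the vector operations on it, which is what the experiment is meant to rule out.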

When running the benchmark you can use the -XX:+PrintCompilation and -XX:+PrintInlining flags (the latter also needs -XX:+UnlockDiagnosticVMOptions) to see whether the vector operations get compiled.

At this location:

  https://github.com/SchrodingerZhu/SwissTableJavaVectorAPI/blob/01d6e72cd5a5cd6a38818bb5d45121ebac3c4021/src/main/java/fan/zhuyi/swisstable/SwissTable.java#L92

You do:

  converted.intoMemorySegment(MemorySegment.ofArray(control), offset, ByteOrder.nativeOrder());

Can you just do: 

  converted.intoArray(control, offset);

?

I would recommend creating a version of SwissTable without using vectorization to also compare against.
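
One possible shape for the scalar side of such a comparison is sketched below. It computes a match mask over one group of control bytes with a plain loop. The group width of 16 and all names are assumptions for illustration, not the layout used in the repository.

```java
public class ScalarGroupMatch {
    // Assumption: 16 control bytes per group, mirroring a 128-bit vector.
    static final int GROUP_WIDTH = 16;

    // Scalar baseline: bit i of the result is set when
    // control[offset + i] == tag.
    static long matchTag(byte[] control, int offset, byte tag) {
        long mask = 0L;
        for (int i = 0; i < GROUP_WIDTH; i++) {
            if (control[offset + i] == tag) {
                mask |= 1L << i;
            }
        }
        return mask;
    }
}
```

Swapping a method like this in for the vectorized comparison, and leaving the rest of the table unchanged, would isolate how much the Vector API path itself contributes.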

I looked a little at the code, focusing on the find method. The instantiation of MaskIterator may be problematic because it allocates if escape analysis does not kick in. I recommend inlining it into the find method as another experiment. I don’t fully understand the hashing algorithm so I cannot comment on the correctness of the code.
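
To make the inlining idea concrete, here is a minimal sketch of walking the set bits of a match mask directly in a loop, with no iterator object. The method findInGroup, the int[] key array, and the long mask representation are hypothetical and only stand in for whatever the real find method uses.

```java
public class InlinedMaskLoop {
    // Hypothetical find-style loop: visits each set bit of the mask in
    // ascending lane order without allocating an iterator.
    // Returns the matching slot index, or -1 if no candidate lane matches.
    static int findInGroup(long mask, int groupStart, int[] keys, int key) {
        while (mask != 0) {
            int lane = Long.numberOfTrailingZeros(mask); // lowest set bit
            int slot = groupStart + lane;
            if (keys[slot] == key) {
                return slot;
            }
            mask &= mask - 1; // clear the lowest set bit and continue
        }
        return -1;
    }
}
```

Because the loop state is a single primitive, this form does not rely on escape analysis to avoid allocation.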

—

We have thought a little bit about how to expose AES instructions, but have not made any concrete progress. They are special. For now we are focusing on more uniform vector operations.

Paul.


> On Jan 4, 2023, at 10:58 AM, Zhu, Yifan <yzhu104 at UR.Rochester.edu> wrote:
> 
> Hi,
> 
> So I did some experiments with the Vector API and implemented a SwissTable with it. SwissTable is quite sensitive to latency on the lookup fast path. As a result, it seems that this Vector API version does not perform particularly well (it seems comparable with HashMap: faster in some workloads, slower in more of them, with no big differences). My code is posted at https://github.com/SchrodingerZhu/SwissTableJavaVectorAPI.
> 
> I am curious about several questions:
> 
> 	• Is there a handy way for me to get the JIT-compiled SIMD code (so that I can inspect the performance issue)?
> 	• Apart from the operations included in the current API, there are many specialized SIMD instructions such as AES, CRC, etc. Is there any plan to support them?
> 	• I wonder if someone can help look through the Vector API usage in my code to see if there is any room for improvement. I really hope to see whether these latency-sensitive SIMD data structures can work well in a JIT environment like the JVM.
> Best,
> Yifan
> 
> 
> Schrodinger ZHU Yifan, Ph.D. Student
> Computer Science Department, University of Rochester
> 
> Personal Email: i at zhuyi.fan
> Work Email: yifanzhu at rochester.edu
> Website: https://www.cs.rochester.edu/~yzhu104/Main.html
> Github: SchrodingerZhu
> GPG Fingerprint: BA02CBEB8CB5D8181E9368304D2CC545A78DBCC3
> 



More information about the jdk-dev mailing list