Vector API latency
Paul Sandoz
paul.sandoz at oracle.com
Wed Jan 4 21:25:24 UTC 2023
Hi Yifan,
I will move over further replies to the panama-dev list.
At the moment we recommend placing the species in a static final field so that the compiler (C2) can treat it as a constant. Can you try that as an experiment?
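As a minimal sketch of that experiment (not taken from the SwissTable repository), the species can be held in a static final field and used from a static method. The class name SpeciesConstant, the method matchTag, and the choice of ByteVector.SPECIES_128 are assumptions for illustration only. The code needs the incubator module, so compile and run with --add-modules jdk.incubator.vector.

```java
import jdk.incubator.vector.ByteVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical helper, for illustration only.
final class SpeciesConstant {
    // static final: C2 can observe the species as a compile-time constant.
    // SPECIES_128 (16 byte lanes) is an assumed choice, not the author's.
    private static final VectorSpecies<Byte> SPECIES = ByteVector.SPECIES_128;

    // Returns a bitmask with bit i set when control[offset + i] == tag.
    static long matchTag(byte[] control, int offset, byte tag) {
        ByteVector group = ByteVector.fromArray(SPECIES, control, offset);
        return group.compare(VectorOperators.EQ, tag).toLong();
    }
}
```

Passing the species around as an instance field or a method argument instead is the pattern the experiment is meant to rule out.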
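When running the benchmark you can use the -XX:+PrintCompilation and -XX:+PrintInlining flags to see whether the vector operations get compiled and inlined. Note that -XX:+PrintInlining is a diagnostic flag, so it also needs -XX:+UnlockDiagnosticVMOptions.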
At this location:
https://github.com/SchrodingerZhu/SwissTableJavaVectorAPI/blob/01d6e72cd5a5cd6a38818bb5d45121ebac3c4021/src/main/java/fan/zhuyi/swisstable/SwissTable.java#L92
You do:
converted.intoMemorySegment(MemorySegment.ofArray(control), offset, ByteOrder.nativeOrder());
Can you just do:
converted.intoArray(control, offset);
?
I would also recommend creating a version of SwissTable that does not use vectorization, so there is a baseline to compare against.
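One possible shape for the non-vectorized comparison, as a sketch only: a plain loop that produces the same kind of match bitmask a vector compare would. The class name ScalarGroup, the method matchTag, and the group width of 16 control bytes are assumptions, not details of the existing code.

```java
// Hypothetical scalar baseline, for illustration only.
final class ScalarGroup {
    // Assumption: one group covers 16 control bytes.
    static final int GROUP_WIDTH = 16;

    // Returns a bitmask with bit i set when control[offset + i] == tag.
    static int matchTag(byte[] control, int offset, byte tag) {
        int mask = 0;
        for (int i = 0; i < GROUP_WIDTH; i++) {
            if (control[offset + i] == tag) {
                mask |= 1 << i;
            }
        }
        return mask;
    }
}
```

Keeping the same mask-returning interface would let the rest of the lookup code stay unchanged, so that a benchmark measures only the difference in the group match.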
I looked a little at the code focusing on the find method. The instantiation of MaskIterator may be problematic due to allocation if escape analysis does not kick in. I recommend inlining it into the find method as another experiment. I don’t fully understand the hashing algorithm so I cannot comment on the correctness of the code.
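To illustrate what inlining the iteration might look like, here is a sketch that walks the set bits of a match mask directly in the lookup loop, with no iterator object. The names InlineMaskLoop and findInGroup, and the use of an int[] of keys, are hypothetical; the real find method will differ.

```java
// Hypothetical lookup fragment, for illustration only.
final class InlineMaskLoop {
    // Returns the first slot in the group whose key equals `key`, or -1.
    // `mask` has bit i set for each candidate lane i in the group.
    static int findInGroup(long mask, int groupBase, int[] keys, int key) {
        while (mask != 0) {
            int lane = Long.numberOfTrailingZeros(mask); // lowest candidate lane
            int slot = groupBase + lane;
            if (keys[slot] == key) {
                return slot;
            }
            mask &= mask - 1; // clear the lowest set bit and continue
        }
        return -1;
    }
}
```

Because the loop state is a single local long, nothing is allocated regardless of whether escape analysis succeeds.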
—
We have thought a little bit about how to expose AES instructions, but have not made any concrete progress. They are special. For now we are focusing on more uniform vector operations.
Paul.
> On Jan 4, 2023, at 10:58 AM, Zhu, Yifan <yzhu104 at UR.Rochester.edu> wrote:
>
> Hi,
>
> So I did some experiments with the Vector API and implemented a SwissTable with it. SwissTable is quite sensitive to latency on the lookup fast path. As a result, it seems that this Vector API version does not perform particularly well (it seems comparable with HashMap: faster in some workloads, slower in somewhat more, with no big differences). My code is posted at https://github.com/SchrodingerZhu/SwissTableJavaVectorAPI.
>
> I am curious about several questions:
>
> • Is there a handy way for me to get the JIT-compiled SIMD code, so that I can inspect the performance issue?
> • Apart from the operations included in the current API, there are many specialized SIMD instructions such as AES, CRC, etc. Is there any plan to support them?
> • I wonder if someone can help look through the Vector API usage in my code to see if there is any room for improvement. I really hope to see whether these latency-sensitive SIMD data structures can work well in a SIMD JIT environment like the JVM.
> Best,
> Yifan
>
>
> Schrodinger ZHU Yifan, Ph.D. Student
> Computer Science Department, University of Rochester
>
> Personal Email: i at zhuyi.fan
> Work Email: yifanzhu at rochester.edu
> Website: https://www.cs.rochester.edu/~yzhu104/Main.html
> Github: SchrodingerZhu
> GPG Fingerprint: BA02CBEB8CB5D8181E9368304D2CC545A78DBCC3
>
More information about the jdk-dev mailing list