Follow-up results for SwissTable with Vector API
Paul Sandoz
paul.sandoz at oracle.com
Thu Jan 5 16:29:29 UTC 2023
Hi,
I saw you sent another email prior to this, but for some reason it got lost in the moderation system. (Since you are not a member of the list, your emails need to be moderated and approved.)
> On Jan 5, 2023, at 8:09 AM, Zhu, Yifan <yzhu104 at UR.Rochester.edu> wrote:
>
> This is the follow-up message to https://mail.openjdk.org/pipermail/jdk-dev/2023-January/007288.html.
>
> > You do:
> > converted.intoMemorySegment(MemorySegment.ofArray(control), offset, ByteOrder.nativeOrder());
> >
> > Can you just do:
> >
> > converted.intoArray(control, offset);
>
>
> I did so because I found that Vector<Byte> actually does not have that method.
Ah, yes. There could be a perf issue with memory segment access, although since you had to wrap the array in a segment, there will be some cost to that as well. It’s likely that if you wrapped the control array in a segment and stored it in a field, it would work better.
> After your suggestion, I switched to using ByteVector instead of Vector<Byte>. Surprisingly, this time the hashmap delivers better performance. It is 2~3 times faster during insertion.
Good!
> However, there was still a performance gap relative to the standard hashmap during the find procedure.
>
> For the ease of discussion, I attach the relevant code here:
>
> private int findWithHash(long hash, K key) {
>     byte h2 = Util.h2(hash); // highest 7 bits
>     int position = Util.h1(hash) & bucketMask; // h1 is just long to int
>     int stride = 0;
>     while (true) {
>         var mask = matchByte(position, h2).toLong(); // matchByte loads a vector of bytes and does an equality comparison
>         while (MaskIterator.hasNext(mask)) {
>             var bit = MaskIterator.getNext(mask);
>             mask = MaskIterator.moveNext(mask);
>             var index = (position + bit) & bucketMask;
>             if (key.equals(keys[index])) return index;
>         }
>
>         if (matchEmpty(position).anyTrue()) {
>             return -1;
>         }
>
>         stride += VECTOR_LENGTH;
>         position = (position + stride) & bucketMask;
>     }
> }
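For readers following the loop above: the stride grows by VECTOR_LENGTH on every miss, so the probe offsets are triangular numbers times the group width. The sketch below replays just that update rule. It assumes the capacity is a power of two and a multiple of the group width; the class name ProbeSketch and the value GROUP = 16 are made up for illustration and are not taken from the code under discussion.

```java
import java.util.ArrayList;
import java.util.List;

final class ProbeSketch {
    // Assumed group width; stands in for VECTOR_LENGTH in the quoted code.
    static final int GROUP = 16;

    // Returns the first `steps` probe positions, using the same
    // stride/position update rule as the quoted findWithHash loop.
    static List<Integer> probePositions(int start, int capacity, int steps) {
        int bucketMask = capacity - 1; // capacity is assumed to be a power of two
        int position = start & bucketMask;
        int stride = 0;
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < steps; i++) {
            out.add(position);
            stride += GROUP;
            position = (position + stride) & bucketMask;
        }
        return out;
    }
}
```

With a capacity of 64 and a start of 0, the four groups are visited in the order 0, 16, 48, 32, so under these assumptions no group is probed twice before all have been seen.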
> From IntelliJ IDEA's profiler, it seems that a large portion of the time is spent building the VectorMask. I see there is an underlying bTest operation that converts the results to a boolean array and then produces the mask. Will this be internally optimized to a single movemask operation by the JVM?
>
Can you get an inline/compilation trace like you did for insert?
The VectorMask.toLong method is an intrinsic method.
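To make the expected result concrete: toLong packs one bit per lane, with lane i going to bit i, which is the same shape of value a movemask instruction produces. Below is a scalar sketch of that contract together with a lowest-bit-first walk over the set bits. It is plain Java for reference only and says nothing about how the intrinsic is compiled; MaskSketch, matchByteScalar and setBits are hypothetical names, and the bit walk is only a guess at what MaskIterator does.

```java
final class MaskSketch {
    // Scalar reference: bit i of the result is set when control[offset + i] == h2.
    // `lanes` is assumed to be at most 64 so the result fits in a long.
    static long matchByteScalar(byte[] control, int offset, int lanes, byte h2) {
        long mask = 0L;
        for (int i = 0; i < lanes; i++) {
            if (control[offset + i] == h2) {
                mask |= 1L << i;
            }
        }
        return mask;
    }

    // Returns the indices of the set bits, lowest first.
    static int[] setBits(long mask) {
        int[] out = new int[Long.bitCount(mask)];
        int n = 0;
        while (mask != 0) {
            out[n++] = Long.numberOfTrailingZeros(mask); // index of the lowest set bit
            mask &= mask - 1;                            // clear the lowest set bit
        }
        return out;
    }
}
```

If the vectorized find path is healthy, it should yield the same candidate indices as this scalar version, just with far fewer instructions per group.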
Try:
var vmask = matchByte(position, h2);
var mask = vmask.toLong();
It probably will not make any difference, but if findInsertSlot performed OK operating on the mask returned from matchEmptyOrDelete, that points to an issue with VectorMask.toLong.
Paul.
>
> Schrodinger ZHU Yifan, Ph.D. Student
> Computer Science Department, University of Rochester
>
> Personal Email: i at zhuyi.fan
> Work Email: yifanzhu at rochester.edu
> Website: https://www.cs.rochester.edu/~yzhu104/Main.html
> Github: SchrodingerZhu
> GPG Fingerprint: BA02CBEB8CB5D8181E9368304D2CC545A78DBCC3
More information about the panama-dev mailing list