RFR: 8320448: Accelerate IndexOf using AVX2 [v33]
Daniel Jeliński
djelinski at openjdk.org
Fri May 24 06:34:10 UTC 2024
On Thu, 23 May 2024 19:26:10 GMT, Scott Gibbons <sgibbons at openjdk.org> wrote:
>> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 268:
>>
>>> 266: __ cmpq(needle_len_p, 0);
>>> 267: __ jg_b(L_nextCheck);
>>> 268: __ xorq(rax, rax);
>>
>> out of curiosity, is there any advantage to using `xorq` instead of `xorl` here?
>>
>> https://stackoverflow.com/a/33668295/7707617 suggests that `xorl` might be better, but it's a bit dated now.
>
> Thanks for finding this. It was ignorance on my part as I thought the xorq would have logic to not emit the REX prefix if not necessary, but it doesn't. Fixed.
Right, it seems to surprise people. There's a lot of preexisting uses of xorq / xorptr to zero a register. I think it would make sense to implement this logic in xorq. I can do this if others agree.
>> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 449:
>>
>>> 447: __ cmpq(r13, NUMBER_OF_CASES - 1);
>>> 448: __ ja(L_smallCaseDefault);
>>> 449: __ mov64(r15, (int64_t)small_jump_table);
>>
>> would it make sense to use `lea` here?
>
> It may, but I believe the movq is shorter (although maybe not to r15). I'll do some experimentation.
the RIP-relative lea should have a shorter encoding. I think something like `lea(r15, ExternalAddress(small_jump_table))` should produce it (untested)
>> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 803:
>>
>>> 801: __ movq(index, needle_len);
>>> 802: __ andq(index, 0xf); // nLen % 16
>>> 803: __ movq(offset, 0x10);
>>
>> `movl` or `movptr` would produce a shorter encoding
>
> I tried to be consistent with the whole {q,l} syntax throughout when referring to each symbolic register. I feel that changing this would ripple through the code. @sviswa7 what do you think?
Right, that makes sense. I wonder if there's any reason why the logic to select the best mov variant is in movptr, and not in movq. Usually the `ptr` functions just select the `l` or `q` overload depending on the target system, `movptr` is an exception here.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612907959
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612908115
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612908219
More information about the core-libs-dev
mailing list