RFR: 8320448: Accelerate IndexOf using AVX2 [v33]

Daniel Jeliński djelinski at openjdk.org
Fri May 24 06:34:10 UTC 2024


On Thu, 23 May 2024 19:26:10 GMT, Scott Gibbons <sgibbons at openjdk.org> wrote:

>> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 268:
>> 
>>> 266:   __ cmpq(needle_len_p, 0);
>>> 267:   __ jg_b(L_nextCheck);
>>> 268:   __ xorq(rax, rax);
>> 
>> out of curiosity, is there any advantage to using `xorq` instead of `xorl` here?
>> 
>> https://stackoverflow.com/a/33668295/7707617 suggests that `xorl` might be better, but it's a bit dated now.
>
> Thanks for finding this.  It was ignorance on my part as I thought the xorq would have logic to not emit the REX prefix if not necessary, but it doesn't.  Fixed.

Right, it seems to surprise people. There's a lot of preexisting uses of xorq / xorptr to zero a register. I think it would make sense to implement this logic in xorq. I can do this if others agree.

>> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 449:
>> 
>>> 447:   __ cmpq(r13, NUMBER_OF_CASES - 1);
>>> 448:   __ ja(L_smallCaseDefault);
>>> 449:   __ mov64(r15, (int64_t)small_jump_table);
>> 
>> would it make sense to use `lea` here?
>
> It may, but I believe the movq is shorter (although maybe not to r15).  I'll do some experimentation.

the RIP-relative lea should have a shorter encoding. I think something like `lea(r15, ExternalAddress(small_jump_table))` should produce it (untested)

>> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 803:
>> 
>>> 801:     __ movq(index, needle_len);
>>> 802:     __ andq(index, 0xf);  // nLen % 16
>>> 803:     __ movq(offset, 0x10);
>> 
>> `movl` or `movptr` would produce a shorter encoding
>
> I tried to be consistent with the whole {q,l} syntax throughout when referring to each symbolic register.  I feel that changing this would ripple through the code.  @sviswa7 what do you think?

Right, that makes sense. I wonder if there's any reason why the logic to select the best mov variant is in movptr, and not in movq. Usually the `ptr` functions just select the `l` or `q` overload depending on the target system, `movptr` is an exception here.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612907959
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612908115
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612908219


More information about the core-libs-dev mailing list