RFR: 8320448: Accelerate IndexOf using AVX2 [v7]

Scott Gibbons sgibbons at openjdk.org
Wed May 22 14:53:26 UTC 2024


On Tue, 16 Jan 2024 13:26:15 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits:
>> 
>>  - Merge branch 'openjdk:master' into indexof
>>  - Merge branch 'openjdk:master' into indexof
>>  - Addressing review comments.
>>  - Fix for JDK-8321599
>>  - Support UU IndexOf
>>  - Only use optimization when EnableX86ECoreOpts is true
>>  - Fix whitespace
>>  - Merge branch 'openjdk:master' into indexof
>>  - Comments; added exhaustive-ish test
>>  - Subtracting 0x10 twice.
>>  - ... and 12 more: https://git.openjdk.org/jdk/compare/8e12053e...3e58d0c2
>
> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 417:
> 
>> 415:     __ cmpl(Address(rbx, r15, Address::times_1, -0x14), rax);
>> 416:     __ jne(L_top_loop_1);
>> 417:     __ jmp(L_0x406019);
> 
> For cases which are multiple of 4 bytes we can use VMASKMOVPS (conditional moves) and VPTEST.

Not sure what you mean here.  Could you elaborate (although it may be moot after all the changes)?

> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1526:
> 
>> 1524:     __ movq(rdx, r8);
>> 1525:     __ movq(rcx, r9);
>> 1526: #endif
> 
> Can we spill them into XXMs, to save costly stack operations.

Changed.

> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1545:
> 
>> 1543:     //   return 0;
>> 1544:     // }
>> 1545:     __ movq(r12, rcx);
> 
> Check for K == 0 should use rsi.

k is needle length, which is rcx.

> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1545:
> 
>> 1543:     //   return 0;
>> 1544:     // }
>> 1545:     __ movq(r12, rcx);
> 
> Kindly use meaningful variable and label names. It will ease the review process and maintenance.

Done.

> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1551:
> 
>> 1549:     __ movq(r15, rsi);
>> 1550:     __ movq(r11, rdi);
>> 1551:     __ cmpq(rsi, 0x20);
> 
> Comparisons with 32 bit integer length can use cmpl instead of cmpq, this may save emitting REX encoding prefix if index is allocated a GPR from lower register bank (no need for setting REX.W).

I fixed as many as I could find.  Thanks.

> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1552:
> 
>> 1550:     __ movq(r11, rdi);
>> 1551:     __ cmpq(rsi, 0x20);
>> 1552:     __ jb(L_small_string);
> 
> All the comparisons against needle length are signed integer comparisons, so jb should be replaced by jl

I'm treating everything as unsigned except where intentional negative values are used.  It never makes sense for needle length to be negative.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610118449
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610110754
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610105405
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610111320
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610113343
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610116033


More information about the hotspot-compiler-dev mailing list