RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v2]

Tue Oct 6 15:46:12 UTC 2020

On Fri, 2 Oct 2020 08:44:48 GMT, Jason Tatton <github.com+70893615+jasontatton-aws at openjdk.org> wrote:

>> Hi Jason,
>> 
>> thanks for bringing String.indexOf() for latin strings up to date with the Unicode version.
>> 
>> Your changes look good except a few minor issues I've commented on right in the code.
>> 
>> I'd only like to ask you if you could possibly improve your test a little bit. As far as I understand, your search text
>> is a consecutive sequence of "abc" characters, so you'll always find the character your searching for within the next
>> three characters of the source text. This won't exercise the loops of your intrinsic. Maybe you can also add some test
>> versions where the search character will be found beyond the first 32/64 characters after "fromIndex"?
>
> @simonis Thank you for the corrections, I have ammended them in the latest comit as follows:
> 
> Changes to unit test:
> - main test adjusted such that Strings gennerated are much longer (up to 2048 characters) and of the form: `azaza`,
>   `aazaazaa`, `aaazaaazaaa`, etc with `'z'` being the search character searched for. Multiple instances of the search
>   character are included in the String in order to validate that the starting offset is correctly handleded. Results are
>   compared to non intrinsified version of the code. Longer strings means that the looping functionality of the various
>   paths is entered into.
> - Run configurations introduced such that it checks behaviour where use of SSE and AVX instructions are restricted.
> - Tier4InvocationThreshold adjusted so as to ensure C2 code iis invoked.
> 
> Other changes:
> - newlines added at end of files
> 
> @vnkozlov here are the performance numbers as requested. I have included performance of the UTF16 version of the
> intrinsic for reference:
> | UseAVX= | UseSSE= | Benchmark                         | Mode | Cnt | Score       | Error       | Units |
> |---------|---------|-----------------------------------|------|-----|-------------|-------------|-------|
> |         | 0       | IndexOfBenchmark.latin1_long_char | avgt | 5   | **447,493.398** | ± 4,666.386 | ns/op |
> | 0       |         | IndexOfBenchmark.latin1_long_char | avgt | 5   | **104,735.941** | ± 2,484.403 | ns/op |
> | 1       |         | IndexOfBenchmark.latin1_long_char | avgt | 5   | **104,342.844** | ± 2,656.343 | ns/op |
> | 2       |         | IndexOfBenchmark.latin1_long_char | avgt | 5   | **61,000.418**  | ± 1,543.951 | ns/op |
> | 3       |         | IndexOfBenchmark.latin1_long_char | avgt | 5   | **60,607.988**  | ± 1,466.354 | ns/op |
> |         | 0       | IndexOfBenchmark.utf16_long_char  | avgt | 5   | 672,475.302 | ± 4,998.596 | ns/op |
> | 0       |         | IndexOfBenchmark.utf16_long_char  | avgt | 5   | 175,521.654 | ± 7,549.094 | ns/op |
> | 1       |         | IndexOfBenchmark.utf16_long_char  | avgt | 5   | 172,514.981 | ± 3,561.040 | ns/op |
> | 2       |         | IndexOfBenchmark.utf16_long_char  | avgt | 5   | 120,725.748 | ± 2,004.400 | ns/op |
> | 3       |         | IndexOfBenchmark.utf16_long_char  | avgt | 5   | 120,664.623 | ± 1,988.419 | ns/op |
> 
> I think the results are as expected, we see improvements in performance as the range of SSE and AVX instructions which
> can be used is expanded upon. Note that no improvement is observed with UseAVX=3 because there is no AVX-512 code in
> these intrinsics.

Hi All,

Just wondering if there is anything you'd like me to do in order to assist with moving this patch forward?

Thanks,
Jason

-------------

PR: https://git.openjdk.java.net/jdk/pull/71