RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v2]

Jason Tatton github.com+70893615+jasontatton-aws at openjdk.java.net
Fri Oct 2 08:47:47 UTC 2020


On Tue, 22 Sep 2020 15:19:37 GMT, Volker Simonis <simonis at openjdk.org> wrote:

>> Jason Tatton has updated the pull request with a new target base due to a merge or a rebase. The pull request now
>> contains four commits:
>>  - Merge master
>>  - 8173585: further whitespace changes required by jcheck
>>  - JDK-8173585 - whitespace changes required by jcheck
>>  - JDK-8173585
>
> Hi Jason,
> 
> thanks for bringing String.indexOf() for latin strings up to date with the Unicode version.
> 
> Your changes look good except a few minor issues I've commented on right in the code.
> 
> I'd only like to ask you if you could possibly improve your test a little bit. As far as I understand, your search text
> is a consecutive sequence of "abc" characters, so you'll always find the character your searching for within the next
> three characters of the source text. This won't exercise the loops of your intrinsic. Maybe you can also add some test
> versions where the search character will be found beyond the first 32/64 characters after "fromIndex"?

@simonis Thank you for the corrections, I have ammended them in the latest comit as follows:

Changes to unit test:
- main test adjusted such that Strings gennerated are much longer (up to 2048 characters) and of the form: `azaza`,
  `aazaazaa`, `aaazaaazaaa`, etc with `'z'` being the search character searched for. Multiple instances of the search
  character are included in the String in order to validate that the starting offset is correctly handleded. Results are
  compared to non intrinsified version of the code. Longer strings means that the looping functionality of the various
  paths is entered into.
- Run configurations introduced such that it checks behaviour where use of SSE and AVX instructions are restricted.
- Tier4InvocationThreshold adjusted so as to ensure C2 code iis invoked.

Other changes:
- newlines added at end of files

@vnkozlov here are the performance numbers as requested. I have included performance of the UTF16 version of the
intrinsic for reference:

| UseAVX= | UseSSE= | Benchmark                         | Mode | Cnt | Score       | Error       | Units |
|---------|---------|-----------------------------------|------|-----|-------------|-------------|-------|
|         | 0       | IndexOfBenchmark.latin1_long_char | avgt | 5   | **447,493.398** | ± 4,666.386 | ns/op |
| 0       |         | IndexOfBenchmark.latin1_long_char | avgt | 5   | **104,735.941** | ± 2,484.403 | ns/op |
| 1       |         | IndexOfBenchmark.latin1_long_char | avgt | 5   | **104,342.844** | ± 2,656.343 | ns/op |
| 2       |         | IndexOfBenchmark.latin1_long_char | avgt | 5   | **61,000.418**  | ± 1,543.951 | ns/op |
| 3       |         | IndexOfBenchmark.latin1_long_char | avgt | 5   | **60,607.988**  | ± 1,466.354 | ns/op |
|         | 0       | IndexOfBenchmark.utf16_long_char  | avgt | 5   | 672,475.302 | ± 4,998.596 | ns/op |
| 0       |         | IndexOfBenchmark.utf16_long_char  | avgt | 5   | 175,521.654 | ± 7,549.094 | ns/op |
| 1       |         | IndexOfBenchmark.utf16_long_char  | avgt | 5   | 172,514.981 | ± 3,561.040 | ns/op |
| 2       |         | IndexOfBenchmark.utf16_long_char  | avgt | 5   | 120,725.748 | ± 2,004.400 | ns/op |
| 3       |         | IndexOfBenchmark.utf16_long_char  | avgt | 5   | 120,664.623 | ± 1,988.419 | ns/op |

I think the results are as expected, we see improvements in performance as the range of SSE and AVX instructions which
can be used is expanded upon. Note that no improvement is observed with UseAVX=3 because there is no AVX-512 code in
these intrinsics.

-------------

PR: https://git.openjdk.java.net/jdk/pull/71


More information about the core-libs-dev mailing list