RFR: 8173585: Intrinsify StringLatin1.indexOf(char)

Thu Sep 17 18:22:35 UTC 2020

On Fri, 11 Sep 2020 23:04:01 GMT, Jason Tatton <github.com+70893615+jasontatton-aws at openjdk.org> wrote:

>> This is an implementation of the indexOf(char) intrinsic for StringLatin1 (1 byte encoded Strings). It is provided for
>> x86 and ARM64. The implementation is greatly inspired by the indexOf(char) intrinsic for StringUTF16. To incorporate it
>> I had to make a small change to StringLatin1.java (refactor of functionality to intrisified private method) as well as
>> code for C2.  Submitted to: hotspot-compiler-dev and core-libs-dev as this patch contains a change to hotspot and
>> java/lang/StringLatin1.java  https://bugs.openjdk.java.net/browse/JDK-8173585
>> 
>> Details of testing:
>> ============
>> I have created a jtreg test “compiler/intrinsics/string/TestStringLatin1IndexOfChar” to cover this new intrinsic. Note
>> that, particularly for the x86 implementation of the intrinsic, the code path taken is dependent upon the length of the
>> input String. Hence the test has been designed to cover all these cases. In summary they are:
>> - A “short” string of < 16 characters.
>> - A SIMD String of 16 – 31 characters.
>> - A AVX2 SIMD String of 32 characters+.
>> 
>> Hardware used for testing:
>> -----------------------------
>> 
>> - Intel Xeon CPU E5-2680 (JVM did not recognize this as having AVX2 support) • Intel i7 processor (with AVX2 support).
>> - AWS Graviton 2 (ARM 64 processor).
>> 
>> I also ran; ‘run-test-tier1’ and ‘run-test-tier2’ for: x86_64 and aarch64.
>> 
>> Possible future enhancements:
>> ====================
>> For the x86 implementation there may be two further improvements we can make in order to improve performance of both
>> the StringUTF16 and StringLatin1 indexOf(char) intrinsics:
>> 1. Make use of AVX-512 instructions.
>> 2. For “short” Strings (see below), I think it may be possible to modify the existing algorithm to still use SSE SIMD
>> instructions instead of a loop.
>> Benchmark results:
>> ============
>> **Without** the new StringLatin1 indexOf(char) intrinsic:
>> 
>> | Benchmark  | Mode | Cnt | Score | Error | Units |
>> | ------------- | ------------- |------------- |------------- |------------- |------------- |
>> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **26,389.129** | ± 182.581 | ns/op |
>> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 17,885.383 | ± 435.933  | ns/op |
>> 
>> 
>> **With** the new StringLatin1 indexOf(char) intrinsic:
>> 
>> | Benchmark  | Mode | Cnt | Score | Error | Units |
>> | ------------- | ------------- |------------- |------------- |------------- |------------- |
>> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **17,875.185** | ± 407.716  | ns/op |
>> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 18,292.802 | ± 167.306  | ns/op |
>> 
>> 
>> The objective of the patch is to bring the performance of StringLatin1 indexOf(char) in line with StringUTF16
>> indexOf(char) for x86 and ARM64. We can see above that this has been achieved. Similar results were obtained when
>> running on ARM.
>
> Hi Andrew,
> 
> The current indexOf char intrinsics for StringUTF16 and the new one here for StringLatin1 both use the AVX2 – i.e. 256
> bit instructions, these are also affected by the frequency scaling which affects the AVX-512 instructions you pointed
> out. Of course in a world where all the work taking place is AVX instructions this wouldn’t be an issue but in mixed
> mode execution this is a problem.  However, the compiler does have knowledge of the capability of the CPU upon which
> it’s optimizing code for and is able to decide whether to use AVX instructions if they are supported by the CPU AND if
> it wouldn’t be detrimental for performance. In fact, there is a flag which one can use to interact with
> this: -XX:UseAVX=version.  This of course made testing this patch an interesting experience as the AVX2 instructions
> were not enabled on the Xeon processors which I had access to at AWS, but in the end I was able to use an i7 on my
> corporate macbook to validate the code.  From: mlbridge[bot] <notifications at github.com> Sent: 11 September 2020 17:01
> To: openjdk/jdk <jdk at noreply.github.com> Cc: Tatton, Jason <jptatton at amazon.com>; Mention <mention at noreply.github.com>
> Subject: Re: [openjdk/jdk] 8173585: Intrinsify StringLatin1.indexOf(char) (#71)
> 
> 
> Mailing list message from Andrew Haley<mailto:aph at redhat.com> on hotspot-dev<mailto:hotspot-dev at openjdk.java.net>:
> 
> On 11/09/2020 11:23, Jason Tatton wrote:
> 
> For the x86 implementation there may be two further improvements we
> can make in order to improve performance of both the StringUTF16 and
> StringLatin1 indexOf(char) intrinsics:
> 
> 1. Make use of AVX-512 instructions.
> 
> Is this really a good idea?
> 
> When the processor detects Intel AVX instructions, additional
> voltage is applied to the core. With the additional voltage applied,
> the processor could run hotter, requiring the operating frequency to
> be reduced to maintain operations within the TDP limits. The higher
> voltage is maintained for 1 millisecond after the last Intel AVX
> instruction completes, and then the voltage returns to the nominal
> TDP voltage level.
> 
> https://computing.llnl.gov/tutorials/linux_clusters/intelAVXperformanceWhitePaper.pdf
> 
> So, if StringLatin1.indexOf(char) executes enough to make a difference
> to any real-world program, the effect may well be to slow down the clock
> for all of the code that does not use AVX.
> 
> 2. For ?short? Strings (see below), I think it may be possible to modify the existing algorithm to still use SSE SIMD
> instructions instead of a loop.
> 
> --
> Andrew Haley (he/him)
> Java Platform Lead Engineer
> Red Hat UK Ltd. <https://www.redhat.com>
> https://keybase.io/andrewhaley
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
> 
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub<https://github.com/openjdk/jdk/pull/71#issuecomment-691179797>, or
> unsubscribe<https://github.com/notifications/unsubscribe-auth/AQ44AL33DFEQ36TCHSTZKCLSFJCUJANCNFSM4Q74AKCQ>.

Hi everyone,

This patch is just missing a couple of reviewers... Please can someone step forward?

I think it's a fairly straightforward change.

-Jason

-------------

PR: https://git.openjdk.java.net/jdk/pull/71