RFR: 8173585: Intrinsify StringLatin1.indexOf(char)

Tue Sep 22 02:34:04 UTC 2020

On Mon, 21 Sep 2020 09:20:56 GMT, Andrew Dinn <adinn at openjdk.org> wrote:

>>> Can you explain where this restricted effect is documented?
>> 
>> Certainly! I’ve found that determining the capability of the CPU and whether to enable AVX2 support if the chip
>> supports it is mostly controlled in: [vm_version_x86.cpp](
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp) specifically:
>> [get_processor_features](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp#L684-L755)
>> and in [generate_get_cpu_info](
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp#L69-L611).  In order to test the
>> patch comprehensively I had to track down an Intel Core i7 (I7-9750H) processor which the aforementioned code permitted
>> AVX2 instructions for (maybe this is an error and it should not be enabled for this processor though) as most of the
>> infrastructure I personally use here at AWS runs on Intel Xeon processors - I also tested on a E5-2680 which the JVM
>> does not enable AVX2 for.  However, this is just the Intel side of things. When it comes to AMD I read that the AMD Zen
>> 2 architecture, of which the current flagship: Threadripper 3990X, is based, is able to support AVX2 [without the
>> frequency scaling](
>> https://www.anandtech.com/Show/Index/14525?cPage=7&all=False&sort=0&page=9&slug=amd-zen-2-microarchitecture-analysis-ryzen-3000-and-epyc-rome)
>> which some/all(?) of the Intel chips incur. I personally don’t have access to one of these chips so I cannot confirm
>> how it is classified in the JVM.  Also, I found when investigating this that there is actually a JVM flag which can be
>> used to control what level of AVX is enabled: `-XX:UseAVX=version.`
>
>>  >    Can you explain where this restricted effect is documented?
> 
>> Certainly! I’ve found that determining the capability of the CPU and whether to enable AVX2 support if the chip
>> supports it is mostly controlled in: vm_version_x86.cpp specifically: get_processor_features and in
>> generate_get_cpu_info.
> 
> Yes, I can see what the code does. I was asking where the cpu behaviour is documented independent of the code.
> 
>> In order to test the patch comprehensively I had to track down an Intel Core i7 (I7-9750H) processor which the
>> aforementioned code permitted AVX2 instructions for (maybe this is an error and it should not be enabled for this
>> processor though) as most of the infrastructure I personally use here at AWS runs on Intel Xeon processors - I also
>> tested on a E5-2680 which the JVM does not enable AVX2 for.
> 
> 'maybe'? The documentation Andrew provided mentioned Xeon E5 v3 which I believe is a Skylake design. However, the code
> I pointed you at in vm_version_x86 which claims to detect 'early Skylake' designs is only disabling AVX512 support. It
> still enables AVX2. Similarly, the code that generates machine code to check the processor capabilities has a special
> check if use_evex is set (i.e. AVX3 is requested)  which disables AVX512 for Skylake but does not disable AVX2 support.
>> However, this is just the Intel side of things. When it comes to AMD I read that the AMD Zen 2 architecture, of which
>> the current flagship: Threadripper 3990X, is based, is able to support AVX2 without the frequency scaling which
>> some/all(?) of the Intel chips incur. I personally don’t have access to one of these chips so I cannot confirm how it
>> is classified in the JVM.
> 
> Well, it would be good to know where you read that and to see if that confirms thar the code is avoiding the issue
> Andrew raised.
>> Also, I found when investigating this that there is actually a JVM flag which can be used to control what level of AVX
>> is enabled: -XX:UseAVX=version.
> 
> Yes, indeed. However, what I am trying to understand is whether the current code is bypassing the problem Andrew
> brought up in the cases where that problem actually exists. It doesn't look like it so far given that the problem
> applies to AVX2 and only AVX512 support is being disabled and, even then only for some (Skylake) processors. Without
> some clear documentation of what processors suffer from this power surge problem it will not be possible to decide
> whether this patch is doing the right thing or not.

Based on comment by @jatin-bhateja (Intel) frequency level switchover pointed by @theRealAph  is sensitive to vector
size https://github.com/openjdk/jdk/pull/144#issuecomment-692044896

By keeping vector size less or equal to 32 bytes we should avoid it. And as I can see this intrinsic code is using 32
bytes (chars) and 16 bytes vectors: `pbroadcastb(vec1, vec1, Assembler::AVX_256bit);`

Also we never had issues with AVX2. only with AVX512 regarding performance hit:
https://bugs.openjdk.java.net/browse/JDK-8221092

I would like to see performance numbers for for all values of UseAVX flag : 0, 1, 2, 3

The usage is guarded UseSSE42Intrinsics in UseSSE42Intrinsics predicate in .ad file. Make sure to test with UseAVX=0 to
make sure that some avx instructions are not mixed into non avx code. And also with UseSSE=2 (for example) to make sure
shared code correctly recognize that intrinsics is not supported.

-------------

PR: https://git.openjdk.java.net/jdk/pull/71