RFR: 8173585: Intrinsify StringLatin1.indexOf(char)

Tue Sep 22 09:01:22 UTC 2020

On Tue, 22 Sep 2020 02:31:14 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>>>  >    Can you explain where this restricted effect is documented?
>> 
>>> Certainly! I’ve found that determining the capability of the CPU and whether to enable AVX2 support if the chip
>>> supports it is mostly controlled in: vm_version_x86.cpp specifically: get_processor_features and in
>>> generate_get_cpu_info.
>> 
>> Yes, I can see what the code does. I was asking where the cpu behaviour is documented independent of the code.
>> 
>>> In order to test the patch comprehensively I had to track down an Intel Core i7 (I7-9750H) processor which the
>>> aforementioned code permitted AVX2 instructions for (maybe this is an error and it should not be enabled for this
>>> processor though) as most of the infrastructure I personally use here at AWS runs on Intel Xeon processors - I also
>>> tested on a E5-2680 which the JVM does not enable AVX2 for.
>> 
>> 'maybe'? The documentation Andrew provided mentioned Xeon E5 v3 which I believe is a Skylake design. However, the code
>> I pointed you at in vm_version_x86 which claims to detect 'early Skylake' designs is only disabling AVX512 support. It
>> still enables AVX2. Similarly, the code that generates machine code to check the processor capabilities has a special
>> check if use_evex is set (i.e. AVX3 is requested)  which disables AVX512 for Skylake but does not disable AVX2 support.
>>> However, this is just the Intel side of things. When it comes to AMD I read that the AMD Zen 2 architecture, of which
>>> the current flagship: Threadripper 3990X, is based, is able to support AVX2 without the frequency scaling which
>>> some/all(?) of the Intel chips incur. I personally don’t have access to one of these chips so I cannot confirm how it
>>> is classified in the JVM.
>> 
>> Well, it would be good to know where you read that and to see if that confirms thar the code is avoiding the issue
>> Andrew raised.
>>> Also, I found when investigating this that there is actually a JVM flag which can be used to control what level of AVX
>>> is enabled: -XX:UseAVX=version.
>> 
>> Yes, indeed. However, what I am trying to understand is whether the current code is bypassing the problem Andrew
>> brought up in the cases where that problem actually exists. It doesn't look like it so far given that the problem
>> applies to AVX2 and only AVX512 support is being disabled and, even then only for some (Skylake) processors. Without
>> some clear documentation of what processors suffer from this power surge problem it will not be possible to decide
>> whether this patch is doing the right thing or not.
>
> Based on comment by @jatin-bhateja (Intel) frequency level switchover pointed by @theRealAph  is sensitive to vector
> size https://github.com/openjdk/jdk/pull/144#issuecomment-692044896
> 
> By keeping vector size less or equal to 32 bytes we should avoid it. And as I can see this intrinsic code is using 32
> bytes (chars) and 16 bytes vectors: `pbroadcastb(vec1, vec1, Assembler::AVX_256bit);`
> Also we never had issues with AVX2. only with AVX512 regarding performance hit:
> https://bugs.openjdk.java.net/browse/JDK-8221092
> 
> I would like to see performance numbers for for all values of UseAVX flag : 0, 1, 2, 3
> 
> The usage is guarded UseSSE42Intrinsics in UseSSE42Intrinsics predicate in .ad file. Make sure to test with UseAVX=0 to
> make sure that some avx instructions are not mixed into non avx code. And also with UseSSE=2 (for example) to make sure
> shared code correctly recognize that intrinsics is not supported.

@vnkozlov @mknjc  @jatin-bhateja Thanks for providing the relevant details. I'm now quite content that this patch
avoids any potential frequency scaling problem. I'm also glad that an explanation of why this is so is now
available --  although it's not perfect that we are relying on a stackoverflow post for the full details.

-------------

PR: https://git.openjdk.java.net/jdk/pull/71