RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions

Paul Sandoz psandoz at openjdk.java.net
Fri May 14 15:29:38 UTC 2021


On Fri, 14 May 2021 11:26:29 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Thanks for the explanations on why partial inlining can be beneficial. Ideally it would be great if the only changes we made to the Java code were to the threshold values.
>> 
>> For example:
>> 
>>     public static int mismatch(byte[] a,
>>                                byte[] b,
>>                                int length) {
>>         // ISSUE: defer to index receiving methods if performance is good
>>         // assert length <= a.length
>>         // assert length <= b.length
>> 
>>         int i = 0;
>>         if (length > BYTE_THRESHOLD) {
>>             if (a[0] != b[0])
>>                 return 0;
>>             i = vectorizedMismatch(
>>                     a, Unsafe.ARRAY_BYTE_BASE_OFFSET,
>>                     b, Unsafe.ARRAY_BYTE_BASE_OFFSET,
>>                     length, LOG2_ARRAY_BYTE_INDEX_SCALE);
>>             if (i >= 0)
>>                 return i;
>>             // Align to tail
>>             i = length - ~i;
>> //            assert i >= 0 && i <= 7;
>>         }
>>         // Tail < 8 bytes
>>         for (; i < length; i++) {
>>             if (a[i] != b[i])
>>                 return i;
>>         }
>>         return -1;
>>     }
>> 
>> 
>> Where `BYTE_THRESHOLD` is initialized to 7 or 0, based on querying some HotSpot runtime property. When `BYTE_THRESHOLD == 0` I hope the `length > BYTE_THRESHOLD` check is strength reduced in many cases.
>> 
>> That does leave the `i >= 0` check of the result from `vectorizedMismatch`; perhaps that also has some minor impact? However, since you are doing partial inlining and you know that your `vectorizedMismatch` intrinsic never returns a negative value, maybe you could elide that check?
>> 
>> A quick experiment would be to apply your HotSpot changes and use the existing Java code, replacing the constant threshold values with 0. Then we can carefully look at the code gen and perf results.
>
> Hi @PaulSandoz, I have reinstated the tail handling in Java to avoid any impact on other targets. Updated performance numbers still show gains for small comparison sizes up to -XX:UsePartialInlineSize. Thus the patch no longer changes the existing Java implementation of VectorizedMismatch.

@jatin-bhateja that's good. Did performance numbers change after reverting the Java changes? 

Do you think it is worth experimenting by setting the threshold to zero when partial inlining is supported? Partial inlining might help when, say, mismatching arrays of 7 bytes or fewer; we could test this quickly with the `byte` variant of mismatch.
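
To make the threshold experiment concrete, here is a minimal sketch of the `byte` mismatch with the threshold passed in as a parameter, so 7 and 0 can be compared side by side. It is an illustration only: `MismatchThresholdSketch` and `vectorizedMismatchStandIn` are hypothetical names, and the stand-in is a plain scalar loop that only mimics the return convention used above (index of the first mismatch, or the bitwise complement of the number of unchecked tail bytes). It is not the intrinsic and says nothing about generated code or performance.

```java
import java.util.Arrays;

final class MismatchThresholdSketch {

    // Hypothetical scalar stand-in for the vectorizedMismatch intrinsic.
    // Compares whole 8-byte chunks and returns the index of the first
    // mismatch, or ~(number of tail bytes left unchecked).
    static int vectorizedMismatchStandIn(byte[] a, byte[] b, int length) {
        int i = 0;
        for (; i + 8 <= length; i += 8) {
            for (int j = i; j < i + 8; j++) {
                if (a[j] != b[j])
                    return j;
            }
        }
        return ~(length - i);
    }

    // Same shape as the mismatch method quoted above, with the threshold
    // as a parameter (assumption: 7 today, 0 when partial inlining is on).
    static int mismatch(byte[] a, byte[] b, int length, int threshold) {
        int i = 0;
        if (length > threshold) {
            if (a[0] != b[0])
                return 0;
            i = vectorizedMismatchStandIn(a, b, length);
            if (i >= 0)
                return i;
            // Align to tail
            i = length - ~i;
        }
        // Tail < 8 bytes
        for (; i < length; i++) {
            if (a[i] != b[i])
                return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Check both thresholds against Arrays.mismatch for short arrays,
        // with no mismatch (pos == -1) and a mismatch at every position.
        for (int length = 0; length <= 16; length++) {
            for (int pos = -1; pos < length; pos++) {
                byte[] a = new byte[length];
                byte[] b = new byte[length];
                if (pos >= 0)
                    b[pos] = 1;
                int expected = Arrays.mismatch(a, b);
                if (mismatch(a, b, length, 7) != expected
                        || mismatch(a, b, length, 0) != expected) {
                    throw new AssertionError("length=" + length + " pos=" + pos);
                }
            }
        }
        System.out.println("threshold 7 and threshold 0 agree with Arrays.mismatch");
    }
}
```

This only checks that a zero threshold preserves the result for lengths of 7 or less; whether it is faster would still need the real intrinsic and a benchmark run with the HotSpot changes applied.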

-------------

PR: https://git.openjdk.java.net/jdk/pull/3999

