[vectorIntrinsics] RFR: Add utf8 decoding benchmarks

Wed Nov 25 21:28:08 UTC 2020

On Wed, 25 Nov 2020 17:37:01 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> Following discussions on the mailing list, I'm submitting three benchmarks around UTF-8 decoding:
>>  - decode: uses a while-loop based implementation currently in use in the JDK
>>  - decodeVector: uses a lookup table with vector operations for 1-3 bytes characters
>>  - decodeVectorASCII: uses a simple vector operation to accelerate parsing ASCII-only characters
>> 
>> We don't observe the expected speedups with either decodeVector and decodeVectorASCII, so these are, I think, good test cases to further develop the Vector API.
>
> @luhenry Looks good to me as well. Please do create a separate branch (other than vectorIntrinsics) as indicated above by OpenJDK Skara bot and resubmit the pull request from the new branch.

This benchmark has identified a few issues:

1. Operations on species of byte/64 are not intrinsified.
2. `ShortVector` loads from and stores to `char[]` are not intrinsified (I suspected this to be the case).

For now:

1) can be unblocked by focusing on byte/128 and short/256.
2) can be unblocked either with the following patch (it appears to work, but needs a more detailed review) or by using `short[]` instead of `char[]`.

diff --git a/src/hotspot/share/opto/vectorIntrinsics.cpp b/src/hotspot/share/opto/vectorIntrinsics.cpp
index db7b69a9137..399130b0e45 100644
--- a/src/hotspot/share/opto/vectorIntrinsics.cpp
+++ b/src/hotspot/share/opto/vectorIntrinsics.cpp
@@ -624,7 +624,10 @@ bool LibraryCallKit::inline_vector_mem_operation(bool is_store) {
   // Handle loading masks.
   // If there is no consistency between array and vector element types, it must be special byte array case or loading masks
   if (arr_type != NULL && !using_byte_array && elem_bt != arr_type->elem()->array_element_basic_type() && !is_mask) {
-    return false;
+    if (elem_bt == T_SHORT && arr_type->elem()->array_element_basic_type() == T_CHAR) {
+    } else {
+      return false;
+    }
   }
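
For the `short[]` alternative, here is a minimal sketch of why nothing is lost by keeping the decoded UTF-16 code units in a `short[]`: a `char` and a `short` hold the same 16 bits, so the casts round-trip exactly. `CharShortBridge` and its methods are hypothetical names used only for illustration; they are not part of the benchmark or the JDK, and a real benchmark would probably keep the data in `short[]` throughout rather than copy it.

```java
// Hypothetical helper for illustration only; not part of the benchmark or the JDK.
final class CharShortBridge {

    // Reinterpret each UTF-16 code unit as a signed 16-bit value (same bit pattern).
    static short[] toShorts(char[] chars) {
        short[] out = new short[chars.length];
        for (int i = 0; i < chars.length; i++) {
            out[i] = (short) chars[i];
        }
        return out;
    }

    // Inverse conversion: the narrowing cast keeps the low 16 bits, so no data is lost.
    static char[] toChars(short[] shorts) {
        char[] out = new char[shorts.length];
        for (int i = 0; i < shorts.length; i++) {
            out[i] = (char) shorts[i];
        }
        return out;
    }

    public static void main(String[] args) {
        char[] decoded = {'a', '\u00e9', '\u20ac', '\uffff'};
        short[] asShorts = toShorts(decoded);
        char[] roundTrip = toChars(asShorts);
        // Prints "true": every code unit survives the char -> short -> char round trip.
        System.out.println(java.util.Arrays.equals(decoded, roundTrip));
    }
}
```

The copy shown here is only to demonstrate equivalence; the point of the workaround is that vector stores would target a `short[]` directly.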

In general, to find the causes of such issues I recommend extracting vector sub-expressions and placing them in separate benchmarks. That makes the generated code easier to analyze.

The benchmark is also storing vectors/shuffles on the heap. Instead, I recommend storing such data in compatible arrays, then loading it into vector instances held in local variables.

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/26