UTF-8 Validation with the Vector API (Performance)

Vladimir Ivanov vladimir.x.ivanov at oracle.com
Fri Mar 12 09:44:30 UTC 2021


>> but I also wonder if the vectorIntrinsics branch (which I built) is 
>> missing Vladimir's branch-prediction patch. It's a little confusing to 
>> me how two different git repos are being used for the same project.
>>
>> In jdk/jdk: 
>> https://github.com/openjdk/jdk/commit/28fcb5aebf8885c63ce97a064a1d8e4ef89b0a33 
> 
> That is from Vladimir’s branch:
> 
> https://github.com/iwanowww/jdk/commit/28fcb5aebf8885c63ce97a064a1d8e4ef89b0a33 
> https://github.com/openjdk/jdk/compare/master...iwanowww:vector.phi 
> 
> AFAICT it has not been committed to jdk/master or to 
> panama-vector/vectorIntrinsics. Vladimir, what’s the status of this? 
> Still too experimental?

It just slipped under my radar during the Christmas/New Year break. I 
planned to push it into panama-vector after getting feedback, but forgot.

Still too early IMO to upstream it into mainline though.

Best regards,
Vladimir Ivanov

>> In panama-vector (not found): 
>> https://github.com/openjdk/panama-vector/commit/28fcb5aebf8885c63ce97a064a1d8e4ef89b0a33 
>>
> 
> The panama-vector/vectorIntrinsics branch has additional features, API 
> or otherwise, that may be experimental, or need time to bake, before we 
> bring them into the main repository (some of which will be brought in 
> via a JEP).
> 
> We will often fix issues directly in jdk/master; those fixes make their 
> way into panama-vector/vectorIntrinsics when we merge (the most recent 
> merge occurred on March 7th).
> 
> Generally, you can consider jdk/master to be a subset 
> of panama-vector/vectorIntrinsics. As such there may be performance 
> differences between the two.
> 
> Hth,
> Paul.
> 
> 
>> Regards,
>>
>> August
>>
>>
>> On Fri, Mar 5, 2021 at 12:59 PM Paul Sandoz <paul.sandoz at oracle.com> wrote:
>>
>>     Looking at the code I spot three general issues with the Vector API:
>>
>>     1. Vector.slice(int origin, Vector<E> v1) is not currently optimized.
>>     We need to fix this.
>>
>>     2. Vectors held in final fields of LookupTables might not be
>>     treated as constants.
>>     Even though the LookupTables instance is held in a static field of
>>     the benchmark, HotSpot does not by default constant-fold loads
>>     from instance final fields.
>>     It might still hoist the loads out of the loop, though (need to verify).
>>     (There is an open bug tracking support for final fields being
>>     really final. It’s complicated due to reflection and
>>     deserialization.)
>>
>>     3. Masked loads are not yet optimal (but since this is performed
>>     at the end the impact is likely minimal).
>>
>>
>>     Digging deeper and focusing on just ASCII (using 20k.txt) I think
>>     there is an issue with the way C2 handles constant vectors like
>>     zero (could be a regression), which causes the values to be
>>     spilled on the stack which seems to cause other spills.
>>
>>     So, perversely, let's create the zero vector from an array. Here’s
>>     your method just focusing on ASCII:
>>
>>         public static boolean validate(byte[] buf, VectorSpecies<Byte> species, LookupTables lut) {
>>             // ByteVector zero = ByteVector.zero(species);
>>             ByteVector zero = ByteVector.fromArray(species, new byte[species.length()], 0);
>>             ByteVector error = zero;
>>             Vector<Byte> prevIncomplete = zero;
>>
>>             int i = 0;
>>             for (; i < species.loopBound(buf.length); i += species.length()) {
>>                 ByteVector input = ByteVector.fromArray(species, buf, i);
>>
>>                 boolean isUTF8 = input.compare(LT, zero).anyTrue();
>>                 if (!isUTF8) {
>>                     error = error.or(prevIncomplete);
>>                 }
>>             }
>>
>>             VectorMask<Byte> m = species.indexInRange(i, buf.length);
>>             ByteVector input = ByteVector.fromArray(species, buf, i, m);
>>             boolean isUTF8 = input.compare(LT, zero).anyTrue();
>>
>>             error = error.or(prevIncomplete);
>>             return error.compare(EQ, zero).allTrue();
>>         }
>>
>>
>>     And run using a recent build of 17. The hot loop is:
>>
>>
>>      3.35%  ↗  0x000000011a7dcf40:   cmp    %r11d,%r9d
>>             │  0x000000011a7dcf43:   jae    0x000000011a7dd578
>>      2.73%  │  0x000000011a7dcf49:   mov    0x20(%rsp),%rcx
>>      7.02%  │  0x000000011a7dcf4e:   vmovdqu 0x10(%rcx,%r9,1),%ymm2
>>      8.31%  │  0x000000011a7dcf55:   vpcmpgtb %ymm2,%ymm3,%ymm2
>>     10.43%  │  0x000000011a7dcf59:   vptest %ymm0,%ymm2
>>     13.04%  │  0x000000011a7dcf5e:   setne  %cl
>>     10.57%  │  0x000000011a7dcf61:   movzbl %cl,%ecx
>>      6.82%  │  0x000000011a7dcf64:   test   %ecx,%ecx
>>             │  0x000000011a7dcf66:   jne    0x000000011a7dd5a0
>>      6.59%  │  0x000000011a7dcf6c:   mov    0x118(%r15),%rcx
>>      6.84%  │  0x000000011a7dcf73:   vpor   %ymm3,%ymm1,%ymm1
>>      3.51%  │  0x000000011a7dcf77:   add    0x18(%rsp),%r9d
>>      2.71%  │  0x000000011a7dcf7c:   test   %eax,(%rcx)
>>      7.60%  │  0x000000011a7dcf7e:   xchg   %ax,%ax
>>      7.58%  │  0x000000011a7dcf80:   cmp    %r10d,%r9d
>>             ╰  0x000000011a7dcf83:   jl     0x000000011a7dcf40
>>
>>
>>     That's OK but not great: HotSpot does not unroll the loop, there
>>     are redundant bounds checks, the species length is spilled to the
>>     stack, and there appears to be a safepoint check.
>>
>>     Something ain’t quite right. I think the loop shape is being
>>     “polluted” by the processing of the array tail after the loop
>>     (confirmed by removing the array tail processing).
>>
>>     However, things get really bad if we swap in the zero vector
>>     created from the species: performance nosedives by ~7x and there
>>     are many spills in the hot loop.
>>
>>     We need a C2 expert to look more closely at why:
>>
>>     1. The loop shape is being affected by processing outside of the loop.
>>     2. Use of the idiomatic zero vector causes so many spills.
>>
>>     Paul.
>>
>>
>>>     On Mar 5, 2021, at 9:24 AM, Paul Sandoz <paul.sandoz at oracle.com> wrote:
>>>
>>>     Hi August,
>>>
>>>     Thank you for bringing this to the list (I saw your messages on
>>>     Twitter and was gonna suggest you do just that, but you got there
>>>     before me).
>>>
>>>     This is exactly the kind of thing we are looking for to exercise
>>>     the API and find performance issues. I shall take a closer look.
>>>
>>>     We have been methodically working through some performance issues
>>>     based on other use cases; I think we will get there.
>>>
>>>     Paul.
>>>
>>>>     On Mar 4, 2021, at 3:49 PM, August Nagro <augustnagro at gmail.com> wrote:
>>>>
>>>>     Hello,
>>>>
>>>>     A while back I implemented simd-json's UTF-8 validation using the
>>>>     Vector API. It could be considered the first step towards
>>>>     implementing simd-json completely in Java.
>>>>
>>>>     The simd-json developers seem interested, which is cool. The only
>>>>     problem is that it's very slow, and I don't have the knowledge to
>>>>     make it faster. Hopefully I can get away with saying it's the
>>>>     Vector API's fault and not mine. :)
>>>>
>>>>     If anyone has suggestions or is interested in grokking the code
>>>>     (there's not much of it), this is the github repo:
>>>>     https://github.com/AugustNagro/utf8.java
>>>>
>>>>     Cheers,
>>>>
>>>>     August
>>>
>>
> 
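
The loop structure of the ASCII-only method quoted above (a main loop that runs up to species.loopBound(buf.length), followed by separate handling of the array tail) can be modelled in plain Java without the incubating Vector API. What follows is only a scalar sketch of that structure, not code from the thread. The class name AsciiBlocks, the constant LANES (standing in for species.length(), assumed here to be 32 bytes to match the ymm registers in the disassembly) and the scalar tail loop (used in place of the masked load) are all assumptions made for illustration.

```java
final class AsciiBlocks {
    // Hypothetical block width standing in for species.length();
    // 32 bytes corresponds to one 256-bit (ymm) register.
    static final int LANES = 32;

    // Largest multiple of LANES that is <= length, which is what
    // VectorSpecies.loopBound(length) is documented to return.
    // The bit trick is valid because LANES is a power of two.
    static int loopBound(int length) {
        return length & ~(LANES - 1);
    }

    // Returns true if every byte in buf is ASCII (high bit clear).
    static boolean isAscii(byte[] buf) {
        int bound = loopBound(buf.length);
        int i = 0;

        // Main loop: whole blocks only, so no per-element range logic.
        for (; i < bound; i += LANES) {
            int acc = 0;
            for (int lane = 0; lane < LANES; lane++) {
                acc |= buf[i + lane]; // bytes sign-extend to int
            }
            if (acc < 0) {
                return false; // at least one byte had its high bit set
            }
        }

        // Tail: the remaining (buf.length - bound) bytes, handled with a
        // scalar loop here instead of a masked vector load.
        int acc = 0;
        for (; i < buf.length; i++) {
            acc |= buf[i];
        }
        return acc >= 0;
    }
}
```

For example, a 70-byte buffer gives loopBound(70) == 64, so the main loop covers two blocks and the tail covers the last six bytes. Whether a scalar tail would change the loop shape C2 produces for the real vector code is not something this sketch can show; that would need to be measured.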


More information about the panama-dev mailing list