UTF-8 Validation with the Vector API (Performance)
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Fri Mar 12 09:44:30 UTC 2021
>> but I also wonder if the vectorIntrinsics branch (which I built) is
>> missing Vladimir's branch-prediction patch. It's a little confusing to
>> me how two different git repos are being used for the same project.
>>
>> In jdk/jdk:
>> https://github.com/openjdk/jdk/commit/28fcb5aebf8885c63ce97a064a1d8e4ef89b0a33
>
> That is from Vladimir’s branch:
>
> https://github.com/iwanowww/jdk/commit/28fcb5aebf8885c63ce97a064a1d8e4ef89b0a33
> https://github.com/openjdk/jdk/compare/master...iwanowww:vector.phi
>
> AFAICT it has not been committed to jdk/master, nor to
> panama-vector/vectorIntrinsics. Vladimir, what’s the status of this?
> Still too experimental?
It just slipped under my radar during the Christmas/NY break. I planned to
push it into panama-vector after getting feedback, but forgot.
IMO it is still too early to upstream it into mainline though.
Best regards,
Vladimir Ivanov
>> In panama-vector (not found):
>> https://github.com/openjdk/panama-vector/commit/28fcb5aebf8885c63ce97a064a1d8e4ef89b0a33
>>
>
> The panama-vector/vectorIntrinsics branch has additional features, API
> or otherwise, that may be experimental, or need time to bake, before we
> bring them into the main repository (some of which will be brought in
> via a JEP).
>
> We will often fix issues directly in jdk/master, and those fixes make
> their way into panama-vector/vectorIntrinsics when we merge (the most
> recent merge occurred on March 7th).
>
> Generally, you can consider jdk/master to be a subset
> of panama-vector/vectorIntrinsics. As such there may be performance
> differences between the two.
>
> Hth,
> Paul.
>
>
>> Regards,
>>
>> August
>>
>>
>> On Fri, Mar 5, 2021 at 12:59 PM Paul Sandoz <paul.sandoz at oracle.com> wrote:
>>
>> Looking at the code I spot three general issues with the Vector API:
>>
>> 1. Vector.slice(int origin, Vector<E> v1) is not currently optimized.
>> We need to fix this.
>>
>> 2. Vectors held in final fields of LookupTables might not be
>> treated as constant.
>> Even though the LookupTables instance is held in a static field of
>> the benchmark, HotSpot does not by default propagate to final fields.
>> It might hoist the values outside the loop though (need to verify).
>> (There is an ongoing bug to track support for final fields being
>> really final. It’s complicated due to reflection, and
>> deserialization.)
>>
>> 3. Masked loads are not yet optimal (but since this is performed
>> at the end the impact is likely minimal).
>>
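Point 2 above can be illustrated without the Vector API. The sketch below is a minimal illustration of why HotSpot cannot blindly treat an instance final field as a constant: reflection is still able to write to it after construction. The names FinalFieldDemo, Tables and table are hypothetical stand-ins for the benchmark's LookupTables, not code from the repository, and newer JDK releases may warn about or restrict this kind of write.

```java
import java.lang.reflect.Field;

public class FinalFieldDemo {
    // Hypothetical stand-in for LookupTables: a final instance field
    // assigned in the constructor, so javac cannot inline it as a constant.
    static final class Tables {
        final byte[] table;
        Tables(byte[] table) { this.table = table; }
    }

    // Held in a static final field, the way the benchmark holds its tables.
    static final Tables TABLES = new Tables(new byte[] {1, 2, 3});

    // Overwrites the final instance field reflectively and reads it back.
    static int firstEntryAfterReflectiveWrite() throws ReflectiveOperationException {
        Field f = Tables.class.getDeclaredField("table");
        f.setAccessible(true);
        f.set(TABLES, new byte[] {42});
        return TABLES.table[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstEntryAfterReflectiveWrite());
    }
}
```

Because such a write is legal for ordinary classes, the JIT has to reload the field unless it has some other reason to trust it, which is the complication the bug mentioned above is tracking.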
>>
>> Digging deeper and focusing on just ASCII (using 20k.txt) I think
>> there is an issue with the way C2 handles constant vectors like
>> zero (could be a regression), which causes the values to be
>> spilled on the stack which seems to cause other spills.
>>
>> So, perversely, let's create the zero vector from an array. Here’s
>> your method just focusing on ASCII:
>>
>> public static boolean validate(byte[] buf, VectorSpecies<Byte> species, LookupTables lut) {
>>     // ByteVector zero = ByteVector.zero(species);
>>     ByteVector zero = ByteVector.fromArray(species, new byte[species.length()], 0);
>>     ByteVector error = zero;
>>     Vector<Byte> prevIncomplete = zero;
>>
>>     int i = 0;
>>     for (; i < species.loopBound(buf.length); i += species.length()) {
>>         ByteVector input = ByteVector.fromArray(species, buf, i);
>>
>>         boolean isUTF8 = input.compare(LT, zero).anyTrue();
>>         if (!isUTF8) {
>>             error = error.or(prevIncomplete);
>>         }
>>     }
>>
>>     VectorMask<Byte> m = species.indexInRange(i, buf.length);
>>     ByteVector input = ByteVector.fromArray(species, buf, i, m);
>>     boolean isUTF8 = input.compare(LT, zero).anyTrue();
>>
>>     error = error.or(prevIncomplete);
>>     return error.compare(EQ, zero).allTrue();
>> }
>>
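For checking a vectorized variant against a known-good answer, a scalar baseline can help. The sketch below is not the method quoted above (which only exercises the loop shape and never rejects input); it is a plain-Java reference, under stated assumptions, for the same main-loop-plus-tail structure. The class AsciiCheck and its helpers are hypothetical names, and the lane count of 32 is an assumption matching the 256-bit ymm registers in the listing that follows. Here loopBound mirrors what VectorSpecies.loopBound computes (the largest multiple of the lane count not exceeding the length), and anyNegative plays the role of input.compare(LT, zero).anyTrue().

```java
import java.nio.charset.StandardCharsets;

public class AsciiCheck {
    // Largest multiple of `lanes` that is <= length.
    static int loopBound(int length, int lanes) {
        return length - (length % lanes);
    }

    // True if any byte in [from, to) has its high bit set (is negative).
    static boolean anyNegative(byte[] buf, int from, int to) {
        for (int j = from; j < to; j++) {
            if (buf[j] < 0) {
                return true;
            }
        }
        return false;
    }

    static boolean isAscii(byte[] buf, int lanes) {
        int bound = loopBound(buf.length, lanes);
        int i = 0;
        // Main loop: full blocks of `lanes` bytes.
        for (; i < bound; i += lanes) {
            if (anyNegative(buf, i, i + lanes)) {
                return false;
            }
        }
        // Tail: the remaining bytes, which the vector code covers with a masked load.
        return !anyNegative(buf, i, buf.length);
    }

    public static void main(String[] args) {
        byte[] ascii = "hello, vector api".getBytes(StandardCharsets.US_ASCII);
        byte[] utf8 = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println(isAscii(ascii, 32));
        System.out.println(isAscii(utf8, 32));
    }
}
```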
>>
>> And run using a recent build of 17. The hot loop is:
>>
>>
>> 3.35% ↗ 0x000000011a7dcf40: cmp %r11d,%r9d
>> │ 0x000000011a7dcf43: jae 0x000000011a7dd578
>> 2.73% │ 0x000000011a7dcf49: mov 0x20(%rsp),%rcx
>> 7.02% │ 0x000000011a7dcf4e: vmovdqu 0x10(%rcx,%r9,1),%ymm2
>> 8.31% │ 0x000000011a7dcf55: vpcmpgtb %ymm2,%ymm3,%ymm2
>> 10.43% │ 0x000000011a7dcf59: vptest %ymm0,%ymm2
>> 13.04% │ 0x000000011a7dcf5e: setne %cl
>> 10.57% │ 0x000000011a7dcf61: movzbl %cl,%ecx
>> 6.82% │ 0x000000011a7dcf64: test %ecx,%ecx
>> │ 0x000000011a7dcf66: jne 0x000000011a7dd5a0
>> 6.59% │ 0x000000011a7dcf6c: mov 0x118(%r15),%rcx
>> 6.84% │ 0x000000011a7dcf73: vpor %ymm3,%ymm1,%ymm1
>> 3.51% │ 0x000000011a7dcf77: add 0x18(%rsp),%r9d
>> 2.71% │ 0x000000011a7dcf7c: test %eax,(%rcx)
>> 7.60% │ 0x000000011a7dcf7e: xchg %ax,%ax
>> 7.58% │ 0x000000011a7dcf80: cmp %r10d,%r9d
>> ╰ 0x000000011a7dcf83: jl 0x000000011a7dcf40
>>
>>
>> That's OK, but not great: HotSpot does not unroll, there are redundant
>> bounds checks, the species length is spilled to the stack, and
>> there appears to be a safepoint check.
>>
>> Something ain’t quite right. I think the loop shape is being
>> “polluted” by the processing of the array tail after the loop
>> (confirmed by removing the array tail processing).
>>
>> However, things get really bad if we swap in the zero created from the
>> species: the performance nosedives by ~7x and there are many
>> spills in the hot loop.
>>
>> We need a C2 expert to look more closely at why:
>>
>> 1. The loop shape is being affected by processing outside of the loop.
>> 2. Use of the idiomatic zero vector causes so many spills.
>>
>> Paul.
>>
>>
>>> On Mar 5, 2021, at 9:24 AM, Paul Sandoz <paul.sandoz at oracle.com> wrote:
>>>
>>> Hi August,
>>>
>>> Thank you for bringing this to the list (I saw your messages on
>>> twitter and was gonna suggest you do just that but you got there
>>> before me).
>>>
>>> This is exactly the kind of thing we are looking for to exercise
>>> the API and find performance issues. I shall take a closer look.
>>>
>>> We have been methodically working through some performance issues
>>> based on other use cases. I think we will get there.
>>>
>>> Paul.
>>>
>>>> On Mar 4, 2021, at 3:49 PM, August Nagro <augustnagro at gmail.com> wrote:
>>>>
>>>> Hello,
>>>>
>>>> A while back I implemented simd-json's UTF-8 validation using the
>>>> Vector API. It could be considered the first step towards
>>>> implementing simd-json completely in Java.
>>>>
>>>> The simd-json developers seem interested, which is cool. The only
>>>> problem is that it's very slow, and I don't have the knowledge
>>>> to make it faster. Hopefully I can get away with saying it's the
>>>> Vector API's fault and not mine. :)
>>>>
>>>> If anyone has suggestions or is interested in grokking the code
>>>> (there's not much of it), this is the GitHub repo:
>>>> https://github.com/AugustNagro/utf8.java
>>>>
>>>> Cheers,
>>>>
>>>> August
>>>
>>
>