IntVector.fromValues is not optimized away ?

forax at univ-mlv.fr forax at univ-mlv.fr
Mon May 11 19:59:51 UTC 2020


----- Original Message -----
> From: "Paul Sandoz" <paul.sandoz at oracle.com>
> To: "Remi Forax" <forax at univ-mlv.fr>
> Cc: "panama-dev at openjdk.java.net" <panama-dev at openjdk.java.net>
> Sent: Monday, 11 May 2020 21:42:16
> Subject: Re: IntVector.fromValues is not optimized away ?

> Swings and roundabouts.
> 
> Unsurprisingly, a significant proportion of instructions involve shuffling field
> values into temporary buffers from which vector loads are performed.
> 
> The current code and my patch result in a similar set of instructions but my
> patch is not as efficient because of a less optimal use of a vector
> instruction:
> 
> vpxor  %xmm0,%xmm1,%xmm0
> 
> vs.
> 
> vpxor  0x10(%r10),%xmm0,%xmm0
> 
> 
> HS could be smarter about gathering field values and eliding the intermediate
> var arg arrays for common layouts e.g. leverage the gather functionality.  But,
> in general, the vector load instructions prefer values linearly laid out in
> memory.
> 
> My recommendation would be to use fromValues for constant or pre-computed vector
> values.

And what should I use for fields?

Couldn't pattern-matching rules for that be added to HS?

vmovq + vpxor => vpxor
vpinsrd + vpxor => vpxor,
etc

> 
> Paul.
> 

Rémi

> 
>> On May 11, 2020, at 11:02 AM, Paul Sandoz <paul.sandoz at oracle.com> wrote:
>> 
>> Hi Remi,
>> 
>> For some reason this method does not defer to the fromArray equivalent.
>> 
>> Can you try with the following patch?
>> 
>>  http://cr.openjdk.java.net/~psandoz/panama/vector-from-values-using-from-array/webrev/
>> 
>> I shall also investigate further.
>> 
>> Paul.
>> 
>>> On May 9, 2020, at 11:52 AM, Remi Forax <forax at univ-mlv.fr> wrote:
>>> 
>>> Hi all,
>>> this may be obvious, but do we agree that IntVector.fromValues is not optimized
>>> and thus really creates an array, destroying any hope of performance?
>>> 
>>> I'm trying to see the difference between
>>> 
>>>   public int hashCode() {
>>>     return i1 ^ i2 ^ i3 ^ i4;
>>>   }
>>> 
>>> and
>>> 
>>>   public int hashCode() {
>>>     var v1 = IntVector.fromValues(IntVector.SPECIES_64, i1, i3);
>>>     var v2 = IntVector.fromValues(IntVector.SPECIES_64, i2, i4);
>>>     var result = v1.lanewise(VectorOperators.XOR, v2);
>>>     return result.lane(0) ^ result.lane(1);
>>>   }
>>> 
>>> but looking at the generated assembly (below), the allocations of the two
>>> arrays are still there. Too bad, because the last 6 instructions are more or
>>> less what I was expecting.
>>> 
>>> 
>>> 0x00007fbb383324dc:   mov    0x14(%rsi),%r11d             ;*getfield i3 {reexecute=0 rethrow=0 return_oop=0}
>>>                                                           ; - fr.umlv.vector.VectorizedHashCode$Data::hashCode2 at 16 (line 14)
>>> 0x00007fbb383324e0:   mov    0xc(%rsi),%ebp
>>> 0x00007fbb383324e3:   mov    0x120(%r15),%r8
>>> 0x00007fbb383324ea:   mov    %r8,%r10
>>> 0x00007fbb383324ed:   add    $0x18,%r10
>>> 0x00007fbb383324f1:   cmp    0x130(%r15),%r10
>>> 0x00007fbb383324f8:   jae    0x00007fbb383325db
>>> 0x00007fbb383324fe:   mov    %r10,0x120(%r15)
>>> 0x00007fbb38332505:   prefetchw 0xc0(%r10)
>>> 0x00007fbb3833250d:   movq   $0x1,(%r8)
>>> 0x00007fbb38332514:   prefetchw 0x100(%r10)
>>> 0x00007fbb3833251c:   movl   $0x70cb1,0x8(%r8)            ;   {metadata({type array int})}
>>> 0x00007fbb38332524:   prefetchw 0x140(%r10)
>>> 0x00007fbb3833252c:   movl   $0x2,0xc(%r8)
>>> 0x00007fbb38332534:   prefetchw 0x180(%r10)
>>> 0x00007fbb3833253c:   mov    %ebp,0x10(%r8)
>>> 0x00007fbb38332540:   mov    %r11d,0x14(%r8)              ;*newarray {reexecute=0 rethrow=0 return_oop=0}
>>>                                                           ; - java.util.Arrays::copyOf at 1 (line 3584)
>>>                                                           ; - jdk.incubator.vector.IntVector::fromValues at 19 (line 553)
>>>                                                           ; - fr.umlv.vector.VectorizedHashCode$Data::hashCode2 at 20 (line 14)
>>> 0x00007fbb38332544:   mov    0x18(%rsi),%r9d
>>> 0x00007fbb38332548:   mov    0x120(%r15),%rax             ;*invokestatic extract {reexecute=0 rethrow=0 return_oop=0}
>>>                                                           ; - jdk.incubator.vector.Int64Vector::laneHelper at 16 (line 482)
>>>                                                           ; - jdk.incubator.vector.Int64Vector::lane at 36 (line 476)
>>>                                                           ; - fr.umlv.vector.VectorizedHashCode$Data::hashCode2 at 64 (line 17)
>>> 0x00007fbb3833254f:   mov    0x10(%rsi),%ebp              ;*getfield i2 {reexecute=0 rethrow=0 return_oop=0}
>>>                                                           ; - fr.umlv.vector.VectorizedHashCode$Data::hashCode2 at 33 (line 15)
>>> 0x00007fbb38332552:   mov    %rax,%r10
>>> 0x00007fbb38332555:   add    $0x18,%r10
>>> 0x00007fbb38332559:   nopl   0x0(%rax)
>>> 0x00007fbb38332560:   cmp    0x130(%r15),%r10
>>> 0x00007fbb38332567:   jae    0x00007fbb3833260d
>>> 0x00007fbb3833256d:   mov    %r10,0x120(%r15)
>>> 0x00007fbb38332574:   prefetchw 0xc0(%r10)
>>> 0x00007fbb3833257c:   movq   $0x1,(%rax)
>>> 0x00007fbb38332583:   prefetchw 0x100(%r10)
>>> 0x00007fbb3833258b:   movl   $0x70cb1,0x8(%rax)           ;   {metadata({type array int})}
>>> 0x00007fbb38332592:   prefetchw 0x140(%r10)
>>> 0x00007fbb3833259a:   movl   $0x2,0xc(%rax)
>>> 0x00007fbb383325a1:   prefetchw 0x180(%r10)
>>> 0x00007fbb383325a9:   mov    %ebp,0x10(%rax)
>>> 0x00007fbb383325ac:   mov    %r9d,0x14(%rax)              ;*newarray {reexecute=0 rethrow=0 return_oop=0}
>>>                                                           ; - java.util.Arrays::copyOf at 1 (line 3584)
>>>                                                           ; - jdk.incubator.vector.IntVector::fromValues at 19 (line 553)
>>>                                                           ; - fr.umlv.vector.VectorizedHashCode$Data::hashCode2 at 44 (line 15)
>>> 0x00007fbb383325b0:   vmovq  0x10(%rax),%xmm0             ;*invokestatic extract {reexecute=0 rethrow=0 return_oop=0}
>>>                                                           ; - jdk.incubator.vector.Int64Vector::laneHelper at 16 (line 482)
>>>                                                           ; - jdk.incubator.vector.Int64Vector::lane at 36 (line 476)
>>>                                                           ; - fr.umlv.vector.VectorizedHashCode$Data::hashCode2 at 64 (line 17)
>>> 0x00007fbb383325b5:   vpxor  0x10(%r8),%xmm0,%xmm0        ;*invokespecial <init> {reexecute=0 rethrow=0 return_oop=0}
>>>                                                           ; - jdk.internal.vm.vector.VectorSupport$Vector::<init>@2 (line 104)
>>>                                                           ; - jdk.incubator.vector.Vector::<init>@2 (line 1122)
>>>                                                           ; - jdk.incubator.vector.AbstractVector::<init>@2 (line 67)
>>>                                                           ; - jdk.incubator.vector.IntVector::<init>@2 (line 55)
>>>                                                           ; - jdk.incubator.vector.Int64Vector::<init>@2 (line 58)
>>>                                                           ; - jdk.incubator.vector.Int64Vector::vectorFactory at 5 (line 169)
>>>                                                           ; - jdk.incubator.vector.Int64Vector::vectorFactory at 2 (line 41)
>>>                                                           ; - jdk.incubator.vector.IntVector$IntSpecies::vectorFactory at 5 (line 3718)
>>>                                                           ; - jdk.incubator.vector.IntVector::fromValues at 22 (line 553)
>>>                                                           ; - fr.umlv.vector.VectorizedHashCode$Data::hashCode2 at 44 (line 15)
>>> 0x00007fbb383325bb:   vpextrd $0x1,%xmm0,%r11d
>>> 0x00007fbb383325c1:   vmovd  %xmm0,%eax
>>> 0x00007fbb383325c5:   xor    %r11d,%eax
>>> 0x00007fbb383325c8:   vzeroupper
>>> 
>>> regards,
>>> Rémi
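
For readers following the thread: the two hashCode() variants in the original message should return the same value, because XOR is associative and commutative, so (i1 ^ i2) ^ (i3 ^ i4) equals i1 ^ i2 ^ i3 ^ i4. Below is a minimal sketch of that equivalence. It only models the two lanes with plain int arrays and does not use jdk.incubator.vector, so it says nothing about the generated code; the class and method names (LaneXorSketch, scalarHash, twoLaneHash) are made up for illustration and are not part of the benchmark discussed here.

```java
// Plain-Java model of the two-lane XOR; names are hypothetical.
final class LaneXorSketch {
    // Scalar version, as in the first hashCode() of the original message.
    static int scalarHash(int i1, int i2, int i3, int i4) {
        return i1 ^ i2 ^ i3 ^ i4;
    }

    // Models the vector version: v1 = {i1, i3}, v2 = {i2, i4},
    // a lane-wise XOR, then an XOR of the two result lanes.
    static int twoLaneHash(int i1, int i2, int i3, int i4) {
        int[] v1 = {i1, i3};
        int[] v2 = {i2, i4};
        int[] result = new int[2];
        for (int lane = 0; lane < result.length; lane++) {
            result[lane] = v1[lane] ^ v2[lane];
        }
        return result[0] ^ result[1];
    }

    public static void main(String[] args) {
        int a = scalarHash(1, 2, 4, 8);
        int b = twoLaneHash(1, 2, 4, 8);
        System.out.println(a + " " + b); // prints "15 15"
    }
}
```

The int[] temporaries in twoLaneHash play the same role as the two arrays whose allocation shows up in the assembly above; the question in the thread is whether the JIT can elide them, not whether the result differs.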

