Long256Vector::mul Fix for VectorAPI

Rukmannagari, Shravya shravya.rukmannagari at intel.com
Fri Aug 17 21:31:20 UTC 2018


Hi Vladimir,
I updated the patch as suggested and tested it. 
http://cr.openjdk.java.net/~srukmannagar/VectorAPIFixes/webrev_Long256Mul/

Thanks,
Shravya

-----Original Message-----
From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com] 
Sent: Friday, August 17, 2018 7:27 AM
To: Rukmannagari, Shravya <shravya.rukmannagari at intel.com>; panama-dev at openjdk.java.net
Subject: Re: Long256Vector::mul Fix for VectorAPI

Looks good.

Can you achieve the same result by piggybacking on the shuffling performed by vphaddd?

-    __ vphaddd($tmp$$XMMRegister, $tmp$$XMMRegister, $tmp$$XMMRegister, vector_len);
+    __ vextracti128_high($tmp1$$XMMRegister, $tmp$$XMMRegister);
+    __ vphaddd($tmp$$XMMRegister, $tmp$$XMMRegister, $tmp1$$XMMRegister, vector_len);
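
For readers following the thread: without a native 64-bit vector multiply, a long multiply is typically assembled from 32-bit pieces, and a horizontal add such as vphaddd is one way to sum the two cross products. The scalar sketch below only illustrates that arithmetic identity. It is not the code in the webrev, and the class and method names (MulLongFrom32, mulVia32) are hypothetical.

```java
public class MulLongFrom32 {
    /** Low 64 bits of a * b, assembled from the 32-bit halves of each operand. */
    static long mulVia32(long a, long b) {
        long aLo = a & 0xFFFFFFFFL, aHi = a >>> 32;
        long bLo = b & 0xFFFFFFFFL, bHi = b >>> 32;
        long low = aLo * bLo;               // full 64-bit product of the low halves
        long cross = aLo * bHi + aHi * bLo; // only its low 32 bits survive the shift below
        return low + (cross << 32);         // aHi * bHi would land entirely above bit 63
    }

    public static void main(String[] args) {
        long[] input = {12345, 123456, 1234567, 12345678};
        long c = 0xbf58476d1ce4e5b9L;
        for (long x : input) {
            // Compare the decomposed product with Java's wrapping long multiply.
            System.out.println(x + ": " + (mulVia32(x, c) == x * c));
        }
    }
}
```

Because the cross terms must be summed per 64-bit element, any shuffle that pairs them across the wrong elements would leave some lanes correct and others not, which is consistent with (though not proof of) the symptom reported below.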

Best regards,
Vladimir Ivanov

On 16/08/2018 03:56, Rukmannagari, Shravya wrote:
> Hi All,
> Please review the patch which fixes the issue for Long256 mul.
> http://cr.openjdk.java.net/~srukmannagar/VectorAPIFixes/webrev_Long256Mul/
> 
> Thanks,
> Shravya.
> 
> -----Original Message-----
> From: panama-dev [mailto:panama-dev-bounces at openjdk.java.net] On Behalf Of Rukmannagari, Shravya
> Sent: Tuesday, August 14, 2018 10:18 AM
> To: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>; Adam Pocock <adam.pocock at oracle.com>; panama-dev at openjdk.java.net
> Subject: RE: Long256Vector::mul is not compiling correctly
> 
> Hi Vladimir and Adam,
> Thanks a lot for reporting the error. I'm taking a look at it and will update you on the status.
> 
> Thanks,
> Shravya.
> 
> -----Original Message-----
> From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com]
> Sent: Tuesday, August 14, 2018 6:16 AM
> To: Adam Pocock <adam.pocock at oracle.com>; panama-dev at openjdk.java.net
> Cc: Rukmannagari, Shravya <shravya.rukmannagari at intel.com>
> Subject: Re: Long256Vector::mul is not compiling correctly
> 
> It looks very similar to the problem Adam Petcher reported some time ago [1].
> 
> Though the problem was observed with auto-vectorization, the culprit is the vmul4L_reg_avx rule, which is also used in the Long256Vector::mul() case.
> 
> Best regards,
> Vladimir Ivanov
> 
> [1] http://mail.openjdk.java.net/pipermail/panama-dev/2018-July/002348.html
> 
> On 13/08/2018 21:49, Adam Pocock wrote:
>> Below is a minimal example where the output of the multiply changes.
>> When run with the argument 10000 (which is sufficient to trigger
>> compilation on my desktop), start and end are different. With fewer
>> iterations (or with -Xint or -XX:TieredStopAtLevel=3), start and end are the same.
>>
>> With print compilation turned on I get:
>> ...
>> At itr 9475, drew 2885783399502042673,904462311702380608,-5120899773137696113,3753560392746735454
>>
>>       308  502       3       jdk.incubator.vector.Long256Vector::mul (12 bytes)   made not entrant
>>       308  577       4       jdk.incubator.vector.LongVector::toArray (18 bytes)
>> At itr 9476, drew 2885783399502042673,904462311702380608,-5120899773137696113,3753560392746735454
>>
>> At itr 9477, drew 2885783399502042673,904462311702380608,2886375887584873103,910387193500211038
>>
>> ...
>>
>> Thanks,
>>
>> Adam
>>
>> import jdk.incubator.vector.LongVector;
>> import jdk.incubator.vector.LongVector.LongSpecies;
>> import jdk.incubator.vector.Shapes;
>> import jdk.incubator.vector.Shapes.S256Bit;
>> import jdk.incubator.vector.Vector;
>>
>> import java.util.Arrays;
>>
>> public class LongVectorMulTest {
>>       private static final long FIRST_CONSTANT = 0xbf58476d1ce4e5b9L;
>>
>>       public static void main(String[] args) {
>>           int numRepeats = Integer.parseInt(args[0]);
>>
>>           long[] input = new long[]{12345,123456,1234567,12345678};
>>
>>           LongSpecies<S256Bit> lSpec = (LongSpecies<S256Bit>) Vector.species(long.class,Shapes.S_256_BIT);
>>           long[] start = lSpec.fromArray(input,0).mul(FIRST_CONSTANT).toArray();
>>           long[] end = null;
>>           for (int i = 0; i < numRepeats; i++) {
>>               LongVector<S256Bit> output = lSpec.fromArray(input,0).mul(FIRST_CONSTANT);
>>               end = output.toArray();
>>               print(end, i);
>>           }
>>           System.out.println("Start = " + Arrays.toString(start));
>>           System.out.println("End = " + Arrays.toString(end));
>>       }
>>
>>       private static void print(long[] values, int iter) {
>>           StringBuilder builder = new StringBuilder();
>>           for (int j = 0; j < values.length; j++) {
>>               builder.append(values[j]);
>>               builder.append(',');
>>           }
>>           builder.deleteCharAt(builder.length()-1);
>>           System.out.println("At itr " + iter + ", drew " + builder.toString());
>>       }
>> }
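
The reproducer above relies on comparing start and end by eye. A scalar cross-check makes the faulty lane explicit; the sketch below uses only plain long arithmetic and no Vector API calls, so it can be pasted alongside the test. The class and method names (ScalarMulCheck, expected, firstMismatch) are hypothetical helpers, not part of the original test.

```java
import java.util.Arrays;

public class ScalarMulCheck {
    /** Per-lane reference result: each input element times the constant, wrapping like long. */
    static long[] expected(long[] input, long constant) {
        long[] out = new long[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = input[i] * constant;
        }
        return out;
    }

    /** Index of the first lane where actual differs from the scalar reference, or -1 if none. */
    static int firstMismatch(long[] input, long constant, long[] actual) {
        long[] ref = expected(input, constant);
        for (int i = 0; i < ref.length; i++) {
            if (ref[i] != actual[i]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long[] input = {12345, 123456, 1234567, 12345678};
        long constant = 0xbf58476d1ce4e5b9L;
        System.out.println("Reference = " + Arrays.toString(expected(input, constant)));
    }
}
```

Calling firstMismatch(input, FIRST_CONSTANT, end) inside the loop of the reproducer and stopping at the first non-negative result would report the iteration and lane at which the compiled multiply starts to diverge.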
>>
>> On 13/08/18 11:37, Adam Pocock wrote:
>>> Long256Vector::mul produces different output once C2 has compiled it.
>>> I've been running down some non-determinism in my vector version of
>>> SplittableRandom, and the behaviour changes when C2 compiles it. I
>>> noticed this as the stream of random numbers changes when C2 compiles
>>> Long256Vector::mul(long). Only the latter half of the vector is
>>> affected; the first two lanes still produce the same output. I'm
>>> pretty sure it's a C2 issue as running with -Xint or
>>> -XX:TieredStopAtLevel=3 makes the output repeatable across runs.
>>>
>>> I've attached two runs with -XX:+PrintCompilation turned on. At line
>>> 5794 (itr 4920) in the first file (equivalent point is line 5774 in
>>> the second file) Long256Vector::mul is compiled, and after that point
>>> the second half of the vector output changes between runs. At line
>>> 7427 (itr 6561) in the second file Long256Vector::mul is compiled
>>> (equivalent point is line 7463 in the first file), and after that all
>>> the draws are the same. This indicates that the seeds for the RNG
>>> are always updated correctly as otherwise the runs would diverge
>>> after compilation.
>>>
>>> This suggests to me that the latter two lanes of C2 compiled
>>> Long256Vector::mul(long) aren't correct, as comparing an interpreted
>>> or C1 run against a run with C2 the latter two lanes diverge
>>> permanently after C2 compilation.
>>>
>>> The JVM reports:
>>>      openjdk version "12-internal" 2019-03-19
>>>      OpenJDK Runtime Environment (build
>>> 12-internal+0-adhoc.apocock.panama)
>>>      OpenJDK 64-Bit Server VM (build
>>> 12-internal+0-adhoc.apocock.panama, mixed mode)
>>>
>>> and hg log shows the top commit for vectorIntrinsics as:
>>>      changeset:   51784:4408db20792c
>>>      branch:      vectorIntrinsics
>>>      tag:         tip
>>>      parent:      51782:1700bec0e3b5
>>>      user:        rkandu
>>>      date:        Wed Aug 01 06:43:33 2018 -0700
>>>      summary:     SVML Linux .s files and Windows build fix
>>>
>>> The only lines which use mul on a Long256Vector are in the vectorised
>>> mix64:
>>>      private static final long FIRST_CONSTANT = 0xbf58476d1ce4e5b9L;
>>>      private static final long SECOND_CONSTANT = 0x94d049bb133111ebL;
>>>      private LongVector<S> mix64(LongVector<S> input) {
>>>          input = input.xor(input.shiftR(30)).mul(FIRST_CONSTANT);
>>>          input = input.xor(input.shiftR(27)).mul(SECOND_CONSTANT);
>>>          return input.xor(input.shiftR(31));
>>>      }
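
A scalar counterpart of the vectorised mix64 above can serve as a per-lane reference when checking compiled output. The sketch below assumes that shiftR is a logical (unsigned) right shift, which corresponds to >>> in scalar Java; the class name ScalarMix64 and the mixAll helper are hypothetical.

```java
public class ScalarMix64 {
    static final long FIRST_CONSTANT = 0xbf58476d1ce4e5b9L;
    static final long SECOND_CONSTANT = 0x94d049bb133111ebL;

    /** One lane of the vectorised mix64, assuming shiftR means logical shift (>>>). */
    static long mix64(long z) {
        z = (z ^ (z >>> 30)) * FIRST_CONSTANT;
        z = (z ^ (z >>> 27)) * SECOND_CONSTANT;
        return z ^ (z >>> 31);
    }

    /** Applies mix64 to every element, mirroring what the vector version does lane by lane. */
    static long[] mixAll(long[] lanes) {
        long[] out = new long[lanes.length];
        for (int i = 0; i < lanes.length; i++) {
            out[i] = mix64(lanes[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        long[] lanes = {12345, 123456, 1234567, 12345678};
        System.out.println(java.util.Arrays.toString(mixAll(lanes)));
    }
}
```

Comparing mixAll(input) with the toArray() result of the vector mix64 on the same input would show directly whether only the upper two lanes differ after C2 compilation.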
>>>
>>> My CPU is a Core i7-6700K, so I expect everything to be using AVX2,
>>> though I haven't verified it's emitting those instructions.
>>>
>>> PS: @Razvan, you were right, my BinarySearch was bugged, I was using
>>> lessThan rather than lessThanEq to update the loop condition mask.
>>> Once I fixed that it produces identical output to the sequential
>>> version I was using.
>>>
>>> Thanks,
>>>
>>> Adam
>>>
>>

