Long256Vector::mul Fix for VectorAPI

Rukmannagari, Shravya shravya.rukmannagari at intel.com
Mon Aug 20 17:54:32 UTC 2018


Hi Vladimir,
There are tests currently to handle this case, but when this intrinsic was added we were using constant input values to test the functionality. But the tests we have today cover all inputs including corner cases and we are testing it for all platforms. 

Thanks,
Shravya.

-----Original Message-----
From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com] 
Sent: Friday, August 17, 2018 10:44 PM
To: Rukmannagari, Shravya <shravya.rukmannagari at intel.com>; panama-dev at openjdk.java.net
Subject: Re: Long256Vector::mul Fix for VectorAPI

Related question: as I understand, the bug wasn't caught by the tests.
Does it worth to add a test case then?

As I understand, the bug corrupted upper half of the result and having large enough arguments should be enough to reliably trigger it.

Best regards,
Vladimir Ivanov

On 18/08/2018 00:31, Rukmannagari, Shravya wrote:
> Hi Vladimir,
> I updated the patch as suggested and tested it.
> http://cr.openjdk.java.net/~srukmannagar/VectorAPIFixes/webrev_Long256
> Mul/
> 
> Thanks,
> Shravya
> 
> -----Original Message-----
> From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com]
> Sent: Friday, August 17, 2018 7:27 AM
> To: Rukmannagari, Shravya <shravya.rukmannagari at intel.com>; 
> panama-dev at openjdk.java.net
> Subject: Re: Long256Vector::mul Fix for VectorAPI
> 
> Looks good.
> 
> Can you achieve the same result by piggybacking on the shuffling performed by vphaddd?
> 
> -    __ vphaddd($tmp$$XMMRegister, $tmp$$XMMRegister, $tmp$$XMMRegister,
> vector_len);
> +    __ vextracti128_high($tmp1$$XMMRegister, $tmp$$XMMRegister);
> +    __ vphaddd($tmp$$XMMRegister, $tmp$$XMMRegister,
> $tmp1$$XMMRegister, vector_len);
> 
> Best regards,
> Vladimir Ivanov
> 
> On 16/08/2018 03:56, Rukmannagari, Shravya wrote:
>> Hi All,
>> Please review the patch which fixes the issue for Long256 mul.
>> http://cr.openjdk.java.net/~srukmannagar/VectorAPIFixes/webrev_Long25
>> 6Mul/
>>
>> Thanks,
>> Shravya.
>>
>> -----Original Message-----
>> From: panama-dev [mailto:panama-dev-bounces at openjdk.java.net] On 
>> Behalf Of Rukmannagari, Shravya
>> Sent: Tuesday, August 14, 2018 10:18 AM
>> To: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>; Adam Pocock 
>> <adam.pocock at oracle.com>; panama-dev at openjdk.java.net
>> Subject: RE: Long256Vector::mul is not compiling correctly
>>
>> Hi Vladimir and Adam,
>> Thanks a lot for reporting the error. I'm taking a look at it and will update you on the status.
>>
>> Thanks,
>> Shravya.
>>
>> -----Original Message-----
>> From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com]
>> Sent: Tuesday, August 14, 2018 6:16 AM
>> To: Adam Pocock <adam.pocock at oracle.com>; panama-dev at openjdk.java.net
>> Cc: Rukmannagari, Shravya <shravya.rukmannagari at intel.com>
>> Subject: Re: Long256Vector::mul is not compiling correctly
>>
>> It looks very similar to the problem Adam Petcher reported some time 
>> ago [1]
>>
>> Though the problem was observed with auto-vectorization, the culprit is vmul4L_reg_avx rule which is also used in Long256Vector::mul() case.
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> [1] 
>> http://mail.openjdk.java.net/pipermail/panama-dev/2018-July/002348.ht
>> ml
>>
>> On 13/08/2018 21:49, Adam Pocock wrote:
>>> Below is a minimal example where the output of the multiply changes.
>>> Run with the argument 10000 (which is sufficient to trigger 
>>> compilation on my desktop), start and end are different. With fewer 
>>> iterations (or with -Xint or -XX:TieredStopAtLevel=3) start and end are the same.
>>>
>>> With print compilation turned on I get:
>>> ...
>>> At itr 9475, drew
>>> 2885783399502042673,904462311702380608,-5120899773137696113,37535603
>>> 92
>>> 746735454
>>>
>>>        308  502       3       
>>> jdk.incubator.vector.Long256Vector::mul
>>> (12
>>> bytes)   made not entrant
>>>        308  577       4       
>>> jdk.incubator.vector.LongVector::toArray
>>> (18
>>> bytes)
>>> At itr 9476, drew
>>> 2885783399502042673,904462311702380608,-5120899773137696113,37535603
>>> 92
>>> 746735454
>>>
>>> At itr 9477, drew
>>> 2885783399502042673,904462311702380608,2886375887584873103,910387193
>>> 50
>>> 0211038
>>>
>>> ...
>>>
>>> Thanks,
>>>
>>> Adam
>>>
>>> import jdk.incubator.vector.LongVector; import 
>>> jdk.incubator.vector.LongVector.LongSpecies;
>>> import jdk.incubator.vector.Shapes;
>>> import jdk.incubator.vector.Shapes.S256Bit;
>>> import jdk.incubator.vector.Vector;
>>>
>>> import java.util.Arrays;
>>>
>>> public class LongVectorMulTest {
>>>        private static final long FIRST_CONSTANT = 
>>> 0xbf58476d1ce4e5b9L;
>>>
>>>        public static void main(String[] args) {
>>>            int numRepeats = Integer.parseInt(args[0]);
>>>
>>>            long[] input = new long[]{12345,123456,1234567,12345678};
>>>
>>>            LongSpecies<S256Bit> lSpec = (LongSpecies<S256Bit>) 
>>> Vector.species(long.class,Shapes.S_256_BIT);
>>>            long[] start =
>>> lSpec.fromArray(input,0).mul(FIRST_CONSTANT).toArray();
>>>            long[] end = null;
>>>            for (int i = 0; i < numRepeats; i++) {
>>>                LongVector<S256Bit> output = 
>>> lSpec.fromArray(input,0).mul(FIRST_CONSTANT);
>>>                end = output.toArray();
>>>                print(end, i);
>>>            }
>>>            System.out.println("Start = " + Arrays.toString(start));
>>>            System.out.println("End = " + Arrays.toString(end));
>>>        }
>>>
>>>        private static void print(long[] values, int iter) {
>>>            StringBuilder builder = new StringBuilder();
>>>            for (int j = 0; j < values.length; j++) {
>>>                builder.append(values[j]);
>>>                builder.append(',');
>>>            }
>>>            builder.deleteCharAt(builder.length()-1);
>>>            System.out.println("At itr " + iter + ", drew " + 
>>> builder.toString());
>>>        }
>>> }
>>>
>>> On 13/08/18 11:37, Adam Pocock wrote:
>>>> Long256Vector::mul produces different output once C2 has compiled it.
>>>> I've been running down some non-determinism in my vector version of 
>>>> SplittableRandom, and the behaviour changes when C2 compiles it. I 
>>>> noticed this as the stream of random numbers changes when C2 
>>>> compiles Long256Vector::mul(long). Only the latter half of the 
>>>> vector is affected, the first two lanes still produce the same 
>>>> output. I'm pretty sure it's a C2 issue as running with -Xint or
>>>> -XX:TieredStopAtLevel=3 makes the output repeatable across runs.
>>>>
>>>> I've attached two runs with -XX:+PrintCompilation turned on. At 
>>>> line
>>>> 5794 (itr 4920) in the first file (equivalent point is line 5774 in 
>>>> the second file) Long256Vector::mul is compiled, and after that 
>>>> point the second half of the vector output changes between runs. At 
>>>> line
>>>> 7427 (itr 6561) in the second file Long256Vector::mul is compiled 
>>>> (equivalent point is line 7463 in the first file), and after that 
>>>> all the draws are the same. This indicates that the seeds for the 
>>>> RNG area always updated correctly as otherwise the runs would 
>>>> diverge after compilation.
>>>>
>>>> This suggests to me that the latter two lanes of C2 compiled
>>>> Long256Vector::mul(long) aren't correct, as comparing an 
>>>> interpreted or C1 run against a run with C2 the latter two lanes 
>>>> diverge permanently after C2 compilation.
>>>>
>>>> The JVM reports:
>>>>       openjdk version "12-internal" 2019-03-19
>>>>       OpenJDK Runtime Environment (build
>>>> 12-internal+0-adhoc.apocock.panama)
>>>>       OpenJDK 64-Bit Server VM (build
>>>> 12-internal+0-adhoc.apocock.panama, mixed mode)
>>>>
>>>> and hg log shows the top commit for vectorIntrinsics as:
>>>>       changeset:   51784:4408db20792c
>>>>       branch:      vectorIntrinsics
>>>>       tag:         tip
>>>>       parent:      51782:1700bec0e3b5
>>>>       user:        rkandu
>>>>       date:        Wed Aug 01 06:43:33 2018 -0700
>>>>       summary:     SVML Linux .s files and Windows build fix
>>>>
>>>> The only lines which use mul on a Long256Vector are in the 
>>>> vectorised
>>>> mix64:
>>>>       private static final long FIRST_CONSTANT = 0xbf58476d1ce4e5b9L;
>>>>       private static final long SECOND_CONSTANT = 0x94d049bb133111ebL;
>>>>       private LongVector<S> mix64(LongVector<S> input) {
>>>>           input = input.xor(input.shiftR(30)).mul(FIRST_CONSTANT);
>>>>           input = input.xor(input.shiftR(27)).mul(SECOND_CONSTANT);
>>>>           return input.xor(input.shiftR(31));
>>>>       }
>>>>
>>>> My CPU is a Core-i7 6700k, so I expect everything to be using AVX2, 
>>>> though I haven't verified it's emitting those instructions.
>>>>
>>>> PS: @Razvan, you were right, my BinarySearch was bugged, I was 
>>>> using lessThan rather than lessThanEq to update the loop condition mask.
>>>> Once I fixed that it produces identical output to the sequential 
>>>> version I was using.
>>>>
>>>> Thanks,
>>>>
>>>> Adam
>>>>
>>>


More information about the panama-dev mailing list