Long256Vector::mul is not compiling correctly

Vladimir Ivanov vladimir.x.ivanov at oracle.com
Tue Aug 14 13:15:58 UTC 2018


It looks very similar to the problem Adam Petcher reported some time ago
[1].

Though that problem was observed with auto-vectorization, the culprit is 
the vmul4L_reg_avx rule, which is also used in the Long256Vector::mul() case.
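
For context, a 64-bit lane multiply can be composed from 32-bit multiplies,
which is one way to emulate it when no packed 64-bit multiply instruction is
available. The scalar sketch below only illustrates that identity; that the
vmul4L_reg_avx rule works along these lines is an assumption, not something
confirmed in this thread, and the class and method names are hypothetical.

```java
// Hypothetical illustration: a 64-bit product built from 32-bit halves.
final class SplitMul64 {
    // Returns a * b modulo 2^64 using only products of 32-bit halves.
    static long mulViaHalves(long a, long b) {
        long aLo = a & 0xFFFFFFFFL;   // low 32 bits of a, zero-extended
        long aHi = a >>> 32;          // high 32 bits of a
        long bLo = b & 0xFFFFFFFFL;
        long bHi = b >>> 32;

        long low = aLo * bLo;               // full 64-bit product of the low halves
        long cross = aHi * bLo + aLo * bHi; // only its low 32 bits survive the shift
        return low + (cross << 32);         // the aHi * bHi term falls outside 64 bits
    }

    public static void main(String[] args) {
        long constant = 0xbf58476d1ce4e5b9L;  // FIRST_CONSTANT from the test case
        long[] input = {12345, 123456, 1234567, 12345678};
        for (long x : input) {
            // Each lane should agree with the plain (wrapping) long multiply.
            System.out.println(x + ": " + (mulViaHalves(x, constant) == x * constant));
        }
    }
}
```

If an emulation of this kind drops or misplaces one of the partial products
for some lanes, those lanes would differ from the plain long multiply while
the others stay correct.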

Best regards,
Vladimir Ivanov

[1] http://mail.openjdk.java.net/pipermail/panama-dev/2018-July/002348.html

On 13/08/2018 21:49, Adam Pocock wrote:
> Below is a minimal example where the output of the multiply changes. When 
> run with the argument 10000 (which is sufficient to trigger compilation on 
> my desktop), start and end are different. With fewer iterations (or with 
> -Xint or -XX:TieredStopAtLevel=3), start and end are the same.
> 
> With print compilation turned on I get:
> ...
> At itr 9475, drew 
> 2885783399502042673,904462311702380608,-5120899773137696113,3753560392746735454 
> 
>      308  502       3       jdk.incubator.vector.Long256Vector::mul (12 
> bytes)   made not entrant
>      308  577       4       jdk.incubator.vector.LongVector::toArray (18 
> bytes)
> At itr 9476, drew 
> 2885783399502042673,904462311702380608,-5120899773137696113,3753560392746735454 
> 
> At itr 9477, drew 
> 2885783399502042673,904462311702380608,2886375887584873103,910387193500211038 
> 
> ...
> 
> Thanks,
> 
> Adam
> 
> import jdk.incubator.vector.LongVector;
> import jdk.incubator.vector.LongVector.LongSpecies;
> import jdk.incubator.vector.Shapes;
> import jdk.incubator.vector.Shapes.S256Bit;
> import jdk.incubator.vector.Vector;
> 
> import java.util.Arrays;
> 
> public class LongVectorMulTest {
>      private static final long FIRST_CONSTANT = 0xbf58476d1ce4e5b9L;
> 
>      public static void main(String[] args) {
>          int numRepeats = Integer.parseInt(args[0]);
> 
>          long[] input = new long[]{12345,123456,1234567,12345678};
> 
>          LongSpecies<S256Bit> lSpec = (LongSpecies<S256Bit>) 
> Vector.species(long.class,Shapes.S_256_BIT);
>          long[] start = 
> lSpec.fromArray(input,0).mul(FIRST_CONSTANT).toArray();
>          long[] end = null;
>          for (int i = 0; i < numRepeats; i++) {
>              LongVector<S256Bit> output = 
> lSpec.fromArray(input,0).mul(FIRST_CONSTANT);
>              end = output.toArray();
>              print(end, i);
>          }
>          System.out.println("Start = " + Arrays.toString(start));
>          System.out.println("End = " + Arrays.toString(end));
>      }
> 
>      private static void print(long[] values, int iter) {
>          StringBuilder builder = new StringBuilder();
>          for (int j = 0; j < values.length; j++) {
>              builder.append(values[j]);
>              builder.append(',');
>          }
>          builder.deleteCharAt(builder.length()-1);
>          System.out.println("At itr " + iter + ", drew " + 
> builder.toString());
>      }
> }
> 
> On 13/08/18 11:37, Adam Pocock wrote:
>> Long256Vector::mul produces different output once C2 has compiled it. 
>> I've been running down some non-determinism in my vector version of 
>> SplittableRandom, and the behaviour changes when C2 compiles it. I 
>> noticed this as the stream of random numbers changes when C2 compiles 
>> Long256Vector::mul(long). Only the latter half of the vector is 
>> affected; the first two lanes still produce the same output. I'm 
>> pretty sure it's a C2 issue as running with -Xint or 
>> -XX:TieredStopAtLevel=3 makes the output repeatable across runs.
>>
>> I've attached two runs with -XX:+PrintCompilation turned on. At line 
>> 5794 (itr 4920) in the first file (equivalent point is line 5774 in 
>> the second file) Long256Vector::mul is compiled, and after that point 
>> the second half of the vector output changes between runs. At line 
>> 7427 (itr 6561) in the second file Long256Vector::mul is compiled 
>> (equivalent point is line 7463 in the first file), and after that all 
>> the draws are the same. This indicates that the seeds for the RNG are 
>> always updated correctly, as otherwise the runs would diverge after 
>> compilation.
>>
>> This suggests to me that the latter two lanes of C2 compiled 
>> Long256Vector::mul(long) aren't correct: when an interpreted or C1 run 
>> is compared against a C2 run, the latter two lanes diverge 
>> permanently after C2 compilation.
>>
>> The JVM reports:
>>     openjdk version "12-internal" 2019-03-19
>>     OpenJDK Runtime Environment (build 
>> 12-internal+0-adhoc.apocock.panama)
>>     OpenJDK 64-Bit Server VM (build 
>> 12-internal+0-adhoc.apocock.panama, mixed mode)
>>
>> and hg log shows the top commit for vectorIntrinsics as:
>>     changeset:   51784:4408db20792c
>>     branch:      vectorIntrinsics
>>     tag:         tip
>>     parent:      51782:1700bec0e3b5
>>     user:        rkandu
>>     date:        Wed Aug 01 06:43:33 2018 -0700
>>     summary:     SVML Linux .s files and Windows build fix
>>
>> The only lines which use mul on a Long256Vector are in the vectorised 
>> mix64:
>>     private static final long FIRST_CONSTANT = 0xbf58476d1ce4e5b9L;
>>     private static final long SECOND_CONSTANT = 0x94d049bb133111ebL;
>>     private LongVector<S> mix64(LongVector<S> input) {
>>         input = input.xor(input.shiftR(30)).mul(FIRST_CONSTANT);
>>         input = input.xor(input.shiftR(27)).mul(SECOND_CONSTANT);
>>         return input.xor(input.shiftR(31));
>>     }
>>
>> My CPU is a Core i7-6700K, so I expect everything to be using AVX2, 
>> though I haven't verified it's emitting those instructions.
>>
>> PS: @Razvan, you were right, my BinarySearch was bugged, I was using 
>> lessThan rather than lessThanEq to update the loop condition mask. 
>> Once I fixed that it produces identical output to the sequential 
>> version I was using.
>>
>> Thanks,
>>
>> Adam
>>
> 
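
The observation in the quoted report that only the last two lanes change can
be checked mechanically. The sketch below is a minimal plain-Java helper
(hypothetical class and method names, no Vector API dependency) that lists
the lane indices at which two result arrays differ, applied to the values
printed at itr 9476 and itr 9477 in the log above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports which lanes differ between two result arrays.
final class LaneDiff {
    static List<Integer> differingLanes(long[] expected, long[] actual) {
        if (expected.length != actual.length) {
            throw new IllegalArgumentException("lane counts differ");
        }
        List<Integer> lanes = new ArrayList<>();
        for (int i = 0; i < expected.length; i++) {
            if (expected[i] != actual[i]) {
                lanes.add(i);
            }
        }
        return lanes;
    }

    public static void main(String[] args) {
        // Values copied from the log in the quoted message.
        long[] before = {2885783399502042673L, 904462311702380608L,
                         -5120899773137696113L, 3753560392746735454L};
        long[] after  = {2885783399502042673L, 904462311702380608L,
                         2886375887584873103L, 910387193500211038L};
        System.out.println("Differing lanes: " + differingLanes(before, after));
        // prints "Differing lanes: [2, 3]"
    }
}
```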

