Long256Vector::mul is not compiling correctly
Rukmannagari, Shravya
shravya.rukmannagari at intel.com
Tue Aug 14 17:17:45 UTC 2018
Hi Vladimir and Adam,
Thanks a lot for reporting the error. I'm taking a look at it and will update you on the status.
Thanks,
Shravya.
-----Original Message-----
From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com]
Sent: Tuesday, August 14, 2018 6:16 AM
To: Adam Pocock <adam.pocock at oracle.com>; panama-dev at openjdk.java.net
Cc: Rukmannagari, Shravya <shravya.rukmannagari at intel.com>
Subject: Re: Long256Vector::mul is not compiling correctly
It looks very similar to the problem Adam Petcher reported some time ago [1].
Though that problem was observed with auto-vectorization, the culprit is the vmul4L_reg_avx rule, which is also used in the Long256Vector::mul() case.
Best regards,
Vladimir Ivanov
[1] http://mail.openjdk.java.net/pipermail/panama-dev/2018-July/002348.html
On 13/08/2018 21:49, Adam Pocock wrote:
> Below is a minimal example where the output of the multiply changes.
> When run with the argument 10000 (which is sufficient to trigger
> compilation on my desktop), start and end are different. With fewer
> iterations (or with -Xint or -XX:TieredStopAtLevel=3), start and end are the same.
>
> With print compilation turned on I get:
> ...
> At itr 9475, drew 2885783399502042673,904462311702380608,-5120899773137696113,3753560392746735454
>
> 308 502 3 jdk.incubator.vector.Long256Vector::mul (12 bytes) made not entrant
> 308 577 4 jdk.incubator.vector.LongVector::toArray (18 bytes)
> At itr 9476, drew 2885783399502042673,904462311702380608,-5120899773137696113,3753560392746735454
>
> At itr 9477, drew 2885783399502042673,904462311702380608,2886375887584873103,910387193500211038
>
> ...
>
> Thanks,
>
> Adam
>
> import jdk.incubator.vector.LongVector;
> import jdk.incubator.vector.LongVector.LongSpecies;
> import jdk.incubator.vector.Shapes;
> import jdk.incubator.vector.Shapes.S256Bit;
> import jdk.incubator.vector.Vector;
>
> import java.util.Arrays;
>
> public class LongVectorMulTest {
>     private static final long FIRST_CONSTANT = 0xbf58476d1ce4e5b9L;
>
>     public static void main(String[] args) {
>         int numRepeats = Integer.parseInt(args[0]);
>
>         long[] input = new long[]{12345,123456,1234567,12345678};
>
>         LongSpecies<S256Bit> lSpec = (LongSpecies<S256Bit>)
>             Vector.species(long.class,Shapes.S_256_BIT);
>         long[] start = lSpec.fromArray(input,0).mul(FIRST_CONSTANT).toArray();
>         long[] end = null;
>         for (int i = 0; i < numRepeats; i++) {
>             LongVector<S256Bit> output = lSpec.fromArray(input,0).mul(FIRST_CONSTANT);
>             end = output.toArray();
>             print(end, i);
>         }
>         System.out.println("Start = " + Arrays.toString(start));
>         System.out.println("End = " + Arrays.toString(end));
>     }
>
>     private static void print(long[] values, int iter) {
>         StringBuilder builder = new StringBuilder();
>         for (int j = 0; j < values.length; j++) {
>             builder.append(values[j]);
>             builder.append(',');
>         }
>         builder.deleteCharAt(builder.length()-1);
>         System.out.println("At itr " + iter + ", drew " + builder.toString());
>     }
> }
>
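One way to decide which of the two outputs in the log is the correct one is to compute the same products with plain scalar long arithmetic, which does not go through the vector multiply rule at all. The sketch below is not part of the original report: the class name ScalarMulReference and the helpers mulScalar and firstMismatch are hypothetical names chosen for illustration. It assumes that each lane of Long256Vector::mul(long) is meant to be an ordinary wrapping 64-bit multiply.

```java
import java.util.Arrays;

// Hypothetical scalar reference for the quoted test (not from the original post).
class ScalarMulReference {
    static final long FIRST_CONSTANT = 0xbf58476d1ce4e5b9L;

    // Multiplies every element by the constant using plain wrapping long arithmetic.
    static long[] mulScalar(long[] input, long constant) {
        long[] out = new long[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = input[i] * constant;
        }
        return out;
    }

    // Returns the index of the first lane where the arrays differ, or -1 if they are equal.
    static int firstMismatch(long[] expected, long[] actual) {
        return Arrays.mismatch(expected, actual);
    }

    public static void main(String[] args) {
        // Same input as the quoted LongVectorMulTest.
        long[] input = {12345, 123456, 1234567, 12345678};
        long[] expected = mulScalar(input, FIRST_CONSTANT);
        System.out.println("Expected = " + Arrays.toString(expected));
    }
}
```

Passing the array returned by toArray() in the quoted test to firstMismatch together with the scalar result would report the first lane that diverges; for the behaviour described in this thread that would be lane 2 once the C2-compiled code is in use.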
> On 13/08/18 11:37, Adam Pocock wrote:
>> Long256Vector::mul produces different output once C2 has compiled it.
>> I've been running down some non-determinism in my vector version of
>> SplittableRandom, and the behaviour changes when C2 compiles it. I
>> noticed this as the stream of random numbers changes when C2 compiles
>> Long256Vector::mul(long). Only the latter half of the vector is
>> affected, the first two lanes still produce the same output. I'm
>> pretty sure it's a C2 issue as running with -Xint or
>> -XX:TieredStopAtLevel=3 makes the output repeatable across runs.
>>
>> I've attached two runs with -XX:+PrintCompilation turned on. At line
>> 5794 (itr 4920) in the first file (equivalent point is line 5774 in
>> the second file) Long256Vector::mul is compiled, and after that point
>> the second half of the vector output changes between runs. At line
>> 7427 (itr 6561) in the second file Long256Vector::mul is compiled
>> (equivalent point is line 7463 in the first file), and after that all
>> the draws are the same. This indicates that the seeds for the RNG
>> are always updated correctly, as otherwise the runs would diverge
>> after compilation.
>>
>> This suggests to me that the latter two lanes of C2 compiled
>> Long256Vector::mul(long) aren't correct, as comparing an interpreted
>> or C1 run against a run with C2 the latter two lanes diverge
>> permanently after C2 compilation.
>>
>> The JVM reports:
>> openjdk version "12-internal" 2019-03-19
>> OpenJDK Runtime Environment (build
>> 12-internal+0-adhoc.apocock.panama)
>> OpenJDK 64-Bit Server VM (build
>> 12-internal+0-adhoc.apocock.panama, mixed mode)
>>
>> and hg log shows the top commit for vectorIntrinsics as:
>> changeset: 51784:4408db20792c
>> branch: vectorIntrinsics
>> tag: tip
>> parent: 51782:1700bec0e3b5
>> user: rkandu
>> date: Wed Aug 01 06:43:33 2018 -0700
>> summary: SVML Linux .s files and Windows build fix
>>
>> The only lines which use mul on a Long256Vector are in the vectorised
>> mix64:
>>     private static final long FIRST_CONSTANT = 0xbf58476d1ce4e5b9L;
>>     private static final long SECOND_CONSTANT = 0x94d049bb133111ebL;
>>     private LongVector<S> mix64(LongVector<S> input) {
>>         input = input.xor(input.shiftR(30)).mul(FIRST_CONSTANT);
>>         input = input.xor(input.shiftR(27)).mul(SECOND_CONSTANT);
>>         return input.xor(input.shiftR(31));
>>     }
>>
>> My CPU is a Core-i7 6700k, so I expect everything to be using AVX2,
>> though I haven't verified it's emitting those instructions.
>>
>> PS: @Razvan, you were right, my BinarySearch was bugged, I was using
>> lessThan rather than lessThanEq to update the loop condition mask.
>> Once I fixed that it produces identical output to the sequential
>> version I was using.
>>
>> Thanks,
>>
>> Adam
>>
>
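For comparing the vectorised mix64 quoted above against a known-good baseline, a scalar version applied lane by lane may be useful. This is a minimal sketch under two assumptions that are not confirmed in the thread: that shiftR in the quoted code is a logical (unsigned) right shift, and that each lane is mixed independently. The class name ScalarMix64 and the helper mix64Lanes are hypothetical.

```java
import java.util.Arrays;

// Hypothetical scalar counterpart of the quoted vectorised mix64 (not from the original post).
class ScalarMix64 {
    static final long FIRST_CONSTANT = 0xbf58476d1ce4e5b9L;
    static final long SECOND_CONSTANT = 0x94d049bb133111ebL;

    // Scalar mix: xor-shift followed by multiply, twice, then a final xor-shift.
    // Assumes shiftR in the vector code corresponds to the unsigned shift >>>.
    static long mix64(long z) {
        z = (z ^ (z >>> 30)) * FIRST_CONSTANT;
        z = (z ^ (z >>> 27)) * SECOND_CONSTANT;
        return z ^ (z >>> 31);
    }

    // Applies mix64 to each element, mirroring what one vector call does per lane.
    static long[] mix64Lanes(long[] input) {
        long[] out = new long[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = mix64(input[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        long[] seeds = {1L, 2L, 3L, 4L}; // arbitrary example lanes
        System.out.println(Arrays.toString(mix64Lanes(seeds)));
    }
}
```

Because this path uses only scalar long multiplies, its output should stay the same whether the method is interpreted or compiled, so any lane where the vector result differs points at the vector multiply.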