Long256Vector::mul Fix for VectorAPI
Rukmannagari, Shravya
shravya.rukmannagari at intel.com
Fri Aug 17 21:31:20 UTC 2018
Hi Vladimir,
I updated the patch as suggested and tested it.
http://cr.openjdk.java.net/~srukmannagar/VectorAPIFixes/webrev_Long256Mul/
Thanks,
Shravya
-----Original Message-----
From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com]
Sent: Friday, August 17, 2018 7:27 AM
To: Rukmannagari, Shravya <shravya.rukmannagari at intel.com>; panama-dev at openjdk.java.net
Subject: Re: Long256Vector::mul Fix for VectorAPI
Looks good.
Can you achieve the same result by piggybacking on the shuffling performed by vphaddd?
- __ vphaddd($tmp$$XMMRegister, $tmp$$XMMRegister, $tmp$$XMMRegister,
vector_len);
+ __ vextracti128_high($tmp1$$XMMRegister, $tmp$$XMMRegister);
+ __ vphaddd($tmp$$XMMRegister, $tmp$$XMMRegister,
$tmp1$$XMMRegister, vector_len);
Best regards,
Vladimir Ivanov
On 16/08/2018 03:56, Rukmannagari, Shravya wrote:
> Hi All,
> Please review the patch which fixes the issue for Long256 mul.
> http://cr.openjdk.java.net/~srukmannagar/VectorAPIFixes/webrev_Long256Mul/
>
> Thanks,
> Shravya.
>
> -----Original Message-----
> From: panama-dev [mailto:panama-dev-bounces at openjdk.java.net] On Behalf Of Rukmannagari, Shravya
> Sent: Tuesday, August 14, 2018 10:18 AM
> To: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>; Adam Pocock <adam.pocock at oracle.com>; panama-dev at openjdk.java.net
> Subject: RE: Long256Vector::mul is not compiling correctly
>
> Hi Vladimir and Adam,
> Thanks a lot for reporting the error. I'm taking a look at it and will update you on the status.
>
> Thanks,
> Shravya.
>
> -----Original Message-----
> From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com]
> Sent: Tuesday, August 14, 2018 6:16 AM
> To: Adam Pocock <adam.pocock at oracle.com>; panama-dev at openjdk.java.net
> Cc: Rukmannagari, Shravya <shravya.rukmannagari at intel.com>
> Subject: Re: Long256Vector::mul is not compiling correctly
>
> It looks very similar to the problem Adam Petcher reported some time ago [1]
>
> Though the problem was observed with auto-vectorization, the culprit is vmul4L_reg_avx rule which is also used in Long256Vector::mul() case.
>
> Best regards,
> Vladimir Ivanov
>
> [1] http://mail.openjdk.java.net/pipermail/panama-dev/2018-July/002348.html
>
> On 13/08/2018 21:49, Adam Pocock wrote:
>> Below is a minimal example where the output of the multiply changes.
>> Run with the argument 10000 (which is sufficient to trigger
>> compilation on my desktop), start and end are different. With fewer
>> iterations (or with -Xint or -XX:TieredStopAtLevel=3) start and end are the same.
>>
>> With print compilation turned on I get:
>> ...
>> At itr 9475, drew
>> 2885783399502042673,904462311702380608,-5120899773137696113,3753560392
>> 746735454
>>
>> 308 502 3 jdk.incubator.vector.Long256Vector::mul
>> (12
>> bytes) made not entrant
>> 308 577 4 jdk.incubator.vector.LongVector::toArray
>> (18
>> bytes)
>> At itr 9476, drew
>> 2885783399502042673,904462311702380608,-5120899773137696113,3753560392
>> 746735454
>>
>> At itr 9477, drew
>> 2885783399502042673,904462311702380608,2886375887584873103,91038719350
>> 0211038
>>
>> ...
>>
>> Thanks,
>>
>> Adam
>>
>> import jdk.incubator.vector.LongVector; import
>> jdk.incubator.vector.LongVector.LongSpecies;
>> import jdk.incubator.vector.Shapes;
>> import jdk.incubator.vector.Shapes.S256Bit;
>> import jdk.incubator.vector.Vector;
>>
>> import java.util.Arrays;
>>
>> public class LongVectorMulTest {
>> private static final long FIRST_CONSTANT = 0xbf58476d1ce4e5b9L;
>>
>> public static void main(String[] args) {
>> int numRepeats = Integer.parseInt(args[0]);
>>
>> long[] input = new long[]{12345,123456,1234567,12345678};
>>
>> LongSpecies<S256Bit> lSpec = (LongSpecies<S256Bit>)
>> Vector.species(long.class,Shapes.S_256_BIT);
>> long[] start =
>> lSpec.fromArray(input,0).mul(FIRST_CONSTANT).toArray();
>> long[] end = null;
>> for (int i = 0; i < numRepeats; i++) {
>> LongVector<S256Bit> output =
>> lSpec.fromArray(input,0).mul(FIRST_CONSTANT);
>> end = output.toArray();
>> print(end, i);
>> }
>> System.out.println("Start = " + Arrays.toString(start));
>> System.out.println("End = " + Arrays.toString(end));
>> }
>>
>> private static void print(long[] values, int iter) {
>> StringBuilder builder = new StringBuilder();
>> for (int j = 0; j < values.length; j++) {
>> builder.append(values[j]);
>> builder.append(',');
>> }
>> builder.deleteCharAt(builder.length()-1);
>> System.out.println("At itr " + iter + ", drew " +
>> builder.toString());
>> }
>> }
>>
>> On 13/08/18 11:37, Adam Pocock wrote:
>>> Long256Vector::mul produces different output once C2 has compiled it.
>>> I've been running down some non-determinism in my vector version of
>>> SplittableRandom, and the behaviour changes when C2 compiles it. I
>>> noticed this as the stream of random numbers changes when C2 compiles
>>> Long256Vector::mul(long). Only the latter half of the vector is
>>> affected, the first two lanes still produce the same output. I'm
>>> pretty sure it's a C2 issue as running with -Xint or
>>> -XX:TieredStopAtLevel=3 makes the output repeatable across runs.
>>>
>>> I've attached two runs with -XX:+PrintCompilation turned on. At line
>>> 5794 (itr 4920) in the first file (equivalent point is line 5774 in
>>> the second file) Long256Vector::mul is compiled, and after that point
>>> the second half of the vector output changes between runs. At line
>>> 7427 (itr 6561) in the second file Long256Vector::mul is compiled
>>> (equivalent point is line 7463 in the first file), and after that all
>>> the draws are the same. This indicates that the seeds for the RNG
>>> area always updated correctly as otherwise the runs would diverge
>>> after compilation.
>>>
>>> This suggests to me that the latter two lanes of C2 compiled
>>> Long256Vector::mul(long) aren't correct, as comparing an interpreted
>>> or C1 run against a run with C2 the latter two lanes diverge
>>> permanently after C2 compilation.
>>>
>>> The JVM reports:
>>> openjdk version "12-internal" 2019-03-19
>>> OpenJDK Runtime Environment (build
>>> 12-internal+0-adhoc.apocock.panama)
>>> OpenJDK 64-Bit Server VM (build
>>> 12-internal+0-adhoc.apocock.panama, mixed mode)
>>>
>>> and hg log shows the top commit for vectorIntrinsics as:
>>> changeset: 51784:4408db20792c
>>> branch: vectorIntrinsics
>>> tag: tip
>>> parent: 51782:1700bec0e3b5
>>> user: rkandu
>>> date: Wed Aug 01 06:43:33 2018 -0700
>>> summary: SVML Linux .s files and Windows build fix
>>>
>>> The only lines which use mul on a Long256Vector are in the vectorised
>>> mix64:
>>> private static final long FIRST_CONSTANT = 0xbf58476d1ce4e5b9L;
>>> private static final long SECOND_CONSTANT = 0x94d049bb133111ebL;
>>> private LongVector<S> mix64(LongVector<S> input) {
>>> input = input.xor(input.shiftR(30)).mul(FIRST_CONSTANT);
>>> input = input.xor(input.shiftR(27)).mul(SECOND_CONSTANT);
>>> return input.xor(input.shiftR(31));
>>> }
>>>
>>> My CPU is a Core-i7 6700k, so I expect everything to be using AVX2,
>>> though I haven't verified it's emitting those instructions.
>>>
>>> PS: @Razvan, you were right, my BinarySearch was bugged, I was using
>>> lessThan rather than lessThanEq to update the loop condition mask.
>>> Once I fixed that it produces identical output to the sequential
>>> version I was using.
>>>
>>> Thanks,
>>>
>>> Adam
>>>
>>
More information about the panama-dev
mailing list