Long256Vector::mul Fix for VectorAPI
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Sat Aug 18 05:44:26 UTC 2018
Related question: as I understand, the bug wasn't caught by the tests.
Does it worth to add a test case then?
As I understand, the bug corrupted upper half of the result and having
large enough arguments should be enough to reliably trigger it.
Best regards,
Vladimir Ivanov
On 18/08/2018 00:31, Rukmannagari, Shravya wrote:
> Hi Vladimir,
> I updated the patch as suggested and tested it.
> http://cr.openjdk.java.net/~srukmannagar/VectorAPIFixes/webrev_Long256Mul/
>
> Thanks,
> Shravya
>
> -----Original Message-----
> From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com]
> Sent: Friday, August 17, 2018 7:27 AM
> To: Rukmannagari, Shravya <shravya.rukmannagari at intel.com>; panama-dev at openjdk.java.net
> Subject: Re: Long256Vector::mul Fix for VectorAPI
>
> Looks good.
>
> Can you achieve the same result by piggybacking on the shuffling performed by vphaddd?
>
> - __ vphaddd($tmp$$XMMRegister, $tmp$$XMMRegister, $tmp$$XMMRegister,
> vector_len);
> + __ vextracti128_high($tmp1$$XMMRegister, $tmp$$XMMRegister);
> + __ vphaddd($tmp$$XMMRegister, $tmp$$XMMRegister,
> $tmp1$$XMMRegister, vector_len);
>
> Best regards,
> Vladimir Ivanov
>
> On 16/08/2018 03:56, Rukmannagari, Shravya wrote:
>> Hi All,
>> Please review the patch which fixes the issue for Long256 mul.
>> http://cr.openjdk.java.net/~srukmannagar/VectorAPIFixes/webrev_Long256Mul/
>>
>> Thanks,
>> Shravya.
>>
>> -----Original Message-----
>> From: panama-dev [mailto:panama-dev-bounces at openjdk.java.net] On Behalf Of Rukmannagari, Shravya
>> Sent: Tuesday, August 14, 2018 10:18 AM
>> To: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>; Adam Pocock <adam.pocock at oracle.com>; panama-dev at openjdk.java.net
>> Subject: RE: Long256Vector::mul is not compiling correctly
>>
>> Hi Vladimir and Adam,
>> Thanks a lot for reporting the error. I'm taking a look at it and will update you on the status.
>>
>> Thanks,
>> Shravya.
>>
>> -----Original Message-----
>> From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com]
>> Sent: Tuesday, August 14, 2018 6:16 AM
>> To: Adam Pocock <adam.pocock at oracle.com>; panama-dev at openjdk.java.net
>> Cc: Rukmannagari, Shravya <shravya.rukmannagari at intel.com>
>> Subject: Re: Long256Vector::mul is not compiling correctly
>>
>> It looks very similar to the problem Adam Petcher reported some time ago [1]
>>
>> Though the problem was observed with auto-vectorization, the culprit is vmul4L_reg_avx rule which is also used in Long256Vector::mul() case.
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> [1] http://mail.openjdk.java.net/pipermail/panama-dev/2018-July/002348.html
>>
>> On 13/08/2018 21:49, Adam Pocock wrote:
>>> Below is a minimal example where the output of the multiply changes.
>>> Run with the argument 10000 (which is sufficient to trigger
>>> compilation on my desktop), start and end are different. With fewer
>>> iterations (or with -Xint or -XX:TieredStopAtLevel=3) start and end are the same.
>>>
>>> With print compilation turned on I get:
>>> ...
>>> At itr 9475, drew
>>> 2885783399502042673,904462311702380608,-5120899773137696113,3753560392
>>> 746735454
>>>
>>> 308 502 3 jdk.incubator.vector.Long256Vector::mul
>>> (12
>>> bytes) made not entrant
>>> 308 577 4 jdk.incubator.vector.LongVector::toArray
>>> (18
>>> bytes)
>>> At itr 9476, drew
>>> 2885783399502042673,904462311702380608,-5120899773137696113,3753560392
>>> 746735454
>>>
>>> At itr 9477, drew
>>> 2885783399502042673,904462311702380608,2886375887584873103,91038719350
>>> 0211038
>>>
>>> ...
>>>
>>> Thanks,
>>>
>>> Adam
>>>
>>> import jdk.incubator.vector.LongVector; import
>>> jdk.incubator.vector.LongVector.LongSpecies;
>>> import jdk.incubator.vector.Shapes;
>>> import jdk.incubator.vector.Shapes.S256Bit;
>>> import jdk.incubator.vector.Vector;
>>>
>>> import java.util.Arrays;
>>>
>>> public class LongVectorMulTest {
>>> private static final long FIRST_CONSTANT = 0xbf58476d1ce4e5b9L;
>>>
>>> public static void main(String[] args) {
>>> int numRepeats = Integer.parseInt(args[0]);
>>>
>>> long[] input = new long[]{12345,123456,1234567,12345678};
>>>
>>> LongSpecies<S256Bit> lSpec = (LongSpecies<S256Bit>)
>>> Vector.species(long.class,Shapes.S_256_BIT);
>>> long[] start =
>>> lSpec.fromArray(input,0).mul(FIRST_CONSTANT).toArray();
>>> long[] end = null;
>>> for (int i = 0; i < numRepeats; i++) {
>>> LongVector<S256Bit> output =
>>> lSpec.fromArray(input,0).mul(FIRST_CONSTANT);
>>> end = output.toArray();
>>> print(end, i);
>>> }
>>> System.out.println("Start = " + Arrays.toString(start));
>>> System.out.println("End = " + Arrays.toString(end));
>>> }
>>>
>>> private static void print(long[] values, int iter) {
>>> StringBuilder builder = new StringBuilder();
>>> for (int j = 0; j < values.length; j++) {
>>> builder.append(values[j]);
>>> builder.append(',');
>>> }
>>> builder.deleteCharAt(builder.length()-1);
>>> System.out.println("At itr " + iter + ", drew " +
>>> builder.toString());
>>> }
>>> }
>>>
>>> On 13/08/18 11:37, Adam Pocock wrote:
>>>> Long256Vector::mul produces different output once C2 has compiled it.
>>>> I've been running down some non-determinism in my vector version of
>>>> SplittableRandom, and the behaviour changes when C2 compiles it. I
>>>> noticed this as the stream of random numbers changes when C2 compiles
>>>> Long256Vector::mul(long). Only the latter half of the vector is
>>>> affected, the first two lanes still produce the same output. I'm
>>>> pretty sure it's a C2 issue as running with -Xint or
>>>> -XX:TieredStopAtLevel=3 makes the output repeatable across runs.
>>>>
>>>> I've attached two runs with -XX:+PrintCompilation turned on. At line
>>>> 5794 (itr 4920) in the first file (equivalent point is line 5774 in
>>>> the second file) Long256Vector::mul is compiled, and after that point
>>>> the second half of the vector output changes between runs. At line
>>>> 7427 (itr 6561) in the second file Long256Vector::mul is compiled
>>>> (equivalent point is line 7463 in the first file), and after that all
>>>> the draws are the same. This indicates that the seeds for the RNG
>>>> area always updated correctly as otherwise the runs would diverge
>>>> after compilation.
>>>>
>>>> This suggests to me that the latter two lanes of C2 compiled
>>>> Long256Vector::mul(long) aren't correct, as comparing an interpreted
>>>> or C1 run against a run with C2 the latter two lanes diverge
>>>> permanently after C2 compilation.
>>>>
>>>> The JVM reports:
>>>> openjdk version "12-internal" 2019-03-19
>>>> OpenJDK Runtime Environment (build
>>>> 12-internal+0-adhoc.apocock.panama)
>>>> OpenJDK 64-Bit Server VM (build
>>>> 12-internal+0-adhoc.apocock.panama, mixed mode)
>>>>
>>>> and hg log shows the top commit for vectorIntrinsics as:
>>>> changeset: 51784:4408db20792c
>>>> branch: vectorIntrinsics
>>>> tag: tip
>>>> parent: 51782:1700bec0e3b5
>>>> user: rkandu
>>>> date: Wed Aug 01 06:43:33 2018 -0700
>>>> summary: SVML Linux .s files and Windows build fix
>>>>
>>>> The only lines which use mul on a Long256Vector are in the vectorised
>>>> mix64:
>>>> private static final long FIRST_CONSTANT = 0xbf58476d1ce4e5b9L;
>>>> private static final long SECOND_CONSTANT = 0x94d049bb133111ebL;
>>>> private LongVector<S> mix64(LongVector<S> input) {
>>>> input = input.xor(input.shiftR(30)).mul(FIRST_CONSTANT);
>>>> input = input.xor(input.shiftR(27)).mul(SECOND_CONSTANT);
>>>> return input.xor(input.shiftR(31));
>>>> }
>>>>
>>>> My CPU is a Core-i7 6700k, so I expect everything to be using AVX2,
>>>> though I haven't verified it's emitting those instructions.
>>>>
>>>> PS: @Razvan, you were right, my BinarySearch was bugged, I was using
>>>> lessThan rather than lessThanEq to update the loop condition mask.
>>>> Once I fixed that it produces identical output to the sequential
>>>> version I was using.
>>>>
>>>> Thanks,
>>>>
>>>> Adam
>>>>
>>>
More information about the panama-dev
mailing list