RFR: 8279508: Auto-vectorize Math.round API [v3]
Jatin Bhateja
jbhateja at openjdk.java.net
Mon Feb 14 17:18:07 UTC 2022
On Mon, 14 Feb 2022 09:12:54 GMT, Andrew Haley <aph at openjdk.org> wrote:
>>> What does this do? Comment, even pseudo code, would be nice.
>>
>> Thanks @theRealAph, I shall add comments above the routine.
>> BTW, the entire rounding algorithm can also be implemented using the Vector API, which can perform if-conversion using masked operations.
>>
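>> import jdk.incubator.vector.FloatVector;
>> import jdk.incubator.vector.IntVector;
>> import jdk.incubator.vector.VectorMask;
>> import jdk.incubator.vector.VectorOperators;
>> import jdk.incubator.vector.VectorSpecies;
>>
>> class roundf {
>>     public static VectorSpecies<Integer> ISPECIES = IntVector.SPECIES_512;
>>     public static VectorSpecies<Float> SPECIES = FloatVector.SPECIES_512;
>>
>>     // Mirrors the scalar Math.round(float) bit manipulation, lane by lane.
>>     // Assumes a.length is a multiple of SPECIES.length(); there is no tail loop.
>>     public static int round_vector(float[] a, int[] r, int ctr) {
>>         // SIGNIFICAND_WIDTH - 2 + EXP_BIAS
>>         IntVector shiftVBC = (IntVector) ISPECIES.broadcast(24 - 2 + 127);
>>         for (int i = 0; i < a.length; i += SPECIES.length()) {
>>             FloatVector fv = FloatVector.fromArray(SPECIES, a, i);
>>             IntVector iv = fv.reinterpretAsInts();
>>             // biasedExp = (bits & EXP_BIT_MASK) >> 23
>>             IntVector biasedExpV = iv.lanewise(VectorOperators.AND, 0x7F800000);
>>             biasedExpV = biasedExpV.lanewise(VectorOperators.ASHR, 23);
>>             IntVector shiftV = shiftVBC.lanewise(VectorOperators.SUB, biasedExpV);
>>             // cond: 0 <= shift < 32, the lanes where the shift-based rounding applies
>>             VectorMask<Integer> cond = shiftV.lanewise(VectorOperators.AND, -32)
>>                                              .compare(VectorOperators.EQ, 0);
>>             // Significand with the implicit leading 1 restored
>>             IntVector res = iv.lanewise(VectorOperators.AND, 0x007FFFFF)
>>                               .lanewise(VectorOperators.OR, 0x007FFFFF + 1);
>>             // Negate the significand in lanes holding negative inputs
>>             VectorMask<Integer> cond1 = iv.compare(VectorOperators.LT, 0);
>>             VectorMask<Integer> cond2 = cond1.and(cond);
>>             res = res.lanewise(VectorOperators.NEG, cond2);
>>             // ((r >> shift) + 1) >> 1
>>             res = res.lanewise(VectorOperators.ASHR, shiftV)
>>                      .lanewise(VectorOperators.ADD, 1)
>>                      .lanewise(VectorOperators.ASHR, 1);
>>             // All other lanes take the plain float-to-int conversion
>>             res = fv.convert(VectorOperators.F2I, 0)
>>                     .reinterpretAsInts()
>>                     .blend(res, cond);
>>             res.intoArray(r, i);
>>         }
>>         return r[ctr];
>>     }
>> }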
>
> That pseudocode would make a very useful comment too. This whole patch is very thinly commented.
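For reference, the quoted vector routine follows the same steps as the scalar bit-manipulation form of Math.round(float). A minimal scalar sketch of those steps is below; the class name `ScalarRoundSketch` and the constant names are illustrative and not taken from the patch.

```java
class ScalarRoundSketch {
    static final int EXP_BIT_MASK = 0x7F800000;    // exponent field of an IEEE 754 float
    static final int SIGNIF_BIT_MASK = 0x007FFFFF; // 23 explicitly stored significand bits
    static final int SIGNIFICAND_WIDTH = 24;       // stored bits plus the implicit leading 1
    static final int EXP_BIAS = 127;

    static int round(float a) {
        int bits = Float.floatToRawIntBits(a);
        // Extract the biased exponent.
        int biasedExp = (bits & EXP_BIT_MASK) >> (SIGNIFICAND_WIDTH - 1);
        // Number of fractional bits, minus one so that one extra bit is kept for rounding.
        int shift = (SIGNIFICAND_WIDTH - 2 + EXP_BIAS) - biasedExp;
        if ((shift & -32) == 0) { // equivalent to 0 <= shift < 32
            // Restore the implicit leading 1 of the significand.
            int r = (bits & SIGNIF_BIT_MASK) | (SIGNIF_BIT_MASK + 1);
            if (bits < 0) {
                r = -r;
            }
            // Drop the fraction except for one bit, add it back in, then drop it:
            // this rounds to nearest with ties toward positive infinity.
            return ((r >> shift) + 1) >> 1;
        }
        // NaN, infinities, zero, very small and very large values:
        // the plain float-to-int conversion already gives the right answer.
        return (int) a;
    }
}
```

In the vector version the `if` becomes the mask `cond`, and the final `blend` selects between the shifted result and the plain F2I conversion per lane.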
> > Hi, IIRC for EVEX encoding you can embed the RC control bits directly in the EVEX prefix, removing the need to rely on the global MXCSR register. Thanks.
>
> Hi @merykitty, you are correct: we can embed the RC mode (toward -inf, +inf, or zero) in the encoding of the round instruction. But to match the semantics of the Math.round API, one needs to add 0.5[f] to the input value and then round the result, which is why @sviswa7 suggested using a global rounding mode driven by MXCSR.RC, so that inexact intermediate floating-point values are resolved as desired. However, OOO execution may misplace LDMXCSR and hence have undesired side effects.
**Just want to correct the above statement: LDMXCSR will not be re-ordered or re-scheduled early by the OOO backend.**
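
To make the point about inexact intermediate values concrete, here is a small Java sketch. It is only an illustration: Java cannot change the floating-point rounding mode, so the double-precision addition below stands in for an add whose result is not rounded up. That substitution, and the names `RoundingModeSketch`, `naiveRound` and `exactSumRound`, are assumptions of the sketch and not part of the patch.

```java
class RoundingModeSketch {
    // "Add 0.5 and floor" in float arithmetic. The float addition uses
    // round-to-nearest-even, so the sum itself can be rounded up before the floor.
    static int naiveRound(float x) {
        return (int) Math.floor(x + 0.5f);
    }

    // Same idea, but the addition is done in double, where the sum is exact
    // for the input discussed below, so nothing is rounded up before the floor.
    static int exactSumRound(float x) {
        return (int) Math.floor((double) x + 0.5);
    }
}
```

For `x = Math.nextDown(0.5f)`, the largest float below 0.5, the float sum `x + 0.5f` lies exactly halfway between two floats and rounds to 1.0f, so `naiveRound(x)` yields 1 while `Math.round(x)` and `exactSumRound(x)` yield 0. Separately, a plain round-to-nearest-even conversion does not match the API on ties either: `Math.rint(2.5)` is 2.0 while `Math.round(2.5f)` is 3.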
-------------
PR: https://git.openjdk.java.net/jdk/pull/7094