[vectorIntrinsics] C2 is fragile

Paul Sandoz paul.sandoz at oracle.com
Mon Mar 29 17:25:51 UTC 2021


Hi Vladimir,

Any recommendations on what we should do here? Remove the final modifier on such methods? That seems like an unsatisfying workaround, though.

Paul.

> On Mar 26, 2021, at 2:40 PM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
> 
> 
> 
> On 17.03.2021 01:31, Paul Sandoz wrote:
>> Hi,
>> The issue might be profile pollution. I am sure Vladimir knows more.
> 
> Yes, it is:
> 
> @ 45   jdk.incubator.vector.IntVector::add (9 bytes)   force inline by annotation
>  @ 5   jdk.incubator.vector.Int128Vector::lanewise (7 bytes)   force inline by annotation
>  @ 5   jdk.incubator.vector.Int256Vector::lanewise (7 bytes)   force inline by annotation
>   \-> TypeProfile (575/5121 counts) = jdk/incubator/vector/Int256Vector
>   \-> TypeProfile (4546/5121 counts) = jdk/incubator/vector/Int128Vector
>    @ 3   jdk.incubator.vector.Int256Vector::lanewise (10 bytes) force inline by annotation
> 
> 
> It stems from the absence of receiver profile data for the "acc" value in the Demo2*::innocent method:
> 
> 45 invokevirtual 19 <jdk/incubator/vector/IntVector.add(Ljdk/incubator/vector/Vector;)Ljdk/incubator/vector/IntVector;> 
>  96  bci: 45   VirtualCallData     count(40994) nonprofiled_count(0) entries(0)
> 
> I believe it is caused by the fact that IntVector::add() is a final method, so receiver profiling is not needed to devirtualize the call:
> 
>    public final IntVector add(Vector<Integer> v) {
>        return lanewise(ADD, v);
>    }
> 
> 
> Replacing IntVector::add() with IntVector::lanewise(ADD) helps work around the problem:
> 
>  for (int i = 1; i < count; i++) {
>    acc = acc.lanewise(VectorOperators.ADD, IntVector.fromArray(VI4, sum, regionX[i] * 4));
>  }
> 
> 
>  @ 48   jdk.incubator.vector.Int128Vector::lanewise (7 bytes)   force inline by annotation
>   \-> TypeProfile (40943/40943 counts) = jdk/incubator/vector/Int128Vector
>    @ 3   jdk.incubator.vector.Int128Vector::lanewise (10 bytes) force inline by annotation
> 
> 
> 48 invokevirtual 25 <jdk/incubator/vector/IntVector.lanewise(Ljdk/incubator/vector/VectorOperators$Binary;Ljdk/incubator/vector/Vector;)Ljdk/incubator/vector/IntVector;> 
>  96  bci: 48   VirtualCallData     count(0) nonprofiled_count(0) entries(1)
> 'jdk/incubator/vector/Int128Vector'(40994 1.00)
> 
> Best regards,
> Vladimir Ivanov
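The call shape Vladimir describes can be mimicked in plain Java. The following is a minimal sketch with hypothetical classes `Vec`, `Vec4` and `Vec8` standing in for `IntVector`, `Int128Vector` and `Int256Vector`. It reproduces only the structure (a final `add()` that delegates to an overridden `lanewise()`, reached from two different callers), not the Vector API or C2's generated code. The claim that the final call site itself records no receiver types is taken from Vladimir's profile dump, not verified independently here.

```java
import java.util.Arrays;

public class ProfileShapeSketch {

    // Stand-in for IntVector: add() is final and only delegates, so the
    // virtual dispatch happens at the lanewise() call inside it.
    abstract static class Vec {
        final int[] lanes;

        Vec(int[] lanes) {
            this.lanes = lanes;
        }

        public final Vec add(Vec v) {
            // A single call site shared by every caller of add(): the receiver
            // types profiled here are the union over all of those callers.
            return lanewise(v);
        }

        abstract Vec lanewise(Vec v);
    }

    // Stand-ins for Int128Vector and Int256Vector (4 and 8 int lanes).
    static final class Vec4 extends Vec {
        Vec4(int... l) { super(Arrays.copyOf(l, 4)); }

        @Override
        Vec lanewise(Vec v) { return new Vec4(addLanes(lanes, v.lanes)); }
    }

    static final class Vec8 extends Vec {
        Vec8(int... l) { super(Arrays.copyOf(l, 8)); }

        @Override
        Vec lanewise(Vec v) { return new Vec8(addLanes(lanes, v.lanes)); }
    }

    static int[] addLanes(int[] a, int[] b) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) r[i] = a[i] + b[i];
        return r;
    }

    // Shape of Demo2a.innocent: a loop over the final add(). Per Vladimir's
    // analysis this call site gets no receiver profile, so the shared one
    // inside add() is what the JIT consults.
    static Vec sumViaAdd(Vec acc, Vec x, int n) {
        for (int i = 0; i < n; i++) acc = acc.add(x);
        return acc;
    }

    // Shape of Demo2a.bad: a different method sending 8-lane receivers
    // through the same final add(), polluting the shared profile.
    static Vec otherCaller(Vec acc, Vec x, int n) {
        for (int i = 0; i < n; i++) acc = acc.add(x);
        return acc;
    }

    // Shape of the workaround: calling lanewise() directly gives this loop
    // its own virtual call site, which only ever sees this caller's receivers.
    static Vec sumViaLanewise(Vec acc, Vec x, int n) {
        for (int i = 0; i < n; i++) acc = acc.lanewise(x);
        return acc;
    }

    public static void main(String[] args) {
        otherCaller(new Vec8(), new Vec8(1, 1, 1, 1, 1, 1, 1, 1), 100_000);
        Vec a = sumViaAdd(new Vec4(), new Vec4(1, 2, 3, 4), 1_000);
        Vec b = sumViaLanewise(new Vec4(), new Vec4(1, 2, 3, 4), 1_000);
        // Same result either way; only the profiles the JIT sees differ.
        System.out.println(Arrays.toString(a.lanes) + " " + Arrays.toString(b.lanes));
    }
}
```

Both paths compute identical values; the difference lies only in which profile the JIT can use when deciding how to inline `lanewise()`, which is why the workaround changes the generated code without changing results.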
> 
>> In the bad case, two IntVector species are used: IntVector.SPECIES_256 and IntVector.SPECIES_128.
>> Make:
>>   VI4 = VIP
>>   ...
>>   int[] total = new int[8];
>> With those changes the performance issue goes away. (I did not look closely enough at the code to tell what effect that may have on the algorithm.)
>> In general it’s best to stick to the preferred species of each type. And, ordinarily, the preferred vector shape for floats and ints should be the same (I cannot recall if on some older platforms it might be different).
>> Paul.
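What Paul's change does to the result of `innocent` can be checked with a scalar model. This is a sketch under stated assumptions: `InnocentModel` and `innocentScalar` are hypothetical names, `count >= 1` is assumed, and each `fromArray` load is modeled as reading `lanes` consecutive ints starting at `regionX[i] * 4`. With 8 lanes each load covers the addressed 4-int cell plus the following one, so lanes 0-3 keep the 4-lane result and lanes 4-7 accumulate the next cell's values; `sum` also needs 4 spare ints past the last cell read.

```java
import java.util.Arrays;

// Scalar model of Demo2a.innocent with a configurable lane count
// (4 for IntVector.SPECIES_128, 8 for IntVector.SPECIES_256).
public class InnocentModel {

    // dst[l] = sum over i < count of sum[regionX[i] * 4 + l], for l < lanes.
    static int[] innocentScalar(int[] sum, int count, int[] regionX, int lanes) {
        int[] dst = new int[lanes];
        if (count > regionX.length) return dst; // same guard as innocent()
        for (int i = 0; i < count; i++) {
            int base = regionX[i] * 4;          // cells are 4 ints wide
            for (int l = 0; l < lanes; l++) {
                dst[l] += sum[base + l];        // an 8-lane read spills into the next cell
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        int[] sum = {1, 2, 3, 4, 10, 20, 30, 40, 100, 200, 300, 400};
        int[] regionX = {0, 1};
        System.out.println(Arrays.toString(innocentScalar(sum, 2, regionX, 4)));
        System.out.println(Arrays.toString(innocentScalar(sum, 2, regionX, 8)));
    }
}
```

Under this model the first four entries of `total` are the same for both species, which suggests the `VI4 = VIP` experiment isolates the performance effect rather than changing the part of the result the 4-lane code produced.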
>>> On Mar 16, 2021, at 1:36 PM, Eugene Kluchnikov <eustas.ru at gmail.com> wrote:
>>> 
>>> Hello, Vladimir.
>>> 
>>> Sorry for the long delay. I've reduced the reproduction cases to small
>>> single-file programs.
>>> To reproduce, run:
>>> 
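>>>   - javac --add-modules jdk.incubator.vector Demo2a.java && java --add-modules jdk.incubator.vector -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand="print,*.innocent" Demo2a
>>>   - javac --add-modules jdk.incubator.vector Demo2b.java && java --add-modules jdk.incubator.vector -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand="print,*.innocent" Demo2b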
>>> 
>>> Demo2a generates "bad" assembly for the "innocent" method; Demo2b generates
>>> fairly good assembly for it, even though "innocent" has identical source code
>>> in both files. On my MacBook Demo2a runs in 16.4s and Demo2b in 2.7s
>>> (turbo-boost disabled for CPU-time stability).
>>> 
>>> File Demo2a.java begin >>>
>>> 
>>> import jdk.incubator.vector.FloatVector;
>>> import jdk.incubator.vector.IntVector;
>>> import jdk.incubator.vector.VectorSpecies;
>>> 
>>> public class Demo2a {
>>>     private static final VectorSpecies<Float> VFP = FloatVector.SPECIES_256;
>>>     private static final VectorSpecies<Integer> VIP = IntVector.SPECIES_256;
>>>     private static final VectorSpecies<Integer> VI4 = IntVector.SPECIES_128;
>>> 
>>>     static final int STEP = VFP.length();
>>> 
>>>     static void innocent(int[] sum, int count, int[] regionX, int[] dst) {
>>>         if (count > regionX.length)
>>>             return;
>>>         IntVector acc = IntVector.fromArray(VI4, sum, regionX[0] * 4);
>>>         for (int i = 1; i < count; i++) {
>>>             acc = acc.add(IntVector.fromArray(VI4, sum, regionX[i] * 4));
>>>         }
>>>         acc.intoArray(dst, 0);
>>>     }
>>> 
>>>     private static final int MAX_INT = (1 << 23) - 1;
>>>     private static final IntVector INTEGER_MASK = IntVector.broadcast(VIP, MAX_INT);
>>>     private static final FloatVector IMPLICIT_ONE = FloatVector.broadcast(VFP, MAX_INT + 1);
>>> 
>>>     static void bad(float[] regionY, float[] regionX0f, float[] regionX1f,
>>>                     int[] rowOffset, int[] regionX, int count) {
>>>         FloatVector mNyNx = FloatVector.broadcast(VFP, 42);
>>>         FloatVector dNx = FloatVector.broadcast(VFP, 43);
>>>         for (int i = 0; i < count; i += STEP) {
>>>             FloatVector y = FloatVector.fromArray(VFP, regionY, i);
>>>             FloatVector x0 = FloatVector.fromArray(VFP, regionX0f, i);
>>>             FloatVector x1 = FloatVector.fromArray(VFP, regionX1f, i);
>>> 
>>>             IntVector off = IntVector.fromArray(VIP, rowOffset, i);
>>>             FloatVector x = y.fma(mNyNx, dNx).max(x0).min(x1);
>>>             IntVector xi = x.add(IMPLICIT_ONE).viewAsIntegralLanes().and(INTEGER_MASK);
>>>             IntVector xOff = xi.add(off);
>>>             xOff.intoArray(regionX, i);
>>>         }
>>>     }
>>> 
>>>     public static void main(String[] args) {
>>>         int w = 300;
>>>         int h = 200;
>>>         int stride = 4 * (w + 1);
>>>         int[] sum = new int[h * stride];
>>>         float[] rY = new float[h];
>>>         float[] rX0f = new float[h];
>>>         float[] rX1f = new float[h];
>>>         int[] rowOffset = new int[h];
>>>         int[] rX = new int[h];
>>>         int[] total = new int[4];
>>>         for (int i = 0; i < 16 * 1024 * 1024; ++i) {
>>>             bad(rY, rX0f, rX1f, rowOffset, rX, h);
>>>             innocent(sum, h, rX, total);
>>>         }
>>>     }
>>> }
>>> 
>>> <<< File Demo2a.java end
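The `IMPLICIT_ONE` / `INTEGER_MASK` pair in `bad()` appears to implement a standard float-to-int rounding trick: for a float `x` in [0, 2^23), adding 2^23 lands in [2^23, 2^24), where consecutive floats are exactly 1 apart, so the sum holds `x` rounded to the nearest integer (ties to even) in its 23 mantissa bits. A scalar sketch of that reading, assuming the clamped `x` stays in [0, 2^23); `ImplicitOneTrick` and `toIntViaMantissa` are hypothetical names:

```java
// Scalar version of the trick used in bad(): add 2^23, then keep the mantissa bits.
public class ImplicitOneTrick {

    static final int MAX_INT = (1 << 23) - 1;       // 0x7FFFFF, as in Demo2a
    static final float IMPLICIT_ONE = MAX_INT + 1;  // 2^23 = 8388608f

    // Assumes 0 <= x < 2^23. The addition rounds x to an integer (ties to even),
    // and that integer is exactly the low 23 bits of the sum's bit pattern.
    static int toIntViaMantissa(float x) {
        return Float.floatToRawIntBits(x + IMPLICIT_ONE) & MAX_INT;
    }

    public static void main(String[] args) {
        float[] xs = {0f, 2.5f, 3.7f, 100.2f, 8388607f};
        for (float x : xs) {
            System.out.println(x + " -> " + toIntViaMantissa(x)
                    + " (Math.round: " + Math.round(x) + ")");
        }
    }
}
```

Note that ties differ from `Math.round`: 2.5f maps to 2 here but to 3 with `Math.round`. The vector code applies the same idea lane-wise via `viewAsIntegralLanes().and(INTEGER_MASK)`.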
>>> 
>>> File Demo2b.java begin >>>
>>> 
>>> import jdk.incubator.vector.FloatVector;
>>> import jdk.incubator.vector.IntVector;
>>> import jdk.incubator.vector.VectorSpecies;
>>> 
>>> public class Demo2b {
>>>     private static final VectorSpecies<Float> VFP = FloatVector.SPECIES_256;
>>>     private static final VectorSpecies<Integer> VIP = IntVector.SPECIES_256;
>>>     private static final VectorSpecies<Integer> VI4 = IntVector.SPECIES_128;
>>> 
>>>     static final int STEP = VFP.length();
>>> 
>>>     static void innocent(int[] sum, int count, int[] regionX, int[] dst) {
>>>         if (count > regionX.length)
>>>             return;
>>>         IntVector acc = IntVector.fromArray(VI4, sum, regionX[0] * 4);
>>>         for (int i = 1; i < count; i++) {
>>>             acc = acc.add(IntVector.fromArray(VI4, sum, regionX[i] * 4));
>>>         }
>>>         acc.intoArray(dst, 0);
>>>     }
>>> 
>>>     private static final int MAX_INT = (1 << 23) - 1;
>>>     private static final IntVector INTEGER_MASK = IntVector.broadcast(VIP, MAX_INT);
>>>     private static final FloatVector IMPLICIT_ONE = FloatVector.broadcast(VFP, MAX_INT + 1);
>>> 
>>>     static void good(float[] regionY, float[] regionX0f, float[] regionX1f,
>>>                      int[] regionX, int count, int kappa) {
>>>         FloatVector mNyNx = FloatVector.broadcast(VFP, 42);
>>>         FloatVector dNx = FloatVector.broadcast(VFP, 43);
>>>         FloatVector k = FloatVector.broadcast(VFP, kappa);
>>>         for (int i = 0; i < count; i += STEP) {
>>>             FloatVector y = FloatVector.fromArray(VFP, regionY, i);
>>>             FloatVector x0 = FloatVector.fromArray(VFP, regionX0f, i);
>>>             FloatVector x1 = FloatVector.fromArray(VFP, regionX1f, i);
>>> 
>>>             FloatVector x = y.fma(mNyNx, dNx).max(x0).min(x1);
>>>             FloatVector xOff = y.fma(k, x);
>>>             xOff.add(IMPLICIT_ONE).viewAsIntegralLanes().and(INTEGER_MASK)
>>>                 .intoArray(regionX, i);
>>>         }
>>>     }
>>> 
>>>     public static void main(String[] args) {
>>>         int w = 300;
>>>         int h = 200;
>>>         int stride = 4 * (w + 1);
>>>         int[] sum = new int[h * stride];
>>>         float[] rY = new float[h];
>>>         float[] rX0f = new float[h];
>>>         float[] rX1f = new float[h];
>>>         int[] rX = new int[h];
>>>         int[] total = new int[4];
>>>         for (int i = 0; i < 16 * 1024 * 1024; ++i) {
>>>             good(rY, rX0f, rX1f, rX, h, stride / 4);
>>>             innocent(sum, h, rX, total);
>>>         }
>>>     }
>>> }
>>> 
>>> <<< File Demo2b.java end
>>> 
>>> 
>>> On Mon, 15 Mar 2021 at 10:33, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
>>> 
>>>> Hi Eugene,
>>>> 
>>>> Do you have a test case available so I can try to reproduce the problem
>>>> myself?
>>>> 
>>>> The only idea I have right now is that profile pollution is in play here:
>>>> 
>>>>     private static final VectorSpecies<Integer> VIP = IntVector.SPECIES_256;
>>>>     private static final VectorSpecies<Integer> VI4 = IntVector.SPECIES_128;
>>>> 
>>>>     static void sumAbs(int[] sum, int count, int[] regionX, int[] dst) {
>>>>        ...
>>>>        IntVector acc = IntVector.fromArray(VI4, sum, regionX[0] * 4);
>>>> 
>>>> vs
>>>> 
>>>>     // BAD
>>>>     IntVector off = IntVector.fromArray(VIP, rowOffset, i);
>>>> 
>>>> 
>>>> But considering VIP and VI4 are constants (static final), it's hard to
>>>> see how it can be the case.
>>>> 
>>>> Best regards,
>>>> Vladimir Ivanov
>>>> 
>>>> On 15.03.2021 01:58, Eugene Kluchnikov wrote:
>>>>> private static final VectorSpecies<Float> VFP = FloatVector.SPECIES_256;
>>>>> private static final VectorSpecies<Integer> VIP = IntVector.SPECIES_256;
>>>>> private static final VectorSpecies<Integer> VI4 = IntVector.SPECIES_128;
>>>>> 
>>>>> static final int STEP = VFP.length();
>>>>> 
>>>>> static void sumAbs(int[] sum, int count, int[] regionX, int[] dst) {
>>>>>     if (count > regionX.length) return;
>>>>>     IntVector acc = IntVector.fromArray(VI4, sum, regionX[0] * 4);
>>>>>     for (int i = 1; i < count; i++) {
>>>>>         acc = acc.add(IntVector.fromArray(VI4, sum, regionX[i] * 4));
>>>>>     }
>>>>>     acc.intoArray(dst, 0);
>>>>> }
>>>>> 
>>>>> private static int MAX_INT = (1 << 23) - 1;
>>>>> private static IntVector INTEGER_MASK = IntVector.broadcast(VIP, MAX_INT);
>>>>> private static FloatVector IMPLICIT_ONE = FloatVector.broadcast(VFP, MAX_INT + 1);
>>>>> 
>>>>> // x >= (d - y * ny) / nx
>>>>> static void updateGeGeneric(int angle, int d, float[] regionY, float[] regionX0f,
>>>>>         float[] regionX1f, int[] rowOffset, int[] regionX, int count, int kappa) {
>>>>>     FloatVector mNyNx = FloatVector.broadcast(VFP, SinCos.MINUS_COT[angle]);
>>>>>     FloatVector dNx = FloatVector.broadcast(VFP, (float)(d * SinCos.INV_SIN[angle]));
>>>>>     FloatVector k = FloatVector.broadcast(VFP, kappa);
>>>>>     for (int i = 0; i < count; i += STEP) {
>>>>>         FloatVector y = FloatVector.fromArray(VFP, regionY, i);
>>>>>         FloatVector x0 = FloatVector.fromArray(VFP, regionX0f, i);
>>>>>         FloatVector x1 = FloatVector.fromArray(VFP, regionX1f, i);
>>>>> 
>>>>>         // BAD
>>>>>         IntVector off = IntVector.fromArray(VIP, rowOffset, i);
>>>>>         FloatVector x = y.fma(mNyNx, dNx).max(x0).min(x1);
>>>>>         IntVector xi = x.add(IMPLICIT_ONE).viewAsIntegralLanes().and(INTEGER_MASK);
>>>>>         IntVector xOff = xi.add(off);
>>>>>         xOff.intoArray(regionX, i);
>>>>> 
>>>>>         // GOOD
>>>>>         //FloatVector x = y.fma(mNyNx, dNx).max(x0).min(x1);
>>>>>         //FloatVector xOff = y.fma(k, x);
>>>>>         //xOff.add(IMPLICIT_ONE).viewAsIntegralLanes().and(INTEGER_MASK).intoArray(regionX, i);
>>>>>     }
>>>>> }
>>>> 
>>> 
>>> 
>>> -- 
>>> With best regards, Eugene Kluchnikov



More information about the panama-dev mailing list