[vectorIntrinsics] C2 is fragile

Eugene Kluchnikov eustas.ru at gmail.com
Mon Mar 29 19:34:56 UTC 2021


Hello, Vladimir.

I've tried replacing the operations with "lanewise" calls in my project. Unfortunately,
that does not help. I'm going to try to create a larger-scope reproducer soon.

Best regards,
  Eugene.

On Fri, 26 Mar 2021 at 22:41, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:

>
>
> On 17.03.2021 01:31, Paul Sandoz wrote:
> > Hi,
> >
> > The issue might be profile pollution. I am sure Vladimir knows more.
>
> Yes, it is:
>
> @ 45   jdk.incubator.vector.IntVector::add (9 bytes)   force inline by annotation
>    @ 5   jdk.incubator.vector.Int128Vector::lanewise (7 bytes)   force inline by annotation
>    @ 5   jdk.incubator.vector.Int256Vector::lanewise (7 bytes)   force inline by annotation
>     \-> TypeProfile (575/5121 counts) = jdk/incubator/vector/Int256Vector
>     \-> TypeProfile (4546/5121 counts) = jdk/incubator/vector/Int128Vector
>      @ 3   jdk.incubator.vector.Int256Vector::lanewise (10 bytes)   force inline by annotation
>
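The two TypeProfile lines above mean that the lanewise call inside IntVector::add was reached with two different receiver classes: Int128Vector in about 4546/5121 ≈ 89% of calls and Int256Vector in about 11%. The sketch below reproduces only that arithmetic. It is a toy model that assumes simple per-call-site counters. It is not HotSpot's MethodData layout, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Toy model of a per-call-site receiver type profile (hypothetical names).
// HotSpot's real profile keeps only a limited number of receiver rows;
// this model just reproduces the arithmetic behind the logged counts.
class ReceiverProfileDemo {
    static final class CallSiteProfile {
        private final Map<String, Long> counts = new LinkedHashMap<>();

        // Record n observed calls with the given receiver class.
        void record(String receiverClass, long n) {
            counts.merge(receiverClass, n, Long::sum);
        }

        long total() {
            long t = 0;
            for (long c : counts.values()) {
                t += c;
            }
            return t;
        }

        // Fraction of all recorded calls that saw the given receiver class.
        double share(String receiverClass) {
            long t = total();
            return t == 0 ? 0.0 : counts.getOrDefault(receiverClass, 0L) / (double) t;
        }

        int distinctReceivers() {
            return counts.size();
        }
    }

    public static void main(String[] args) {
        // Counts from the Demo2a log: the lanewise call site inside IntVector::add.
        CallSiteProfile insideAdd = new CallSiteProfile();
        insideAdd.record("Int256Vector", 575);
        insideAdd.record("Int128Vector", 4546);

        // Counts from the workaround log: lanewise called directly from innocent().
        CallSiteProfile direct = new CallSiteProfile();
        direct.record("Int128Vector", 40943);

        System.out.println(String.format(Locale.ROOT,
                "inside add: %d receivers, Int128Vector share %.3f",
                insideAdd.distinctReceivers(), insideAdd.share("Int128Vector")));
        System.out.println(String.format(Locale.ROOT,
                "direct call: %d receiver, Int128Vector share %.3f",
                direct.distinctReceivers(), direct.share("Int128Vector")));
    }
}
```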
>
> And it stems from the absence of receiver profile data for the "acc" value
> in the Demo2*::innocent method:
>
> 45 invokevirtual 19 <jdk/incubator/vector/IntVector.add(Ljdk/incubator/vector/Vector;)Ljdk/incubator/vector/IntVector;>
>
>
>    96  bci: 45   VirtualCallData     count(40994) nonprofiled_count(0) entries(0)
>
> I believe it is caused by the fact that IntVector::add() is a final
> method, so receiver profiling is not needed to devirtualize the call:
>
>      public final IntVector add(Vector<Integer> v) {
>          return lanewise(ADD, v);
>      }
>
>
> Replacing IntVector::add() with IntVector::lanewise(ADD) helps
> work around the problem:
>
>    for (int i = 1; i < count; i++) {
>      acc = acc.lanewise(VectorOperators.ADD, IntVector.fromArray(VI4, sum, regionX[i] * 4));
>    }
>
>
>    @ 48   jdk.incubator.vector.Int128Vector::lanewise (7 bytes)   force inline by annotation
>     \-> TypeProfile (40943/40943 counts) = jdk/incubator/vector/Int128Vector
>      @ 3   jdk.incubator.vector.Int128Vector::lanewise (10 bytes)   force inline by annotation
>
>
> 48 invokevirtual 25 <jdk/incubator/vector/IntVector.lanewise(Ljdk/incubator/vector/VectorOperators$Binary;Ljdk/incubator/vector/Vector;)Ljdk/incubator/vector/IntVector;>
>
>
>    96  bci: 48   VirtualCallData     count(0) nonprofiled_count(0) entries(1)
>                  'jdk/incubator/vector/Int128Vector'(40994 1.00)
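The mechanism Vladimir describes concerns where the virtual call site sits. The toy below is plain Java, with no Vector API, and all class and method names are hypothetical. It has the same shape: a final add() that forwards to a virtual lanewise(). Both loops compute the same sums. The difference is which method's profile C2 consults when it inlines lanewise. In sumViaAdd it is add()'s shared profile. In sumViaLanewise it is the caller's own profile.

```java
import java.util.Arrays;
import java.util.function.IntBinaryOperator;

// Plain-Java toy with the same call shape as IntVector::add -> lanewise.
// All names are hypothetical; this is not the jdk.incubator.vector API.
class ToyVectorDemo {
    abstract static class ToyVector {
        final int[] lanes;

        ToyVector(int[] lanes) {
            this.lanes = lanes;
        }

        // Final, like IntVector::add: the call to add() can be bound without a
        // receiver profile, but every caller funnels into the single lanewise()
        // call site below, so that site's profile mixes all shapes.
        final ToyVector add(ToyVector v) {
            return lanewise(Integer::sum, v);
        }

        // Virtual, like IntVector::lanewise: one implementation per shape.
        abstract ToyVector lanewise(IntBinaryOperator op, ToyVector v);

        // Element-wise helper shared by both shapes.
        int[] combine(IntBinaryOperator op, ToyVector v) {
            int[] out = new int[lanes.length];
            for (int i = 0; i < out.length; i++) {
                out[i] = op.applyAsInt(lanes[i], v.lanes[i]);
            }
            return out;
        }
    }

    static final class Toy128 extends ToyVector {
        Toy128(int[] lanes) {
            super(lanes);
        }

        @Override
        ToyVector lanewise(IntBinaryOperator op, ToyVector v) {
            return new Toy128(combine(op, v));
        }
    }

    static final class Toy256 extends ToyVector {
        Toy256(int[] lanes) {
            super(lanes);
        }

        @Override
        ToyVector lanewise(IntBinaryOperator op, ToyVector v) {
            return new Toy256(combine(op, v));
        }
    }

    // Shape of Demo2a's innocent(): the virtual call site lives inside add().
    static ToyVector sumViaAdd(ToyVector[] parts) {
        ToyVector acc = parts[0];
        for (int i = 1; i < parts.length; i++) {
            acc = acc.add(parts[i]);
        }
        return acc;
    }

    // Shape of the workaround: the virtual call site is in this loop, so its
    // profile only records receivers that actually reach this loop.
    static ToyVector sumViaLanewise(ToyVector[] parts) {
        ToyVector acc = parts[0];
        for (int i = 1; i < parts.length; i++) {
            acc = acc.lanewise(Integer::sum, parts[i]);
        }
        return acc;
    }

    public static void main(String[] args) {
        ToyVector[] parts = {
            new Toy128(new int[] {1, 2, 3, 4}),
            new Toy128(new int[] {10, 20, 30, 40}),
        };
        System.out.println(Arrays.toString(sumViaAdd(parts).lanes));      // [11, 22, 33, 44]
        System.out.println(Arrays.toString(sumViaLanewise(parts).lanes)); // [11, 22, 33, 44]
        // A second caller (like bad()) pushing a wider shape through add().
        ToyVector wide = new Toy256(new int[8]).add(new Toy256(new int[8]));
        System.out.println(wide.getClass().getSimpleName());              // Toy256
    }
}
```

The toy does not reproduce the slowdown by itself. It only shows the call-site shape that lets the profile inside add() be shared between callers.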
>
> Best regards,
> Vladimir Ivanov
>
> >
> > In the bad case, two IntVector species are used:
> > IntVector.SPECIES_256 and IntVector.SPECIES_128.
> >
> > Change the code as follows:
> >
> >    VI4 = VIP
> >    ...
> >    int[] total = new int[8];
> >
> > Then I observe that the performance issue goes away. (I did not look
> > closely enough at the code to determine what effect that change may have
> > on the algorithm.)
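Paul's change switches innocent() to the 256-bit int species. An int is 32 bits, so that species has 256 / 32 = 8 lanes instead of 4. The single acc.intoArray(dst, 0) store therefore writes 8 ints, which is why total grows from new int[4] to new int[8]. An unmasked store past the end of the array throws IndexOutOfBoundsException. The small sketch below works through the arithmetic; its helper names are hypothetical.

```java
// Lane arithmetic behind the species change (hypothetical helper names).
class LaneMath {
    // Number of lanes in a vector of vectorBits holding elements of elementBits.
    static int laneCount(int vectorBits, int elementBits) {
        if (vectorBits % elementBits != 0) {
            throw new IllegalArgumentException("vector size is not a multiple of element size");
        }
        return vectorBits / elementBits;
    }

    // True if a full, unmasked load or store of 'lanes' elements starting at
    // 'offset' stays inside an array of length 'arrayLength'.
    static boolean fits(int arrayLength, int offset, int lanes) {
        return offset >= 0 && offset <= arrayLength - lanes;
    }

    public static void main(String[] args) {
        int lanes128 = laneCount(128, Integer.SIZE); // IntVector.SPECIES_128
        int lanes256 = laneCount(256, Integer.SIZE); // IntVector.SPECIES_256
        System.out.println("128-bit int lanes: " + lanes128);
        System.out.println("256-bit int lanes: " + lanes256);
        // acc.intoArray(dst, 0) with an 8-lane species needs dst.length >= 8.
        System.out.println("8 lanes fit in int[4]: " + fits(4, 0, lanes256));
        System.out.println("8 lanes fit in int[8]: " + fits(8, 0, lanes256));
    }
}
```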
> >
> > In general, it's best to stick to the preferred species for each element
> > type. Ordinarily, the preferred vector shape for floats and ints should be
> > the same (I cannot recall whether it might differ on some older platforms).
> >
> > Paul.
> >
> >> On Mar 16, 2021, at 1:36 PM, Eugene Kluchnikov <eustas.ru at gmail.com> wrote:
> >>
> >> Hello, Vladimir.
> >>
> >> Sorry for the long delay. I've reduced the reproduction cases to small
> >> single-file programs.
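> >> To reproduce, run:
> >>
> >>    - javac --add-modules jdk.incubator.vector Demo2a.java && java --add-modules jdk.incubator.vector -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand="print,*.innocent" Demo2a
> >>    - javac --add-modules jdk.incubator.vector Demo2b.java && java --add-modules jdk.incubator.vector -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand="print,*.innocent" Demo2b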
> >>
> >> Demo2a generates "bad" assembly for the "innocent" method, while Demo2b
> >> generates fairly good assembly for it, even though "innocent" has the same
> >> source code in both files. Demo2a runs in 16.4s on my MacBook, Demo2b in
> >> 2.7s (turbo boost disabled for CPU-time stability).
> >>
> >> File Demo2a.java begin >>>
> >>
> >> import jdk.incubator.vector.FloatVector;
> >> import jdk.incubator.vector.IntVector;
> >> import jdk.incubator.vector.VectorSpecies;
> >>
> >> public class Demo2a {
> >>     private static final VectorSpecies<Float> VFP = FloatVector.SPECIES_256;
> >>     private static final VectorSpecies<Integer> VIP = IntVector.SPECIES_256;
> >>     private static final VectorSpecies<Integer> VI4 = IntVector.SPECIES_128;
> >>
> >>     static final int STEP = VFP.length();
> >>
> >>     static void innocent(int[] sum, int count, int[] regionX, int[] dst) {
> >>         if (count > regionX.length)
> >>             return;
> >>         IntVector acc = IntVector.fromArray(VI4, sum, regionX[0] * 4);
> >>         for (int i = 1; i < count; i++) {
> >>             acc = acc.add(IntVector.fromArray(VI4, sum, regionX[i] * 4));
> >>         }
> >>         acc.intoArray(dst, 0);
> >>     }
> >>
> >>     private static final int MAX_INT = (1 << 23) - 1;
> >>     private static final IntVector INTEGER_MASK = IntVector.broadcast(VIP, MAX_INT);
> >>     private static final FloatVector IMPLICIT_ONE = FloatVector.broadcast(VFP, MAX_INT + 1);
> >>
> >>     static void bad(float[] regionY, float[] regionX0f, float[] regionX1f,
> >>                     int[] rowOffset, int[] regionX, int count) {
> >>         FloatVector mNyNx = FloatVector.broadcast(VFP, 42);
> >>         FloatVector dNx = FloatVector.broadcast(VFP, 43);
> >>         for (int i = 0; i < count; i += STEP) {
> >>             FloatVector y = FloatVector.fromArray(VFP, regionY, i);
> >>             FloatVector x0 = FloatVector.fromArray(VFP, regionX0f, i);
> >>             FloatVector x1 = FloatVector.fromArray(VFP, regionX1f, i);
> >>
> >>             IntVector off = IntVector.fromArray(VIP, rowOffset, i);
> >>             FloatVector x = y.fma(mNyNx, dNx).max(x0).min(x1);
> >>             IntVector xi = x.add(IMPLICIT_ONE).viewAsIntegralLanes().and(INTEGER_MASK);
> >>             IntVector xOff = xi.add(off);
> >>             xOff.intoArray(regionX, i);
> >>         }
> >>     }
> >>
> >>     public static void main(String[] args) {
> >>         int w = 300;
> >>         int h = 200;
> >>         int stride = 4 * (w + 1);
> >>         int[] sum = new int[h * stride];
> >>         float[] rY = new float[h];
> >>         float[] rX0f = new float[h];
> >>         float[] rX1f = new float[h];
> >>         int[] rowOffset = new int[h];
> >>         int[] rX = new int[h];
> >>         int[] total = new int[4];
> >>         for (int i = 0; i < 16 * 1024 * 1024; ++i) {
> >>             bad(rY, rX0f, rX1f, rowOffset, rX, h);
> >>             innocent(sum, h, rX, total);
> >>         }
> >>     }
> >> }
> >>
> >> <<< File Demo2a.java end
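For readers less familiar with the Vector API, here is a scalar sketch of what innocent() computes with the 4-lane (128-bit) int species. It assumes the lane count is fixed at 4. The class and method names are hypothetical and not part of the reproducer. The sketch keeps the original structure: the accumulator is seeded from regionX[0], and the remaining chunks are added lane by lane.

```java
import java.util.Arrays;

// Scalar sketch of innocent() for a 4-lane int species (hypothetical helper).
class InnocentScalar {
    static final int LANES = 4; // IntVector.SPECIES_128 holds 4 ints

    static void innocentScalar(int[] sum, int count, int[] regionX, int[] dst) {
        if (count > regionX.length) {
            return;
        }
        // acc = IntVector.fromArray(VI4, sum, regionX[0] * 4)
        int[] acc = new int[LANES];
        int base0 = regionX[0] * 4;
        for (int lane = 0; lane < LANES; lane++) {
            acc[lane] = sum[base0 + lane];
        }
        for (int i = 1; i < count; i++) {
            // acc = acc.add(IntVector.fromArray(VI4, sum, regionX[i] * 4));
            // int addition wraps on overflow, as the vector add does.
            int base = regionX[i] * 4;
            for (int lane = 0; lane < LANES; lane++) {
                acc[lane] += sum[base + lane];
            }
        }
        // acc.intoArray(dst, 0)
        System.arraycopy(acc, 0, dst, 0, LANES);
    }

    public static void main(String[] args) {
        int[] sum = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
        int[] regionX = {0, 2, 1};
        int[] dst = new int[LANES];
        innocentScalar(sum, regionX.length, regionX, dst);
        System.out.println(Arrays.toString(dst)); // prints [15, 18, 21, 24]
    }
}
```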
> >>
> >> File Demo2b.java begin >>>
> >>
> >> import jdk.incubator.vector.FloatVector;
> >> import jdk.incubator.vector.IntVector;
> >> import jdk.incubator.vector.VectorSpecies;
> >>
> >> public class Demo2b {
> >>     private static final VectorSpecies<Float> VFP = FloatVector.SPECIES_256;
> >>     private static final VectorSpecies<Integer> VIP = IntVector.SPECIES_256;
> >>     private static final VectorSpecies<Integer> VI4 = IntVector.SPECIES_128;
> >>
> >>     static final int STEP = VFP.length();
> >>
> >>     static void innocent(int[] sum, int count, int[] regionX, int[] dst) {
> >>         if (count > regionX.length)
> >>             return;
> >>         IntVector acc = IntVector.fromArray(VI4, sum, regionX[0] * 4);
> >>         for (int i = 1; i < count; i++) {
> >>             acc = acc.add(IntVector.fromArray(VI4, sum, regionX[i] * 4));
> >>         }
> >>         acc.intoArray(dst, 0);
> >>     }
> >>
> >>     private static final int MAX_INT = (1 << 23) - 1;
> >>     private static final IntVector INTEGER_MASK = IntVector.broadcast(VIP, MAX_INT);
> >>     private static final FloatVector IMPLICIT_ONE = FloatVector.broadcast(VFP, MAX_INT + 1);
> >>
> >>     static void good(float[] regionY, float[] regionX0f, float[] regionX1f,
> >>                      int[] regionX, int count, int kappa) {
> >>         FloatVector mNyNx = FloatVector.broadcast(VFP, 42);
> >>         FloatVector dNx = FloatVector.broadcast(VFP, 43);
> >>         FloatVector k = FloatVector.broadcast(VFP, kappa);
> >>         for (int i = 0; i < count; i += STEP) {
> >>             FloatVector y = FloatVector.fromArray(VFP, regionY, i);
> >>             FloatVector x0 = FloatVector.fromArray(VFP, regionX0f, i);
> >>             FloatVector x1 = FloatVector.fromArray(VFP, regionX1f, i);
> >>
> >>             FloatVector x = y.fma(mNyNx, dNx).max(x0).min(x1);
> >>             FloatVector xOff = y.fma(k, x);
> >>             xOff.add(IMPLICIT_ONE).viewAsIntegralLanes().and(INTEGER_MASK).intoArray(regionX, i);
> >>         }
> >>     }
> >>
> >>     public static void main(String[] args) {
> >>         int w = 300;
> >>         int h = 200;
> >>         int stride = 4 * (w + 1);
> >>         int[] sum = new int[h * stride];
> >>         float[] rY = new float[h];
> >>         float[] rX0f = new float[h];
> >>         float[] rX1f = new float[h];
> >>         int[] rX = new int[h];
> >>         int[] total = new int[4];
> >>         for (int i = 0; i < 16 * 1024 * 1024; ++i) {
> >>             good(rY, rX0f, rX1f, rX, h, stride / 4);
> >>             innocent(sum, h, rX, total);
> >>         }
> >>     }
> >> }
> >>
> >> <<< File Demo2b.java end
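Both bad() and good() convert float lanes to int lanes with x.add(IMPLICIT_ONE).viewAsIntegralLanes().and(INTEGER_MASK). The scalar sketch below shows that step. It assumes 0 <= x and that x rounds to an integer below 2^23, and its class and method names are hypothetical. Adding 2^23 moves the value into [2^23, 2^24), where adjacent floats are exactly 1.0 apart. The addition therefore rounds x to the nearest integer (ties to even, like Math.rint). That integer ends up in the 23 low mantissa bits that INTEGER_MASK keeps.

```java
// Scalar sketch of x.add(IMPLICIT_ONE).viewAsIntegralLanes().and(INTEGER_MASK).
// Assumption: 0 <= x and Math.rint(x) < 2^23; negative inputs are not handled.
class MantissaRound {
    static final int INTEGER_MASK = (1 << 23) - 1;                // low 23 mantissa bits
    static final float IMPLICIT_ONE = (float) (INTEGER_MASK + 1); // 2^23, exact in float

    static int toInt(float x) {
        // In [2^23, 2^24) floats are spaced exactly 1.0 apart, so this addition
        // rounds x to the nearest integer (ties to even).
        float biased = x + IMPLICIT_ONE;
        // Reinterpret the bits (like viewAsIntegralLanes) and drop sign and exponent.
        return Float.floatToRawIntBits(biased) & INTEGER_MASK;
    }

    public static void main(String[] args) {
        float[] samples = {0f, 2.5f, 3.7f, 100.2f, 4095.5f};
        for (float x : samples) {
            System.out.println(x + " -> " + toInt(x) + " (Math.rint gives " + (int) Math.rint(x) + ")");
        }
    }
}
```

The trick avoids a separate float-to-int conversion instruction. However, it only works inside the assumed range, which is presumably what the max(x0).min(x1) clamp in the original code is relied on for.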
> >>
> >>
> >> On Mon, 15 Mar 2021 at 10:33, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
> >>
> >>> Hi Eugene,
> >>>
> >>> Do you have a test case available so I can try to reproduce the problem
> >>> myself?
> >>>
> >>> The only idea I have right now is that profile pollution is in play here:
> >>>
> >>>      private static final VectorSpecies<Integer> VIP = IntVector.SPECIES_256;
> >>>      private static final VectorSpecies<Integer> VI4 = IntVector.SPECIES_128;
> >>>
> >>>      static void sumAbs(int[] sum, int count, int[] regionX, int[] dst) {
> >>>         ...
> >>>         IntVector acc = IntVector.fromArray(VI4, sum, regionX[0] * 4);
> >>>
> >>> vs
> >>>
> >>>      // BAD
> >>>      IntVector off = IntVector.fromArray(VIP, rowOffset, i);
> >>>
> >>>
> >>> But considering VIP and VI4 are constants (static final), it's hard to
> >>> see how it can be the case.
> >>>
> >>> Best regards,
> >>> Vladimir Ivanov
> >>>
> >>> On 15.03.2021 01:58, Eugene Kluchnikov wrote:
> >>>> private static final VectorSpecies<Float> VFP = FloatVector.SPECIES_256;
> >>>> private static final VectorSpecies<Integer> VIP = IntVector.SPECIES_256;
> >>>> private static final VectorSpecies<Integer> VI4 = IntVector.SPECIES_128;
> >>>>
> >>>> static final int STEP = VFP.length();
> >>>>
> >>>> static void sumAbs(int[] sum, int count, int[] regionX, int[] dst) {
> >>>>     if (count > regionX.length) return;
> >>>>     IntVector acc = IntVector.fromArray(VI4, sum, regionX[0] * 4);
> >>>>     for (int i = 1; i < count; i++) {
> >>>>         acc = acc.add(IntVector.fromArray(VI4, sum, regionX[i] * 4));
> >>>>     }
> >>>>     acc.intoArray(dst, 0);
> >>>> }
> >>>>
> >>>> private static int MAX_INT = (1 << 23) - 1;
> >>>> private static IntVector INTEGER_MASK = IntVector.broadcast(VIP, MAX_INT);
> >>>> private static FloatVector IMPLICIT_ONE = FloatVector.broadcast(VFP, MAX_INT + 1);
> >>>>
> >>>> // x >= (d - y * ny) / nx
> >>>> static void updateGeGeneric(int angle, int d, float[] regionY, float[] regionX0f,
> >>>>         float[] regionX1f, int[] rowOffset, int[] regionX, int count, int kappa) {
> >>>>     FloatVector mNyNx = FloatVector.broadcast(VFP, SinCos.MINUS_COT[angle]);
> >>>>     FloatVector dNx = FloatVector.broadcast(VFP, (float) (d * SinCos.INV_SIN[angle]));
> >>>>     FloatVector k = FloatVector.broadcast(VFP, kappa);
> >>>>     for (int i = 0; i < count; i += STEP) {
> >>>>         FloatVector y = FloatVector.fromArray(VFP, regionY, i);
> >>>>         FloatVector x0 = FloatVector.fromArray(VFP, regionX0f, i);
> >>>>         FloatVector x1 = FloatVector.fromArray(VFP, regionX1f, i);
> >>>>
> >>>>         // BAD
> >>>>         IntVector off = IntVector.fromArray(VIP, rowOffset, i);
> >>>>         FloatVector x = y.fma(mNyNx, dNx).max(x0).min(x1);
> >>>>         IntVector xi = x.add(IMPLICIT_ONE).viewAsIntegralLanes().and(INTEGER_MASK);
> >>>>         IntVector xOff = xi.add(off);
> >>>>         xOff.intoArray(regionX, i);
> >>>>
> >>>>         // GOOD
> >>>>         //FloatVector x = y.fma(mNyNx, dNx).max(x0).min(x1);
> >>>>         //FloatVector xOff = y.fma(k, x);
> >>>>         //xOff.add(IMPLICIT_ONE).viewAsIntegralLanes().and(INTEGER_MASK).intoArray(regionX, i);
> >>>>     }
> >>>> }
> >>>
> >>
> >>
> >> --
> >> With best regards, Eugene Kluchnikov
> >
>


More information about the panama-dev mailing list