[OpenJDK Rasterizer] AWT & gcc 4.8 optimization options

Jim Graham james.graham at oracle.com
Fri Jan 15 22:34:19 UTC 2016


The lookups were written in 1997-ish when processors had different 
vectorization/computation tradeoffs.  It might be interesting to 
investigate a non-table version of the macros and see how the 
performance differs...

			...jim
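
As a starting point for that experiment, here is a minimal sketch of what a non-table multiply could look like. It assumes the table entry is meant to be (a * b) / 255 rounded to nearest; I have not checked it against the actual mul8table contents. The names mul8_div, mul8_shift and mul8_count_mismatches are made up for this example and do not exist in libawt.

```c
#include <stdint.h>

/* Reference form: (a * b) / 255 rounded to nearest, with a real division. */
static inline uint8_t mul8_div(uint8_t a, uint8_t b)
{
    return (uint8_t)(((uint32_t)a * b + 127u) / 255u);
}

/* Same value using only a multiply, adds and shifts (no division, no table). */
static inline uint8_t mul8_shift(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 128u;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Counts the (a, b) pairs in 0..255 x 0..255 where the two forms disagree. */
static int mul8_count_mismatches(void)
{
    int mismatches = 0;
    for (int a = 0; a < 256; a++) {
        for (int b = 0; b < 256; b++) {
            if (mul8_div((uint8_t)a, (uint8_t)b) !=
                mul8_shift((uint8_t)a, (uint8_t)b)) {
                mismatches++;
            }
        }
    }
    return mismatches;
}
```

The shift form has no data-dependent memory access, which is the property the vectorizer complains about for the table lookups. Whether it is actually faster than the lookup on current hardware would need to be measured.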

On 1/15/16 1:49 PM, Sergey Bylokhov wrote:
> Hi,
>
> I found that, as far as vectorisation is concerned, one of the main
> hotspots is our table lookup pattern (mul8table/div8table), which cannot
> be vectorized. Another hotspot is the many conditions inside the main loops.
>
> On 15/01/16 20:14, Laurent Bourgès wrote:
>> Sergey,
>>
>> Did you make any progress?
>>
>> I finally looked at the preprocessed C code and also enabled
>> -ftree-vectorizer-verbose output:
>>      CFLAGS := -save-temps -ftree-vectorize -ftree-vectorizer-verbose=2 $(CFLAGS_JDKLIB) $(LIBAWT_CFLAGS), \
>>
>>
>> I looked at the IntArgbPreSrcMaskFill hotspot (in my EllipseFillTest)
>> according to oprofile:
>> samples  %        image name               symbol name
>> 469141   30.0043  libawt.so                IntArgbPreSrcMaskFill
>>
>>
>> Here is the preprocessed C code:
>> - It is still complex to read as there are many do { } while (0) blocks
>> due to macro expansion...
>>
>> void IntArgbSrcMaskFill (void *rasBase, jubyte *pMask, jint maskOff,
>>      jint maskScan, jint width, jint height, jint fgColor,
>>      SurfaceDataRasInfo *pRasInfo, NativePrimitive *pPrim,
>>      CompositeInfo *pCompInfo)
>> {
>>      jint srcA;
>>      jint srcR, srcG, srcB;
>>      jint rasScan = pRasInfo->scanStride;
>>      IntArgbDataType *pRas = (IntArgbDataType *) (rasBase);
>>      jint DstPix;
>>      do
>>      {
>>          (srcB) = (fgColor) & 0xff;
>>          (srcG) = ((fgColor) >> 8) & 0xff;
>>          (srcR) = ((fgColor) >> 16) & 0xff;
>>          (srcA) = ((fgColor) >> 24) & 0xff;
>>      }
>>      while (0);
>>      if (srcA == 0)
>>      {
>>          srcR = srcG = srcB = 0;
>>          fgColor = 0;
>>      }
>>      else
>>      {
>>          if (!(0))
>>          {
>>              fgColor = (srcA << 24) | (fgColor & 0x00ffffff);
>>              ;
>>          }
>>          if (srcA != 0xff)
>>          {
>>              do
>>              {
>>                  srcR = mul8table[srcA][srcR];
>>                  srcG = mul8table[srcA][srcG];
>>                  srcB = mul8table[srcA][srcB];
>>              }
>>              while (0);
>>          }
>>          if (0)
>>          {
>>              ;
>>          }
>>      }
>>      DstPix = 0;
>>      ;
>>      rasScan -= width * 4;
>>      if (pMask)
>>      {
>>          pMask += maskOff;
>>          maskScan -= width;
>>          do
>>          {
>>              jint w = width;
>>              ;
>>              do
>>              {
>>                  jint resA;
>>                  jint resR, resG, resB;
>>                  jint dstF;
>>                  jint pathA = *pMask++;
>>                  if (pathA > 0)
>>                  {
>>                      if (pathA == 0xff)
>>                      {
>>                          (pRas)[0] = (fgColor);
>>                      }
>>                      else
>>                      {
>>                          ;
>>                          dstF = 0xff - pathA;
>>                          do
>>                          {
>>                              DstPix = (pRas)[0];
>>                              resA = ((juint) DstPix) >> 24;
>>                          }
>>                          while (0);
>>                          resA = mul8table[dstF][resA];
>>                          if (!(0))
>>                          {
>>                              dstF = resA;
>>                          }
>>                          resA += mul8table[pathA][srcA];
>>                          do
>>                          {
>>                              resR = (DstPix >> 16) & 0xff;
>>                              resG = (DstPix >> 8) & 0xff;
>>                              resB = (DstPix >> 0) & 0xff;
>>                          }
>>                          while (0);
>>                          do
>>                          {
>>                              resR = mul8table[dstF][resR] +
>>                                     mul8table[pathA][srcR];
>>                              resG = mul8table[dstF][resG] +
>>                                     mul8table[pathA][srcG];
>>                              resB = mul8table[dstF][resB] +
>>                                     mul8table[pathA][srcB];
>>                          }
>>                          while (0);
>>                          if (!(0) && resA && resA < 0xff)
>>                          {
>>                              do
>>                              {
>>                                  resR = div8table[resA][resR];
>>                                  resG = div8table[resA][resG];
>>                                  resB = div8table[resA][resB];
>>                              }
>>                              while (0);
>>                          }
>>                          (pRas)[0] = (((((((resA) << 8) | (resR)) << 8)
>>                                      | (resG)) << 8) | (resB));
>>                      }
>>                  }
>>                  pRas = ((void *) (((intptr_t) (pRas)) + (4)));
>>                  ;
>>              }
>>              while (--w > 0);
>>              pRas = ((void *) (((intptr_t) (pRas)) + (rasScan)));
>>              ;
>>              pMask = ((void *) (((intptr_t) (pMask)) + (maskScan)));
>>          }
>>          while (--height > 0);
>>      }
>>      else
>>      {
>>          do
>>          {
>>              jint w = width;
>>              ;
>>              do
>>              {
>>                  (pRas)[0] = (fgColor);
>>                  pRas = ((void *) (((intptr_t) (pRas)) + (4)));
>>                  ;
>>              }
>>              while (--w > 0);
>>              pRas = ((void *) (((intptr_t) (pRas)) + (rasScan)));
>>              ;
>>          }
>>          while (--height > 0);
>>      }
>> }
>>
>> It seems that the alpha blending macros are quite complex and cannot be
>> vectorized:
>>
>> Analyzing loop at IntArgb.c:109
>> IntArgb.c:109: note: not vectorized: control flow in loop.
>> IntArgb.c:109: note: bad inner-loop form.
>> IntArgb.c:109: note: not vectorized: Bad inner loop.
>> IntArgb.c:109: note: bad loop form.
>> Analyzing loop at IntArgb.c:109
>> IntArgb.c:109: note: not vectorized: control flow in loop.
>> IntArgb.c:109: note: bad loop form.
>> Analyzing loop at IntArgb.c:109
>> IntArgb.c:109: note: failed: evolution of base is not affine.
>> IntArgb.c:109: note: bad data references.
>> Analyzing loop at IntArgb.c:109
>> IntArgb.c:109: note: Unknown misalignment, is_packed = 0
>> IntArgb.c:109: note: virtual phi. skip.
>> IntArgb.c:109: note: not vectorized: value used after loop.
>> IntArgb.c:109: note: bad operation or unsupported loop bound.
>> IntArgb.c:109: note: vectorized 0 loops in function.
>> IntArgb.c:109: note: not consecutive access rasScan_26 =
>> pRasInfo_25(D)->scanStride;
>> IntArgb.c:109: note: Failed to SLP the basic block.
>> IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in
>> basic block.
>> IntArgb.c:109: note: not vectorized: not enough data-refs in basic block.
>> IntArgb.c:109: note: Unknown alignment for access: mul8table
>> IntArgb.c:109: note: not consecutive access _40 =
>> mul8table[srcA_36][srcB_33];
>> IntArgb.c:109: note: not consecutive access _42 =
>> mul8table[srcA_36][srcB_31];
>> IntArgb.c:109: note: not consecutive access _44 =
>> mul8table[srcA_36][srcB_29];
>> IntArgb.c:109: note: Failed to SLP the basic block.
>> IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in
>> basic block.
>> IntArgb.c:109: note: not vectorized: not enough data-refs in basic block.
>> IntArgb.c:109: note: SLP: step doesn't divide the vector-size.
>> IntArgb.c:109: note: Unknown alignment for access: *pMask_1
>> IntArgb.c:109: note: Failed to SLP the basic block.
>> IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in
>> basic block.
>> IntArgb.c:109: note: not vectorized: not enough data-refs in basic block.
>> IntArgb.c:109: note: SLP: step doesn't divide the vector-size.
>> IntArgb.c:109: note: Unknown alignment for access: *rasBase_9
>> IntArgb.c:109: note: Failed to SLP the basic block.
>> IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in
>> basic block.
>> IntArgb.c:109: note: SLP: step doesn't divide the vector-size.
>> IntArgb.c:109: note: Unknown alignment for access: *rasBase_9
>> IntArgb.c:109: note: Unknown alignment for access: mul8table
>> IntArgb.c:109: note: not consecutive access _65 =
>> mul8table[dstF_60][resA_64];
>> IntArgb.c:109: note: not consecutive access _67 =
>> mul8table[pathA_58][srcA_36];
>> IntArgb.c:109: note: not consecutive access _75 =
>> mul8table[dstF_66][resR_71];
>> IntArgb.c:109: note: not consecutive access _77 =
>> mul8table[pathA_58][srcB_6];
>> IntArgb.c:109: note: not consecutive access _80 =
>> mul8table[dstF_66][resG_73];
>> IntArgb.c:109: note: not consecutive access _82 =
>> mul8table[pathA_58][srcB_7];
>> IntArgb.c:109: note: not consecutive access _85 =
>> mul8table[dstF_66][resB_74];
>> IntArgb.c:109: note: not consecutive access _87 =
>> mul8table[pathA_58][srcB_8];
>> IntArgb.c:109: note: Failed to SLP the basic block.
>> IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in
>> basic block.
>> IntArgb.c:109: note: Unknown alignment for access: div8table
>> IntArgb.c:109: note: not consecutive access _93 =
>> div8table[resA_69][resR_79];
>> IntArgb.c:109: note: not consecutive access _95 =
>> div8table[resA_69][resG_84];
>> IntArgb.c:109: note: not consecutive access _97 =
>> div8table[resA_69][resB_89];
>> IntArgb.c:109: note: Failed to SLP the basic block.
>> IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in
>> basic block.
>> IntArgb.c:109: note: SLP: step doesn't divide the vector-size.
>> IntArgb.c:109: note: Unknown alignment for access: *rasBase_9
>> IntArgb.c:109: note: Failed to SLP the basic block.
>> IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in
>> basic block.
>> IntArgb.c:109: note: not vectorized: not enough data-refs in basic block.
>> IntArgb.c:109: note: SLP: step doesn't divide the vector-size.
>> IntArgb.c:109: note: Unknown alignment for access: *rasBase_11
>> IntArgb.c:109: note: Failed to SLP the basic block.
>> IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in
>> basic block.
>> IntArgb.c:109: note: not vectorized: not enough data-refs in basic block.
>>
>>
>> Any ideas on how to make such code faster, or to make it work with
>> vectorization?
>>
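
One possible direction, sketched under several assumptions: for a premultiplied destination the Src rule needs no div8table step, and if the multiply is done arithmetically the pathA == 0 and pathA == 0xff cases fall out of the general formula, so the inner loop has no branches and no table accesses. The function src_mask_fill_row_pre and its parameters are hypothetical (not the libawt loop), the rounding has not been compared with mul8table, and I have not checked whether gcc 4.8 actually vectorizes it.

```c
#include <stdint.h>

/* Rounded (a * b) / 255 for a, b in 0..255, computed without a table. */
static inline uint32_t mul8(uint32_t a, uint32_t b)
{
    uint32_t t = a * b + 128u;
    return (t + (t >> 8)) >> 8;
}

/*
 * Hypothetical helper (not from libawt): Src-rule mask fill of one row of
 * premultiplied ARGB pixels.  Every pixel takes the same straight-line path:
 *   mask == 0    leaves dst unchanged (mul8(255, d) == d, mul8(0, s) == 0)
 *   mask == 0xff stores the source color
 */
static void src_mask_fill_row_pre(uint32_t *dst, const uint8_t *mask,
                                  int width, uint32_t srcPre)
{
    const uint32_t sA = srcPre >> 24;
    const uint32_t sR = (srcPre >> 16) & 0xff;
    const uint32_t sG = (srcPre >> 8) & 0xff;
    const uint32_t sB = srcPre & 0xff;

    for (int i = 0; i < width; i++) {
        const uint32_t pathA = mask[i];
        const uint32_t dstF = 0xff - pathA;
        const uint32_t d = dst[i];

        /* res = dst * (1 - pathA) + src * pathA, per channel */
        const uint32_t a = mul8(dstF, d >> 24) + mul8(pathA, sA);
        const uint32_t r = mul8(dstF, (d >> 16) & 0xff) + mul8(pathA, sR);
        const uint32_t g = mul8(dstF, (d >> 8) & 0xff) + mul8(pathA, sG);
        const uint32_t b = mul8(dstF, d & 0xff) + mul8(pathA, sB);

        dst[i] = (a << 24) | (r << 16) | (g << 8) | b;
    }
}
```

The trade-off is that fully covered and fully transparent pixels now do the same work as partially covered ones, so this only pays off if the loop really is vectorized or the mask is mostly partial coverage.
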
>>
>> Finally, I noticed that the macros with the Lcd suffix seem to perform
>> proper gamma correction:
>>
>> void IntArgbDrawGlyphListLCD(SurfaceDataRasInfo *pRasInfo, ImageRef
>> *glyphs, jint totalGlyphs, jint fgpixel, jint argbcolor, jint clipLeft,
>> jint clipTop, jint clipRight, jint clipBottom, jint rgbOrder, unsigned
>> char *gammaLut, unsigned char * invGammaLut, NativePrimitive *pPrim,
>> CompositeInfo *pCompInfo)
>> ...
>>      srcR = invGammaLut[srcR];
>>      srcG = invGammaLut[srcG];
>>      srcB = invGammaLut[srcB];
>> ...
>> alpha blending
>> ...
>>      dstR = gammaLut[dstR];
>>      dstG = gammaLut[dstG];
>>      dstB = gammaLut[dstB];
>>
>> That's exactly what I need in order to implement correct gamma
>> correction in mask fill operations (shape draw / fill) for software
>> loops (buffered image rendering).
>>
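
To make the idea concrete, here is a rough sketch of a gamma-aware blend of one channel for a mask fill. It is not the elided LCD code: it assumes both source and destination are linearized through invGammaLut before blending and the result is re-encoded through gammaLut. The helper build_gamma2_luts is made up for the example and uses gamma 2.0 purely so the tables can be built with integer math; blend_channel_gamma and mul8 are likewise hypothetical names.

```c
#include <stdint.h>

/* Rounded (a * b) / 255 for a, b in 0..255, computed without a table. */
static inline uint8_t mul8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 128u;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/*
 * Hypothetical helper: builds a gamma-2.0 table pair with integer math.
 *   invGammaLut[v] ~ v * v / 255     (encoded -> linear)
 *   gammaLut[l]    ~ sqrt(l * 255)   (linear  -> encoded, rounded down)
 */
static void build_gamma2_luts(uint8_t invGammaLut[256], uint8_t gammaLut[256])
{
    for (int v = 0; v < 256; v++) {
        invGammaLut[v] = (uint8_t)((v * v + 127) / 255);
    }
    for (int l = 0; l < 256; l++) {
        int target = l * 255;
        int g = 0;
        while ((g + 1) * (g + 1) <= target) {
            g++;                      /* integer floor of sqrt(target) */
        }
        gammaLut[l] = (uint8_t)g;
    }
}

/* Blends one channel in linear space: linearize, mix by coverage, re-encode. */
static uint8_t blend_channel_gamma(uint8_t src, uint8_t dst, uint8_t pathA,
                                   const uint8_t invGammaLut[256],
                                   const uint8_t gammaLut[256])
{
    uint8_t s = invGammaLut[src];
    uint8_t d = invGammaLut[dst];
    uint8_t lin = (uint8_t)(mul8(pathA, s) + mul8((uint8_t)(0xff - pathA), d));
    return gammaLut[lin];
}
```

With these tables, white over black at roughly half coverage comes out noticeably brighter than the 128 a plain blend would give, which is the visual effect gamma-correct antialiasing is after. An 8-bit linear intermediate loses precision in the dark range, so a real implementation would probably want a wider linear representation.
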
>> I will now try to figure out how that C code is generated by the nested
>> macros!
>>
>> Laurent
>
>

