[OpenJDK Rasterizer] AWT & gcc 4.8 optimization options
Jim Graham
james.graham at oracle.com
Fri Jan 15 22:34:19 UTC 2016
The lookups were written in 1997-ish when processors had different
vectorization/computation tradeoffs. It might be interesting to
investigate a non-table version of the macros and see how the
performance differs...
...jim
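
As a rough illustration of what a non-table version could look like, here is a minimal sketch. It assumes mul8table[a][b] holds a * b / 255 rounded to nearest and that div8table[a][v] holds roughly v * 255 / a clamped to 255; the exact rounding of the real tables may differ, so treat the division in particular as a guess. The names mul8_arith, div8_arith and mul8_arith_is_exact are made up for this note and do not exist in the JDK sources.

```c
#include <stdint.h>

/* Rounded (a * b) / 255 for a, b in 0..255, using shifts instead of a
 * table lookup or a division. */
static inline uint32_t mul8_arith(uint32_t a, uint32_t b)
{
    uint32_t t = a * b + 128;
    return (t + (t >> 8)) >> 8;
}

/* Assumed meaning of the division table: rounded (v * 255) / a, clamped
 * to 255. The caller must guarantee a != 0. */
static inline uint32_t div8_arith(uint32_t a, uint32_t v)
{
    uint32_t q = (v * 255 + a / 2) / a;
    return q > 255 ? 255 : q;
}

/* Exhaustively compares mul8_arith against round-to-nearest a * b / 255.
 * Returns 1 when every one of the 65536 input pairs agrees, else 0. */
static inline int mul8_arith_is_exact(void)
{
    for (uint32_t a = 0; a < 256; a++) {
        for (uint32_t b = 0; b < 256; b++) {
            uint32_t expected = (2 * a * b + 255) / 510;
            if (mul8_arith(a, b) != expected) {
                return 0;
            }
        }
    }
    return 1;
}
```

Whether the arithmetic form actually beats the lookup on current hardware would have to be measured; the integer division in div8_arith is unlikely to vectorize well on its own.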
On 1/15/16 1:49 PM, Sergey Bylokhov wrote:
> Hi,
>
> I found that, when it comes to vectorisation, one of the main hotspots is
> our table lookup pattern: mul8table/div8table, which cannot be vectorized.
> Another hotspot is the many conditions inside the main loops.
>
> On 15/01/16 20:14, Laurent Bourgès wrote:
>> Sergey,
>>
>> Did you make any progress?
>>
>> I finally looked at the preprocessed C code and also enabled
>> ftree-vectorizer-verbose output:
>> CFLAGS := -save-temps -ftree-vectorize -ftree-vectorizer-verbose=2
>> $(CFLAGS_JDKLIB) $(LIBAWT_CFLAGS), \
>>
>>
>> I looked at the IntArgbPreSrcMaskFill hotspot (in my EllipseFillTest)
>> according to oprofile:
>> samples  %        image name  symbol name
>> 469141   30.0043  libawt.so   IntArgbPreSrcMaskFill
>>
>>
>> Here is the preprocessed C code:
>> - It is still complex to read as there are many do { } while (0) blocks
>> due to macro expansion...
>>
>> void IntArgbSrcMaskFill (void *rasBase, jubyte *pMask, jint maskOff,
>>   jint maskScan, jint width, jint height, jint fgColor,
>>   SurfaceDataRasInfo *pRasInfo, NativePrimitive *pPrim,
>>   CompositeInfo *pCompInfo)
>> {
>>   jint srcA;
>>   jint srcR, srcG, srcB;
>>   jint rasScan = pRasInfo->scanStride;
>>   IntArgbDataType *pRas = (IntArgbDataType *) (rasBase);
>>   jint DstPix;
>>   do
>>   {
>>     (srcB) = (fgColor) & 0xff;
>>     (srcG) = ((fgColor) >> 8) & 0xff;
>>     (srcR) = ((fgColor) >> 16) & 0xff;
>>     (srcA) = ((fgColor) >> 24) & 0xff;
>>   }
>>   while (0);
>>   if (srcA == 0)
>>   {
>>     srcR = srcG = srcB = 0;
>>     fgColor = 0;
>>   }
>>   else
>>   {
>>     if (!(0))
>>     {
>>       fgColor = (srcA << 24) | (fgColor & 0x00ffffff);
>>       ;
>>     }
>>     if (srcA != 0xff)
>>     {
>>       do
>>       {
>>         srcR = mul8table[srcA][srcR];
>>         srcG = mul8table[srcA][srcG];
>>         srcB = mul8table[srcA][srcB];
>>       }
>>       while (0);
>>     }
>>     if (0)
>>     {
>>       ;
>>     }
>>   }
>>   DstPix = 0;
>>   ;
>>   rasScan -= width * 4;
>>   if (pMask)
>>   {
>>     pMask += maskOff;
>>     maskScan -= width;
>>     do
>>     {
>>       jint w = width;
>>       ;
>>       do
>>       {
>>         jint resA;
>>         jint resR, resG, resB;
>>         jint dstF;
>>         jint pathA = *pMask++;
>>         if (pathA > 0)
>>         {
>>           if (pathA == 0xff)
>>           {
>>             (pRas)[0] = (fgColor);
>>           }
>>           else
>>           {
>>             ;
>>             dstF = 0xff - pathA;
>>             do
>>             {
>>               DstPix = (pRas)[0];
>>               resA = ((juint) DstPix) >> 24;
>>             }
>>             while (0);
>>             resA = mul8table[dstF][resA];
>>             if (!(0))
>>             {
>>               dstF = resA;
>>             }
>>             resA += mul8table[pathA][srcA];
>>             do
>>             {
>>               resR = (DstPix >> 16) & 0xff;
>>               resG = (DstPix >> 8) & 0xff;
>>               resB = (DstPix >> 0) & 0xff;
>>             }
>>             while (0);
>>             do
>>             {
>>               resR = mul8table[dstF][resR] +
>>                 mul8table[pathA][srcR];
>>               resG = mul8table[dstF][resG] +
>>                 mul8table[pathA][srcG];
>>               resB = mul8table[dstF][resB] +
>>                 mul8table[pathA][srcB];
>>             }
>>             while (0);
>>             if (!(0) && resA && resA < 0xff)
>>             {
>>               do
>>               {
>>                 resR = div8table[resA][resR];
>>                 resG = div8table[resA][resG];
>>                 resB = div8table[resA][resB];
>>               }
>>               while (0);
>>             }
>>             (pRas)[0] = (((((((resA) << 8) | (resR)) << 8)
>>               | (resG)) << 8) | (resB));
>>           }
>>         }
>>         pRas = ((void *) (((intptr_t) (pRas)) + (4)));
>>         ;
>>       }
>>       while (--w > 0);
>>       pRas = ((void *) (((intptr_t) (pRas)) + (rasScan)));
>>       ;
>>       pMask = ((void *) (((intptr_t) (pMask)) + (maskScan)));
>>     }
>>     while (--height > 0);
>>   }
>>   else
>>   {
>>     do
>>     {
>>       jint w = width;
>>       ;
>>       do
>>       {
>>         (pRas)[0] = (fgColor);
>>         pRas = ((void *) (((intptr_t) (pRas)) + (4)));
>>         ;
>>       }
>>       while (--w > 0);
>>       pRas = ((void *) (((intptr_t) (pRas)) + (rasScan)));
>>       ;
>>     }
>>     while (--height > 0);
>>   }
>> }
>>
>> It seems that the alpha blending macros are quite complex and cannot be
>> vectorized:
>>
>> Analyzing loop at IntArgb.c:109
>> IntArgb.c:109: note: not vectorized: control flow in loop.
>> IntArgb.c:109: note: bad inner-loop form.
>> IntArgb.c:109: note: not vectorized: Bad inner loop.
>> IntArgb.c:109: note: bad loop form.
>> Analyzing loop at IntArgb.c:109
>> IntArgb.c:109: note: not vectorized: control flow in loop.
>> IntArgb.c:109: note: bad loop form.
>> Analyzing loop at IntArgb.c:109
>> IntArgb.c:109: note: failed: evolution of base is not affine.
>> IntArgb.c:109: note: bad data references.
>> Analyzing loop at IntArgb.c:109
>> IntArgb.c:109: note: Unknown misalignment, is_packed = 0
>> IntArgb.c:109: note: virtual phi. skip.
>> IntArgb.c:109: note: not vectorized: value used after loop.
>> IntArgb.c:109: note: bad operation or unsupported loop bound.
>> IntArgb.c:109: note: vectorized 0 loops in function.
>> IntArgb.c:109: note: not consecutive access rasScan_26 =
>> pRasInfo_25(D)->scanStride;
>> IntArgb.c:109: note: Failed to SLP the basic block.
>> IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in
>> basic block.
>> IntArgb.c:109: note: not vectorized: not enough data-refs in basic block.
>> IntArgb.c:109: note: Unknown alignment for access: mul8table
>> IntArgb.c:109: note: not consecutive access _40 =
>> mul8table[srcA_36][srcB_33];
>> IntArgb.c:109: note: not consecutive access _42 =
>> mul8table[srcA_36][srcB_31];
>> IntArgb.c:109: note: not consecutive access _44 =
>> mul8table[srcA_36][srcB_29];
>> IntArgb.c:109: note: Failed to SLP the basic block.
>> IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in
>> basic block.
>> IntArgb.c:109: note: not vectorized: not enough data-refs in basic block.
>> IntArgb.c:109: note: SLP: step doesn't divide the vector-size.
>> IntArgb.c:109: note: Unknown alignment for access: *pMask_1
>> IntArgb.c:109: note: Failed to SLP the basic block.
>> IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in
>> basic block.
>> IntArgb.c:109: note: not vectorized: not enough data-refs in basic block.
>> IntArgb.c:109: note: SLP: step doesn't divide the vector-size.
>> IntArgb.c:109: note: Unknown alignment for access: *rasBase_9
>> IntArgb.c:109: note: Failed to SLP the basic block.
>> IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in
>> basic block.
>> IntArgb.c:109: note: SLP: step doesn't divide the vector-size.
>> IntArgb.c:109: note: Unknown alignment for access: *rasBase_9
>> IntArgb.c:109: note: Unknown alignment for access: mul8table
>> IntArgb.c:109: note: not consecutive access _65 =
>> mul8table[dstF_60][resA_64];
>> IntArgb.c:109: note: not consecutive access _67 =
>> mul8table[pathA_58][srcA_36];
>> IntArgb.c:109: note: not consecutive access _75 =
>> mul8table[dstF_66][resR_71];
>> IntArgb.c:109: note: not consecutive access _77 =
>> mul8table[pathA_58][srcB_6];
>> IntArgb.c:109: note: not consecutive access _80 =
>> mul8table[dstF_66][resG_73];
>> IntArgb.c:109: note: not consecutive access _82 =
>> mul8table[pathA_58][srcB_7];
>> IntArgb.c:109: note: not consecutive access _85 =
>> mul8table[dstF_66][resB_74];
>> IntArgb.c:109: note: not consecutive access _87 =
>> mul8table[pathA_58][srcB_8];
>> IntArgb.c:109: note: Failed to SLP the basic block.
>> IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in
>> basic block.
>> IntArgb.c:109: note: Unknown alignment for access: div8table
>> IntArgb.c:109: note: not consecutive access _93 =
>> div8table[resA_69][resR_79];
>> IntArgb.c:109: note: not consecutive access _95 =
>> div8table[resA_69][resG_84];
>> IntArgb.c:109: note: not consecutive access _97 =
>> div8table[resA_69][resB_89];
>> IntArgb.c:109: note: Failed to SLP the basic block.
>> IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in
>> basic block.
>> IntArgb.c:109: note: SLP: step doesn't divide the vector-size.
>> IntArgb.c:109: note: Unknown alignment for access: *rasBase_9
>> IntArgb.c:109: note: Failed to SLP the basic block.
>> IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in
>> basic block.
>> IntArgb.c:109: note: not vectorized: not enough data-refs in basic block.
>> IntArgb.c:109: note: SLP: step doesn't divide the vector-size.
>> IntArgb.c:109: note: Unknown alignment for access: *rasBase_11
>> IntArgb.c:109: note: Failed to SLP the basic block.
>> IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in
>> basic block.
>> IntArgb.c:109: note: not vectorized: not enough data-refs in basic block.
>>
>>
>> Any ideas on how to make such code faster, or how to make it work with
>> vectorization?
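
One possible direction, sketched here only for the premultiplied case (the IntArgbPreSrcMaskFill hotspot from the profile), is to drop both the table lookups and the per-pixel branches so that every pixel runs the same straight-line arithmetic. The sketch assumes premultiplied source and destination components, so no division step is needed; the non-premultiplied loop quoted above would still need its div8table step. The function src_mask_fill_row_pre and the helper mul8 are hypothetical names, not JDK code, and nothing here has been checked against the vectorizer or benchmarked.

```c
#include <stddef.h>
#include <stdint.h>

/* Rounded (a * b) / 255 for a, b in 0..255, without a table lookup. */
static inline uint32_t mul8(uint32_t a, uint32_t b)
{
    uint32_t t = a * b + 128;
    return (t + (t >> 8)) >> 8;
}

/* Hypothetical helper: applies result = src * pathA + dst * (255 - pathA)
 * to one row of premultiplied ARGB pixels. pathA == 0 leaves the pixel
 * unchanged and pathA == 255 stores the source exactly, so the loop body
 * needs no special cases. */
static inline void src_mask_fill_row_pre(uint32_t *restrict dst,
                                         const uint8_t *restrict mask,
                                         size_t width, uint32_t src_pre)
{
    const uint32_t sA = src_pre >> 24;
    const uint32_t sR = (src_pre >> 16) & 0xff;
    const uint32_t sG = (src_pre >> 8) & 0xff;
    const uint32_t sB = src_pre & 0xff;

    for (size_t i = 0; i < width; i++) {
        uint32_t pathA = mask[i];
        uint32_t dstF = 0xff - pathA;
        uint32_t d = dst[i];

        /* Each sum stays within 0..255 because mul8(f, x) <= f. */
        uint32_t a = mul8(dstF, d >> 24) + mul8(pathA, sA);
        uint32_t r = mul8(dstF, (d >> 16) & 0xff) + mul8(pathA, sR);
        uint32_t g = mul8(dstF, (d >> 8) & 0xff) + mul8(pathA, sG);
        uint32_t b = mul8(dstF, d & 0xff) + mul8(pathA, sB);

        dst[i] = (a << 24) | (r << 16) | (g << 8) | b;
    }
}
```

The trade-off is that fully covered and fully uncovered pixels now pay for the full blend, so this only helps if the compiler really does vectorize the loop or if most mask values are partial coverage.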
>>
>>
>> Finally, I noticed that the macros with the Lcd suffix seem to perform
>> proper gamma correction:
>>
>> void IntArgbDrawGlyphListLCD(SurfaceDataRasInfo *pRasInfo, ImageRef
>> *glyphs, jint totalGlyphs, jint fgpixel, jint argbcolor, jint clipLeft,
>> jint clipTop, jint clipRight, jint clipBottom, jint rgbOrder, unsigned
>> char *gammaLut, unsigned char * invGammaLut, NativePrimitive *pPrim,
>> CompositeInfo *pCompInfo)
>> ...
>> srcR = invGammaLut[srcR];
>> srcG = invGammaLut[srcG];
>> srcB = invGammaLut[srcB];
>> ...
>> alpha blending
>> ...
>> dstR = gammaLut[dstR];
>> dstG = gammaLut[dstG];
>> dstB = gammaLut[dstB];
>>
>> That's exactly what I need in order to implement correct gamma
>> correction in mask fill operations (shape draw / fill) for software
>> loops (buffered image rendering).
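
To make the idea concrete, here is a minimal per-channel sketch of a gamma-aware mask blend. It takes the two lookup tables as parameters, as the LCD function above does, and assumes that invGammaLut maps a stored component to linear space and gammaLut maps it back. It also assumes the destination is linearized before blending, which the elided excerpt does not show. blend_channel_gamma and mul8 are hypothetical names, not JDK code.

```c
#include <stdint.h>

/* Rounded (a * b) / 255 for a, b in 0..255, without a table lookup. */
static inline uint32_t mul8(uint32_t a, uint32_t b)
{
    uint32_t t = a * b + 128;
    return (t + (t >> 8)) >> 8;
}

/* Hypothetical helper: blends one 8-bit color component with coverage
 * pathA in linear space, then converts the result back through gammaLut.
 * Both tables must have 256 entries. */
static inline uint8_t blend_channel_gamma(uint8_t dst, uint8_t src,
                                          uint8_t pathA,
                                          const uint8_t *gammaLut,
                                          const uint8_t *invGammaLut)
{
    uint32_t s = invGammaLut[src];  /* assumption: stored -> linear */
    uint32_t d = invGammaLut[dst];  /* assumption: dst is linearized too */

    /* The sum stays within 0..255 because mul8(f, x) <= f. */
    uint32_t lin = mul8(pathA, s) + mul8(0xffu - pathA, d);

    return gammaLut[lin];           /* assumption: linear -> stored */
}
```

With identity tables this reduces to the plain mask blend, which makes it easy to compare against the existing loops before real gamma tables are plugged in.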
>>
>> I will now try to figure out how that C code is generated by the nested
>> macros!
>>
>> Laurent
>
>