[OpenJDK Rasterizer] AWT & gcc 4.8 optimization options
Laurent Bourgès
bourges.laurent at gmail.com
Fri Jan 15 17:14:46 UTC 2016
Did you make any progress?
I finally looked at the preprocessed C code and also enabled the
-ftree-vectorizer-verbose output:
CFLAGS := -save-temps -ftree-vectorize -ftree-vectorizer-verbose=2
I looked at the IntArgbPreSrcMaskFill hotspot (in my EllipseFillTest)
according to oprofile:
samples   %         image name   symbol name
469141    30.0043   libawt.so    IntArgbPreSrcMaskFill
Here is the preprocessed C code:
- It is still complex to read as there are many do { } while (0) blocks due
to macro expansion...
void IntArgbSrcMaskFill (void *rasBase, jubyte *pMask, jint maskOff, jint
maskScan, jint width, jint height, jint fgColor, SurfaceDataRasInfo
*pRasInfo, NativePrimitive *pPrim, CompositeInfo *pCompInfo)
jint srcA;
jint srcR, srcG, srcB;
jint rasScan = pRasInfo->scanStride;
IntArgbDataType *pRas = (IntArgbDataType *) (rasBase);
jint DstPix;
(srcB) = (fgColor) & 0xff;
(srcG) = ((fgColor) >> 8) & 0xff;
(srcR) = ((fgColor) >> 16) & 0xff;
(srcA) = ((fgColor) >> 24) & 0xff;
while (0);
if (srcA == 0)
srcR = srcG = srcB = 0;
fgColor = 0;
if (!(0))
fgColor = (srcA << 24) | (fgColor & 0x00ffffff);
if (srcA != 0xff)
srcR = mul8table[srcA][srcR];
srcG = mul8table[srcA][srcG];
srcB = mul8table[srcA][srcB];
while (0);
if (0)
DstPix = 0;
rasScan -= width * 4;
if (pMask)
pMask += maskOff;
maskScan -= width;
jint w = width;
jint resA;
jint resR, resG, resB;
jint dstF;
jint pathA = *pMask++;
if (pathA > 0)
if (pathA == 0xff)
(pRas)[0] = (fgColor);
dstF = 0xff - pathA;
DstPix = (pRas)[0];
resA = ((juint) DstPix) >> 24;
while (0);
resA = mul8table[dstF][resA];
if (!(0))
dstF = resA;
resA += mul8table[pathA][srcA];
resR = (DstPix >> 16) & 0xff;
resG = (DstPix >> 8) & 0xff;
resB = (DstPix >> 0) & 0xff;
while (0);
resR = mul8table[dstF][resR] +
resG = mul8table[dstF][resG] +
resB = mul8table[dstF][resB] +
while (0);
if (!(0) && resA && resA < 0xff)
resR = div8table[resA][resR];
resG = div8table[resA][resG];
resB = div8table[resA][resB];
while (0);
(pRas)[0] = (((((((resA) << 8) | (resR)) << 8) |
(resG)) << 8) | (resB));
pRas = ((void *) (((intptr_t) (pRas)) + (4)));
while (--w > 0);
pRas = ((void *) (((intptr_t) (pRas)) + (rasScan)));
pMask = ((void *) (((intptr_t) (pMask)) + (maskScan)));
while (--height > 0);
jint w = width;
(pRas)[0] = (fgColor);
pRas = ((void *) (((intptr_t) (pRas)) + (4)));
while (--w > 0);
pRas = ((void *) (((intptr_t) (pRas)) + (rasScan)));
while (--height > 0);
It seems that the alpha blending macros are quite complex and cannot be
vectorized:
Analyzing loop at IntArgb.c:109
IntArgb.c:109: note: not vectorized: control flow in loop.
IntArgb.c:109: note: bad inner-loop form.
IntArgb.c:109: note: not vectorized: Bad inner loop.
IntArgb.c:109: note: bad loop form.
Analyzing loop at IntArgb.c:109
IntArgb.c:109: note: not vectorized: control flow in loop.
IntArgb.c:109: note: bad loop form.
Analyzing loop at IntArgb.c:109
IntArgb.c:109: note: failed: evolution of base is not affine.
IntArgb.c:109: note: bad data references.
Analyzing loop at IntArgb.c:109
IntArgb.c:109: note: Unknown misalignment, is_packed = 0
IntArgb.c:109: note: virtual phi. skip.
IntArgb.c:109: note: not vectorized: value used after loop.
IntArgb.c:109: note: bad operation or unsupported loop bound.
IntArgb.c:109: note: vectorized 0 loops in function.
IntArgb.c:109: note: not consecutive access rasScan_26 =
IntArgb.c:109: note: Failed to SLP the basic block.
IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in
basic block.
IntArgb.c:109: note: not vectorized: not enough data-refs in basic block.
IntArgb.c:109: note: Unknown alignment for access: mul8table
IntArgb.c:109: note: not consecutive access _40 =
IntArgb.c:109: note: not consecutive access _42 =
IntArgb.c:109: note: not consecutive access _44 =
IntArgb.c:109: note: Failed to SLP the basic block.
IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in
basic block.
IntArgb.c:109: note: not vectorized: not enough data-refs in basic block.
IntArgb.c:109: note: SLP: step doesn't divide the vector-size.
IntArgb.c:109: note: Unknown alignment for access: *pMask_1
IntArgb.c:109: note: Failed to SLP the basic block.
IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in
basic block.
IntArgb.c:109: note: not vectorized: not enough data-refs in basic block.
IntArgb.c:109: note: SLP: step doesn't divide the vector-size.
IntArgb.c:109: note: Unknown alignment for access: *rasBase_9
IntArgb.c:109: note: Failed to SLP the basic block.
IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in
basic block.
IntArgb.c:109: note: SLP: step doesn't divide the vector-size.
IntArgb.c:109: note: Unknown alignment for access: *rasBase_9
IntArgb.c:109: note: Unknown alignment for access: mul8table
IntArgb.c:109: note: not consecutive access _65 =
IntArgb.c:109: note: not consecutive access _67 =
IntArgb.c:109: note: not consecutive access _75 =
IntArgb.c:109: note: not consecutive access _77 =
IntArgb.c:109: note: not consecutive access _80 =
IntArgb.c:109: note: not consecutive access _82 =
IntArgb.c:109: note: not consecutive access _85 =
IntArgb.c:109: note: not consecutive access _87 =
IntArgb.c:109: note: Failed to SLP the basic block.
IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in
basic block.
IntArgb.c:109: note: Unknown alignment for access: div8table
IntArgb.c:109: note: not consecutive access _93 =
IntArgb.c:109: note: not consecutive access _95 =
IntArgb.c:109: note: not consecutive access _97 =
IntArgb.c:109: note: Failed to SLP the basic block.
IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in
basic block.
IntArgb.c:109: note: SLP: step doesn't divide the vector-size.
IntArgb.c:109: note: Unknown alignment for access: *rasBase_9
IntArgb.c:109: note: Failed to SLP the basic block.
IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in
basic block.
IntArgb.c:109: note: not vectorized: not enough data-refs in basic block.
IntArgb.c:109: note: SLP: step doesn't divide the vector-size.
IntArgb.c:109: note: Unknown alignment for access: *rasBase_11
IntArgb.c:109: note: Failed to SLP the basic block.
IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in
basic block.
IntArgb.c:109: note: not vectorized: not enough data-refs in basic block.
Any ideas on how to make such code faster, or how to make it vectorizable?
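One possible direction, given only as a rough sketch and not as the Java2D implementation: the log above complains about control flow in the loop and about non-consecutive accesses through mul8table, so an inner loop without branches and without table lookups may be easier for the vectorizer to handle. The helper names mul8 and mask_fill_row_pre below are hypothetical, pixels are assumed to be premultiplied ARGB, and mul8table[a][b] is assumed to approximate a*b/255, so results could differ from the table by one unit. Whether gcc 4.8 actually vectorizes this loop has not been checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rounded a*b/255 for a, b in [0, 255], computed without a lookup table. */
static inline uint32_t mul8(uint32_t a, uint32_t b)
{
    uint32_t t = a * b + 128u;
    return (t + (t >> 8)) >> 8;
}

/*
 * Hypothetical helper (not part of Java2D): Src mask fill of one row of
 * premultiplied ARGB pixels. Every pixel takes the same branch-free path:
 *   result = srcPre * pathA + dst * (255 - pathA), per component.
 */
static void mask_fill_row_pre(uint32_t *restrict dst,
                              const uint8_t *restrict mask,
                              size_t n, uint32_t srcPre)
{
    assert(dst != NULL && mask != NULL);

    const uint32_t sA = srcPre >> 24;
    const uint32_t sR = (srcPre >> 16) & 0xff;
    const uint32_t sG = (srcPre >> 8) & 0xff;
    const uint32_t sB = srcPre & 0xff;

    for (size_t i = 0; i < n; i++) {
        const uint32_t pathA = mask[i];
        const uint32_t dstF = 0xff - pathA;
        const uint32_t d = dst[i];

        const uint32_t a = mul8(pathA, sA) + mul8(dstF, d >> 24);
        const uint32_t r = mul8(pathA, sR) + mul8(dstF, (d >> 16) & 0xff);
        const uint32_t g = mul8(pathA, sG) + mul8(dstF, (d >> 8) & 0xff);
        const uint32_t b = mul8(pathA, sB) + mul8(dstF, d & 0xff);

        dst[i] = (a << 24) | (r << 16) | (g << 8) | b;
    }
}
```

The trade-off is that fully covered (pathA == 0xff) and fully transparent (pathA == 0) pixels no longer take a shortcut; they go through the same arithmetic and still yield the source and the destination pixel respectively.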
Finally, I noticed that the macros with the Lcd suffix seem to perform proper
gamma correction:
void IntArgbDrawGlyphListLCD(SurfaceDataRasInfo *pRasInfo, ImageRef
*glyphs, jint totalGlyphs, jint fgpixel, jint argbcolor, jint clipLeft,
jint clipTop, jint clipRight, jint clipBottom, jint rgbOrder, unsigned char
*gammaLut, unsigned char * invGammaLut, NativePrimitive *pPrim,
CompositeInfo *pCompInfo)
srcR = invGammaLut[srcR];
srcG = invGammaLut[srcG];
srcB = invGammaLut[srcB];
alpha blending
dstR = gammaLut[dstR];
dstG = gammaLut[dstG];
dstB = gammaLut[dstB];
That's exactly what I need in order to implement correct gamma correction in
mask fill operations (shape draw / fill) for software loops (buffered image
I will now try to figure out how that C code is generated by the nested
macros!
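To make the gamma-corrected blending idea concrete, here is a minimal sketch for a single color component. It follows the order shown in the LCD excerpt (invGammaLut before blending, gammaLut after), but the function names are hypothetical, the destination is assumed to be passed through invGammaLut before blending as well, and the blend itself is a plain coverage-weighted mix, not the code the macros really generate.

```c
#include <stdint.h>

/* Rounded a*b/255 for a, b in [0, 255], computed without a lookup table. */
static inline uint32_t mul8(uint32_t a, uint32_t b)
{
    uint32_t t = a * b + 128u;
    return (t + (t >> 8)) >> 8;
}

/*
 * Hypothetical helper: blend one 8-bit component with coverage pathA,
 * converting both inputs with invGammaLut first and converting the
 * blended value back with gammaLut. Both tables have 256 entries and
 * are supplied by the caller, as in the IntArgbDrawGlyphListLCD signature.
 */
static uint8_t blend_component_gamma(uint8_t src, uint8_t dst, uint8_t pathA,
                                     const unsigned char *gammaLut,
                                     const unsigned char *invGammaLut)
{
    const uint32_t s = invGammaLut[src];
    const uint32_t d = invGammaLut[dst];   /* assumption: dst is converted too */
    const uint32_t mixed = mul8(pathA, s) + mul8(0xffu - pathA, d);
    return gammaLut[mixed];
}
```

With identity tables this reduces to an ordinary blend, which gives a simple way to compare gamma-corrected and uncorrected output on the same shape.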