[OpenJDK Rasterizer] AWT & gcc 4.8 optimization options
Laurent Bourgès
bourges.laurent at gmail.com
Fri Jan 15 17:14:46 UTC 2016
Sergey,
Did you make any progress?
I finally looked at the preprocessed C code and also enabled the
-ftree-vectorizer-verbose output:
CFLAGS := -save-temps -ftree-vectorize -ftree-vectorizer-verbose=2 $(CFLAGS_JDKLIB) $(LIBAWT_CFLAGS), \
According to oprofile, the hotspot in my EllipseFillTest is
IntArgbPreSrcMaskFill:
samples   %        image name   symbol name
469141    30.0043  libawt.so    IntArgbPreSrcMaskFill
Here is the preprocessed C code. It is still hard to read, as macro
expansion leaves many do { } while (0) blocks:
void IntArgbSrcMaskFill (void *rasBase, jubyte *pMask, jint maskOff,
                         jint maskScan, jint width, jint height,
                         jint fgColor, SurfaceDataRasInfo *pRasInfo,
                         NativePrimitive *pPrim, CompositeInfo *pCompInfo)
{
    jint srcA;
    jint srcR, srcG, srcB;
    jint rasScan = pRasInfo->scanStride;
    IntArgbDataType *pRas = (IntArgbDataType *) (rasBase);
    jint DstPix;
    do
    {
        (srcB) = (fgColor) & 0xff;
        (srcG) = ((fgColor) >> 8) & 0xff;
        (srcR) = ((fgColor) >> 16) & 0xff;
        (srcA) = ((fgColor) >> 24) & 0xff;
    }
    while (0);
    if (srcA == 0)
    {
        srcR = srcG = srcB = 0;
        fgColor = 0;
    }
    else
    {
        if (!(0))
        {
            fgColor = (srcA << 24) | (fgColor & 0x00ffffff);
            ;
        }
        if (srcA != 0xff)
        {
            do
            {
                srcR = mul8table[srcA][srcR];
                srcG = mul8table[srcA][srcG];
                srcB = mul8table[srcA][srcB];
            }
            while (0);
        }
        if (0)
        {
            ;
        }
    }
    DstPix = 0;
    ;
    rasScan -= width * 4;
    if (pMask)
    {
        pMask += maskOff;
        maskScan -= width;
        do
        {
            jint w = width;
            ;
            do
            {
                jint resA;
                jint resR, resG, resB;
                jint dstF;
                jint pathA = *pMask++;
                if (pathA > 0)
                {
                    if (pathA == 0xff)
                    {
                        (pRas)[0] = (fgColor);
                    }
                    else
                    {
                        ;
                        dstF = 0xff - pathA;
                        do
                        {
                            DstPix = (pRas)[0];
                            resA = ((juint) DstPix) >> 24;
                        }
                        while (0);
                        resA = mul8table[dstF][resA];
                        if (!(0))
                        {
                            dstF = resA;
                        }
                        resA += mul8table[pathA][srcA];
                        do
                        {
                            resR = (DstPix >> 16) & 0xff;
                            resG = (DstPix >> 8) & 0xff;
                            resB = (DstPix >> 0) & 0xff;
                        }
                        while (0);
                        do
                        {
                            resR = mul8table[dstF][resR] +
                                   mul8table[pathA][srcR];
                            resG = mul8table[dstF][resG] +
                                   mul8table[pathA][srcG];
                            resB = mul8table[dstF][resB] +
                                   mul8table[pathA][srcB];
                        }
                        while (0);
                        if (!(0) && resA && resA < 0xff)
                        {
                            do
                            {
                                resR = div8table[resA][resR];
                                resG = div8table[resA][resG];
                                resB = div8table[resA][resB];
                            }
                            while (0);
                        }
                        (pRas)[0] = (((((((resA) << 8) | (resR)) << 8) |
                                       (resG)) << 8) | (resB));
                    }
                }
                pRas = ((void *) (((intptr_t) (pRas)) + (4)));
                ;
            }
            while (--w > 0);
            pRas = ((void *) (((intptr_t) (pRas)) + (rasScan)));
            ;
            pMask = ((void *) (((intptr_t) (pMask)) + (maskScan)));
        }
        while (--height > 0);
    }
    else
    {
        do
        {
            jint w = width;
            ;
            do
            {
                (pRas)[0] = (fgColor);
                pRas = ((void *) (((intptr_t) (pRas)) + (4)));
                ;
            }
            while (--w > 0);
            pRas = ((void *) (((intptr_t) (pRas)) + (rasScan)));
            ;
        }
        while (--height > 0);
    }
}
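For readers unfamiliar with the idiom, the do { } while (0) blocks in the listing come from the usual way of writing multi-statement macros so that they behave as a single statement. A minimal sketch follows; SPLIT_ARGB and alpha_or_zero are hypothetical names made up for illustration, not the actual Java2D macros:

```c
#include <stdint.h>

/* Hypothetical macro (not the real Java2D one): splits an ARGB pixel
 * into its four components. The do { } while (0) wrapper makes the
 * whole body expand to exactly one statement. */
#define SPLIT_ARGB(pixel, a, r, g, b)      \
    do {                                   \
        (b) = (pixel) & 0xff;              \
        (g) = ((pixel) >> 8) & 0xff;       \
        (r) = ((pixel) >> 16) & 0xff;      \
        (a) = ((pixel) >> 24) & 0xff;      \
    } while (0)

/* Hypothetical helper: returns the alpha component if wanted, else 0. */
static int32_t alpha_or_zero(uint32_t pixel, int wanted)
{
    int32_t a = 0, r = 0, g = 0, b = 0;
    if (wanted)
        SPLIT_ARGB(pixel, a, r, g, b);  /* safe in an unbraced if/else */
    else
        a = 0;
    (void) r; (void) g; (void) b;       /* unused in this sketch */
    return a;
}
```

Without the wrapper, the unbraced if/else above would not compile, which is why every expanded macro in the listing carries its own do { } while (0).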
It seems that the alpha blending macros are quite complex and cannot be
vectorized:
Analyzing loop at IntArgb.c:109
IntArgb.c:109: note: not vectorized: control flow in loop.
IntArgb.c:109: note: bad inner-loop form.
IntArgb.c:109: note: not vectorized: Bad inner loop.
IntArgb.c:109: note: bad loop form.
Analyzing loop at IntArgb.c:109
IntArgb.c:109: note: not vectorized: control flow in loop.
IntArgb.c:109: note: bad loop form.
Analyzing loop at IntArgb.c:109
IntArgb.c:109: note: failed: evolution of base is not affine.
IntArgb.c:109: note: bad data references.
Analyzing loop at IntArgb.c:109
IntArgb.c:109: note: Unknown misalignment, is_packed = 0
IntArgb.c:109: note: virtual phi. skip.
IntArgb.c:109: note: not vectorized: value used after loop.
IntArgb.c:109: note: bad operation or unsupported loop bound.
IntArgb.c:109: note: vectorized 0 loops in function.
IntArgb.c:109: note: not consecutive access rasScan_26 = pRasInfo_25(D)->scanStride;
IntArgb.c:109: note: Failed to SLP the basic block.
IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in basic block.
IntArgb.c:109: note: not vectorized: not enough data-refs in basic block.
IntArgb.c:109: note: Unknown alignment for access: mul8table
IntArgb.c:109: note: not consecutive access _40 = mul8table[srcA_36][srcB_33];
IntArgb.c:109: note: not consecutive access _42 = mul8table[srcA_36][srcB_31];
IntArgb.c:109: note: not consecutive access _44 = mul8table[srcA_36][srcB_29];
IntArgb.c:109: note: Failed to SLP the basic block.
IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in basic block.
IntArgb.c:109: note: not vectorized: not enough data-refs in basic block.
IntArgb.c:109: note: SLP: step doesn't divide the vector-size.
IntArgb.c:109: note: Unknown alignment for access: *pMask_1
IntArgb.c:109: note: Failed to SLP the basic block.
IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in basic block.
IntArgb.c:109: note: not vectorized: not enough data-refs in basic block.
IntArgb.c:109: note: SLP: step doesn't divide the vector-size.
IntArgb.c:109: note: Unknown alignment for access: *rasBase_9
IntArgb.c:109: note: Failed to SLP the basic block.
IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in basic block.
IntArgb.c:109: note: SLP: step doesn't divide the vector-size.
IntArgb.c:109: note: Unknown alignment for access: *rasBase_9
IntArgb.c:109: note: Unknown alignment for access: mul8table
IntArgb.c:109: note: not consecutive access _65 = mul8table[dstF_60][resA_64];
IntArgb.c:109: note: not consecutive access _67 = mul8table[pathA_58][srcA_36];
IntArgb.c:109: note: not consecutive access _75 = mul8table[dstF_66][resR_71];
IntArgb.c:109: note: not consecutive access _77 = mul8table[pathA_58][srcB_6];
IntArgb.c:109: note: not consecutive access _80 = mul8table[dstF_66][resG_73];
IntArgb.c:109: note: not consecutive access _82 = mul8table[pathA_58][srcB_7];
IntArgb.c:109: note: not consecutive access _85 = mul8table[dstF_66][resB_74];
IntArgb.c:109: note: not consecutive access _87 = mul8table[pathA_58][srcB_8];
IntArgb.c:109: note: Failed to SLP the basic block.
IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in basic block.
IntArgb.c:109: note: Unknown alignment for access: div8table
IntArgb.c:109: note: not consecutive access _93 = div8table[resA_69][resR_79];
IntArgb.c:109: note: not consecutive access _95 = div8table[resA_69][resG_84];
IntArgb.c:109: note: not consecutive access _97 = div8table[resA_69][resB_89];
IntArgb.c:109: note: Failed to SLP the basic block.
IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in basic block.
IntArgb.c:109: note: SLP: step doesn't divide the vector-size.
IntArgb.c:109: note: Unknown alignment for access: *rasBase_9
IntArgb.c:109: note: Failed to SLP the basic block.
IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in basic block.
IntArgb.c:109: note: not vectorized: not enough data-refs in basic block.
IntArgb.c:109: note: SLP: step doesn't divide the vector-size.
IntArgb.c:109: note: Unknown alignment for access: *rasBase_11
IntArgb.c:109: note: Failed to SLP the basic block.
IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in basic block.
IntArgb.c:109: note: not vectorized: not enough data-refs in basic block.
Any ideas on how to make this code faster, or how to make it vectorizable?
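One possible direction, as a minimal sketch only: the vectorizer complains about control flow in the loop and about the non-consecutive mul8table / div8table accesses, so a loop body without branches or table lookups might stand a better chance. The code below assumes that mul8table[a][b] approximates round(a * b / 255) and replaces it with plain arithmetic. It covers only the premultiplied (IntArgbPre) case, which needs no div8table step. The names mul8 and src_mask_fill_row_pre are hypothetical, and whether gcc 4.8 actually vectorizes this has not been verified here.

```c
#include <stddef.h>
#include <stdint.h>

/* round(a * b / 255) for a, b in [0, 255], without a table lookup.
 * Assumption: this matches what mul8table[a][b] is meant to hold. */
static inline uint32_t mul8(uint32_t a, uint32_t b)
{
    uint32_t t = a * b + 128;
    return (t + (t >> 8)) >> 8;
}

/* Hypothetical helper: Src mask fill of one row of premultiplied ARGB
 * pixels. srcPre is the premultiplied fill color. There are no branches
 * on the mask value: mask 0 leaves dst unchanged and mask 255 stores
 * srcPre, simply as a result of the arithmetic. */
static void src_mask_fill_row_pre(uint32_t *dst, const uint8_t *mask,
                                  size_t n, uint32_t srcPre)
{
    const uint32_t sA = srcPre >> 24;
    const uint32_t sR = (srcPre >> 16) & 0xff;
    const uint32_t sG = (srcPre >> 8) & 0xff;
    const uint32_t sB = srcPre & 0xff;

    for (size_t i = 0; i < n; i++) {
        const uint32_t a = mask[i];      /* coverage (pathA) */
        const uint32_t f = 255 - a;      /* destination factor (dstF) */
        const uint32_t d = dst[i];

        const uint32_t rA = mul8(f, d >> 24)          + mul8(a, sA);
        const uint32_t rR = mul8(f, (d >> 16) & 0xff) + mul8(a, sR);
        const uint32_t rG = mul8(f, (d >> 8) & 0xff)  + mul8(a, sG);
        const uint32_t rB = mul8(f, d & 0xff)         + mul8(a, sB);

        dst[i] = (rA << 24) | (rR << 16) | (rG << 8) | rB;
    }
}
```

Because f + a is always 255, each channel sum stays within 0..255, so no clamping is needed. The trade-off is that fully transparent and fully opaque mask values no longer take a shortcut, which may cost time on shapes with large solid interiors.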
Finally, I noticed that the macros with the Lcd suffix seem to perform
proper gamma correction:
void IntArgbDrawGlyphListLCD(SurfaceDataRasInfo *pRasInfo, ImageRef *glyphs,
                             jint totalGlyphs, jint fgpixel, jint argbcolor,
                             jint clipLeft, jint clipTop, jint clipRight,
                             jint clipBottom, jint rgbOrder,
                             unsigned char *gammaLut,
                             unsigned char * invGammaLut,
                             NativePrimitive *pPrim,
                             CompositeInfo *pCompInfo)
    ...
    srcR = invGammaLut[srcR];
    srcG = invGammaLut[srcG];
    srcB = invGammaLut[srcB];
    ...
    alpha blending
    ...
    dstR = gammaLut[dstR];
    dstG = gammaLut[dstG];
    dstB = gammaLut[dstB];
That's exactly what I need in order to implement correct gamma correction
in mask fill operations (shape draw / fill) for the software loops (buffered
image rendering).
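To make the idea concrete, here is a rough sketch of the same pattern applied to a single color channel of a mask fill: convert to linear values through invGammaLut, blend, then convert back through gammaLut. Everything here is an assumption for illustration: the tables are filled with a gamma 2.0 approximation (square / square root) to keep the code free of floating point, and init_gamma_luts, mul8 and blend_channel_gamma are hypothetical names, not existing Java2D functions.

```c
#include <stdint.h>

static uint8_t gammaLut[256];     /* linear -> gamma-encoded (assumed) */
static uint8_t invGammaLut[256];  /* gamma-encoded -> linear (assumed) */

/* Fill both tables using gamma 2.0 as a stand-in for the real curve. */
static void init_gamma_luts(void)
{
    for (int i = 0; i < 256; i++) {
        /* decode: (i / 255)^2 scaled back to 0..255 */
        invGammaLut[i] = (uint8_t) ((i * i + 127) / 255);

        /* encode: sqrt(i / 255) scaled to 0..255, rounded to nearest */
        int target = i * 255, r = 0;
        while ((r + 1) * (r + 1) <= target)
            r++;
        if (target - r * r > (r + 1) * (r + 1) - target)
            r++;
        gammaLut[i] = (uint8_t) r;
    }
}

/* round(a * b / 255) for a, b in [0, 255]. */
static uint32_t mul8(uint32_t a, uint32_t b)
{
    uint32_t t = a * b + 128;
    return (t + (t >> 8)) >> 8;
}

/* Blend one channel with coverage pathA in linear space. */
static uint8_t blend_channel_gamma(uint8_t dst, uint8_t src, uint8_t pathA)
{
    uint32_t d = invGammaLut[dst];
    uint32_t s = invGammaLut[src];
    uint32_t lin = mul8(255u - pathA, d) + mul8(pathA, s);
    return gammaLut[lin];
}
```

With these assumed tables, white over black at roughly half coverage comes out noticeably brighter than the plain 8-bit average, which is the visible effect of blending in linear space. Note that 8-bit linear intermediates lose precision in dark tones, so a real implementation would probably want wider intermediate values.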
I will now try to figure out how that C code is generated by the nested
macros!
Laurent