[OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070: Antialiased text on translucent backgrounds gets bright artifacts

Mon May 2 15:29:14 UTC 2016

Dear Jim & Everyone on Java2D Community

First, thanks to Jim for every word in the feedback. 

Java is used by millions of devices and users. We cannot compromise on quality.
It 's through these reviews and iterations that we ensure a consistent solution gets checked in.

I 've incorporated the suggestions from last review and changes are available in webrev.
Review Link: http://cr.openjdk.java.net/~jdv/Prahalad/8015070/webrev.06/

Quick Brief on Changes :
    1.  The macros- Declare ## DST ## SrcOverDstBlendFactor and Store ## DST ## SrcOverDstBlendFactor are removed. 
    2.  Their usage is now replaced with 
                 DeclareAlphaVarFor4ByteArgb : For declaring the blend factor
                 SrcOver ## DST ## BlendFactor(dstF, dstA) : To return the appropriate blend factor between dstF and dstA
    3.  Most of the macro names contain image TYPE in the earlier part and ForSTRATEGY at the end of the name. 
                 (Note: This is just an observation not a Thumb rule) Hence SrcOver ## DST ## BlendFactor macro name is being used.
    4.  As with every change in the native code, Internal automated build system was used to ensure successful compilation across all platforms.

Kindly review the changes and provide your opinion.

Thank you
Have a good day

Prahalad N.

-----Original Message-----
From: Jim Graham 
Sent: Thursday, April 28, 2016 2:05 AM
To: Prahalad Kumar Narayanan; 2d-dev at openjdk.java.net
Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070: Antialiased text on translucent backgrounds gets bright artifacts

Thanks Prahalad,

First, the macro design issues for all of these LoopMacros.h et al files 
are pretty complex, so I apologize if this is a big learning step here 
and if the feedback can look like nit-picking at times.  But the system 
is complicated enough that we should take care to make sure the new work 
remains consistent to what is already there and to keep this complicated 
system maintainable in the long run.

The only issue I have with the new macros is that Store*() naming 
convention is typically used when the macro itself does the assignment. 
In the case of the new Store*BlendFactor() macros, they simply return 
the value and the caller does the assignment, so the name is off.  This 
could be fixed either by moving "blendF" to an argument to the macro, or 
renaming it to one of the suggestions I gave earlier that don't imply 
assignment inside the macro.

Another thing to consider is that the type of the blend factor is 
actually determined by the alpha blend token "STRATEGY", not by the 
image type.  Many of the alpha blend types are dependent on the image 
types being used, but it is not solely determined by the image type, 
sometimes it depends on the pair of src/dst image types.  In any case, 
the STRATEGY used for alpha blending is not directly tied to an image 
type.  Other alpha loop macros take a STRATEGY as an argument and invoke 
the proper alpha blending value declaration and manipulation macros with 
"For ## Strategy" macro name expansions.  But we are hard-coding 
"For4ByteArgb" in this particular macro, which means we are hard-coding 
the particular type of alpha math.  Since that determination is done by 
hard-coding in this macro and not by delegating to the image type, then 
it is inappropriate to ask the image-type-based macro to decide how to 
declare the blend factor.  If anything, we should use 
"DeclareAlphaVarFor4ByteArgb()" to declare the variable, not an 
image-based macro.  Note that the Declare() macros for the alpha math 
strategies take an argument called "VAR" so then you can assume that 
they've named the variable the same token that you supplied.

My recommendation would be to delete the new Declare macros and replace 
their use in LoopMacros.h with a usage of the existing 
DeclareAlphaVarFor4ByteArgb() macro, and then I would move the resulting 
variable into the argument list for the Store ## BlendFactor macro and 
move the assignment part of the statement inside of the macro...

			...jim

On 4/27/16 6:37 AM, Prahalad Kumar Narayanan wrote:
> Dear Jim and Everyone on Java2D Community
>
> Good day to you.
>
> First, Thanks to Jim for his feedback.
> I 've incorporated his feedback in latest version of code and webrev link is shared below
>         Webrev Link: http://cr.openjdk.java.net/~aghaisas/prahalad/8015070/webrev.05/
>
> Quick Description on Changes :
> 1. Removing redundant code MultiplyAndStore4ByteArgbComps(res, resA, SRC_PREFIX)
>         The redundant code within if and else blocks of if ( mixValSrc != 0xff ) has been moved into block that does color blending.
>         As Jim rightly said, the change would execute Multiply operation only if blending is required
>         On the other hand, If resA reaches maximum value, the fast path is executed.
>
> 2. Fast path execution
>         The fast path that directly sets foreground color has been moved into else block after checking if ( resA != MaxValFor4ByteArgb )
>         Conceptually, resA would be maximum only when glyphAlpha (mixValSrc) and srcAlpha are maximum.
>
> 3. Isolated Declare and Store macros.
>          A single macro DeclareAndInit... has been split into two macros Declare ## DST ## SrcOverDstBlendFactor and Store ## DST ## SrcOverDstBlendFactor.
>          This would indeed address some compilers that do not allow declaration in the middle of the block.
>
> 4. Other Relevant Information:
>         The changes have been verified to build on all platforms through JPRT auto build system.
>         Java2D benchmarks were run with the changes and no degradation on performance was seen compared to webrev.04
>         Regression was run and no new regression failures have been introduced with the change.
>
> 5. To reduce reviewer 's effort in review, code has been expanded with comments herewith.
>
> #define GlyphListAABlend4ByteArgb(DST, GLYPH_PIXELS, PIXEL_INDEX, DST_PTR, \
>                                   FG_PIXEL, PREFIX, SRC_PREFIX) \
>     do { \
>         DeclareAlphaVarFor4ByteArgb(resA) \
>         DeclareCompVarsFor4ByteArgb(res) \
>         jint mixValSrc = GLYPH_PIXELS[PIXEL_INDEX]; \
>         if (mixValSrc) { \
>             /* Evaluating srcAlpha component */ \
>             if (mixValSrc != 0xff) { \
>                 /* No-op. Retained to match reference as SRCOVER_MASK_FILL */ \
>                 PromoteByteAlphaFor4ByteArgb(mixValSrc); \
>                 /* Glyph mask determines visibility of the srcColor */ \
>                 resA = MultiplyAlphaFor4ByteArgb(mixValSrc, SRC_PREFIX ## A); \
>             } else { \
>                 /* This is understood easier with floating point logic. */ \
>                 /* 1.0f is max value used in floating point calculation */ \
>                 /* (1.0f * srcA) gives srcA. Hence we directly consider */ \
>                 /* srcA without Multiply with glyph mask. */ \
>                 resA = SRC_PREFIX ## A; \
>             } \
>            /* Blending srcColor and dstColor */ \
>             /* This is required only when resA(i.e., srcA) is not maximum */ \
>             if (resA != MaxValFor4ByteArgb) { \
>                 DeclareAndInvertAlphaVarFor4ByteArgb(dstF, resA) \
>                 DeclareAndClearAlphaVarFor4ByteArgb(dstA) \
>                 DeclareCompVarsFor4ByteArgb(dst) \
>                 DeclareCompVarsFor4ByteArgb(tmp) \
>                 /* Redundant statement moved from previous if -else block */ \
>                 MultiplyAndStore4ByteArgbComps(res, resA, SRC_PREFIX); \
>                 /* Based on the pixelFormat we could reduce calculations */ \
>                 /* done to load the color and alpha components */ \
>                 if (!(DST ## IsPremultiplied)) { \
>                     Load ## DST ## To4ByteArgb(DST_PTR, pix, PIXEL_INDEX, \
>                                                dstA, dstR, dstG, dstB); \
>                     Store4ByteArgbCompsUsingOp(tmp, =, dst); \
>                 } else { \
>                     Declare ## DST ## AlphaLoadData(DstPix) \
>                     jint pixelOffset = PIXEL_INDEX * (DST ## PixelStride); \
>                     DST ## DataType *pixelAddress = PtrAddBytes(DST_PTR, \
>                                                                 pixelOffset); \
>                     /* The macro's implementation loads color components  */ \
>                     /* without divide by alpha adjustment as required for */ \
>                     /* subsequent calculations. Note: This is used only   */ \
>                     /* with preMultiplied alpha based pixel formats */ \
>                     LoadAlphaFrom ## DST ## For4ByteArgb(pixelAddress, \
>                                                          DstPix, \
>                                                          dst); \
>                     Postload4ByteArgbFrom ## DST(pixelAddress, \
>                                                  DstPix, \
>                                                  tmp); \
>                 } \
>                 /* Avoid blending operatons if dst is fully transparent */ \
>                 if (dstA) { \
>                     Declare ## DST ## SrcOverDstBlendFactor(blendF) \
>                     /* dstA would be 0 if either of the following is true.  */ \
>                     /* 1. srcA is max. Parent if condition validates this.  */ \
>                     /* 2. dstA is zero. The current if condition validates  */ \
>                     /* Henceforth, the following Multiply need not be moved */ \
>                     /* ahead of the if condition. This also helps to better */ \
>                     /* performance */ \
>                     dstA = MultiplyAlphaFor4ByteArgb(dstF, dstA); \
>                     resA += dstA; \
>                     /* Declares blendF variable and assigns appropriate */ \
>                     /* alpha value. The definitions are contained in the*/ \
>                     /* header files of respective pixel formats */ \
>                     blendF = Store ## DST ## SrcOverDstBlendFactor(dstF, \
>                                                                    dstA); \
>                     /* This is understood easier with floating point logic.*/ \
>                     /* 1.0f is max value used in floating point calculation*/ \
>                     /* (1.0f * tmp) gives tmp. Hence we avoid Multiply     */ \
>                     /* operation and directly add loaded color to result*/ \
>                     if (blendF != MaxValFor4ByteArgb) { \
>                         MultiplyAndStore4ByteArgbComps(tmp, \
>                                                        blendF, \
>                                                        tmp); \
>                     } \
>                     Store4ByteArgbCompsUsingOp(res, +=, tmp); \
>                 } \
>             } else { \
>                 /* resA is maximum only when srcA and glyphAlpha are max   */ \
>                 /* Hence the fast path to store foreground pixel color and */ \
>                 /* break to the next pixel. */ \
>                 Store ## DST ## PixelData(DST_PTR, PIXEL_INDEX, \
>                                           FG_PIXEL, PREFIX); \
>                 break; \
>             } \
>             /* In the above calculations, color values are multiplied with */ \
>             /* with alpha. This is perfectly fine for pre-Multiplied alpha */ \
>             /* based pixel formats. For non pre-multiplied alpha based     */ \
>             /* pixel formats, the alpha is removed from color components   */ \
>             /* and then stored to the resulting color */ \
>             if (!(DST ## IsOpaque) && \
>                 !(DST ## IsPremultiplied) && resA && \
>                 resA < MaxValFor4ByteArgb) \
>             { \
>                 DivideAndStore4ByteArgbComps(res, res, resA); \
>             } \
>             Store ## DST ## From4ByteArgbComps(DST_PTR, pix, \
>                                                PIXEL_INDEX, res); \
>         } \
>     } while (0);
>
> Kindly review the changes present in the webrev and provide your views.
>
> Thank you once again for your effort in review
> Have a great day
>
> Prahalad N.
>
>
> -----Original Message-----
> From: Jim Graham
> Sent: Wednesday, April 27, 2016 2:12 AM
> To: Prahalad Kumar Narayanan; 2d-dev at openjdk.java.net
> Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070: Antialiased text on translucent backgrounds gets bright artifacts
>
> Hi Prahalad,
>
> One potential portability issue - you have a "DeclareAndInit" macro for the new "choose which blend factor" macro in the middle of a block.
> Some C compilers allow declaring a new variable in the middle of a block, but not all.  Since it would be hard to move the declaration to the top of the block, since it depends on a value computed in the first couple of lines, it might be better to change it from a "DeclareAndInit"
> style macro into a regular declaration, and a macro to get the value, so "jint blendF;" at the top of the block and "blendF = SrcOverDstBlendFactorFor ## DST(...);" in the middle of the block.  (Or name it "SrcOver ## DST ## BlendFactor(...)?"
>
> I just noticed something about this .04 version of the function.  At the top you have a test for mixValSrc != 255 and then both clauses of the if statement end with essentially the same code, multiplying the source by the value in resA.  (The version in the else clause uses SRC_PREFIX ## A, but it could just as easily also use resA since they are the same value at that point.)
>
> This means you could potentially move both "MultiplyAndStore" macros until after the if and use resA as the multiplier.  Now, if you look, the immediate next statement tests if resA is maxVal.  If it is maxVal, then you don't need to do that multiply at all, so the macro could be moved even further to be inside the "if (resA != maxVal)" block.
>
> At that point, you could then reinstate the "else" on that if block and move the Store##DST##PixelData into that else, to save you another test of resA.  This would essentially look like this:
>
> if (mixValSrc != 255) {
>      PromoteByteAlpha...;
>      resA = MultiplyAlpha...;
> } else {
>      resA = SRC_PREFIX ## A;
> }
> if (resA != MaxVal) {
>      Declare,declare,declare,declare...;
>      Multiply...Comps(res, resA, SRC_PREFIX); } else {
>      Store ## DST ## PixelData...;
> }
>
> It shortens the code a little, but I'm not sure if it would really help performance other than not having to do the test for maxVal twice.
> Still, tests are fairly expensive in code like this so it could be worth a shot at testing, and it would simplify the code a bit in either case...
>
> 			...jim
>
> On 4/15/16 12:25 AM, Prahalad Kumar Narayanan wrote:
>> Hello Jim & Everyone on Java2D Community
>>
>> Good day to you.
>>
>> This is a review request and a follow-up to the bug-fix for the issue
>> Bug     : JDK-8015070
>> Link    : https://bugs.openjdk.java.net/browse/JDK-8015070
>>
>> Webrev Link :
>> http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.04/
>>
>> Quick Inferences on Changes in Current -Webrev.04
>>
>> 1 ) Subtle changes to the blend loop-
>>         . The subtle changes taken up, have helped to improve the performance.
>>         . With the current logic in webrev.04, Java2DBench reports better performance than Un-Optimized solution that was present in webrev.00
>>         . J2DBench was run for Font Styles { Plain, Bold, Italic, Bold n Italic } and  Font Sizes { 12, 20, 36, 72 }
>>         . My sincere thanks to Jim for all his detailed feedback through the multiple reviews that has evolved the solution today.
>>
>> (Details on changes)
>> 1.a.  Loading of Color components
>>         . When modelled as per SRCOVER_MASK_FILL code, the logic required few additional calculations to load color components.
>>         . The extra calculations indeed impacted performance figures.
>>         . This could be offset in  two possible ways
>>                  a. Inspect parent macro- NAME_SOLID_DRAWGLYPHLISTAA and advance by pixel address and not by pixel index.
>>                              The parent macro invokes GlyphListAABlendFourByteArgb through this macro- GlyphListAABlend ## STRATEGY(DST, pixels, x, pPix, fgpixel, solidpix, src);
>>                              Changing parent macro will cause spurious changes across GlyphListAABlend ## other pixel formats.
>>                              There is additional risk of breaking the stable and well optimized  LoopMacros.h.
>>                 b. Load color components based on pre-Multiplication status
>>                              This has been taken up and change is limited to the function being modified.
>>                              Thankfully J2DBench has still reported improvement in performance.
>>
>> 1.b.  New Macro to avoid if (DST ## IsPremultiplied) {
>>         . A new macro- DeclareAndInit ## DST ## SrcOverMaskBlendFactor has been introduced to choose between dstF, or dstA
>>         . The implementation is available in the header files of pixel formats ( Eg: IntArgb.h IntArgbPre.h )
>>         . There are 29 different pixel formats known to Java2D, and
>>                  . Hence, the new macro's implementation is added only to those pixel formats that require the current glyph blending logic.
>>
>> 2 ) Testing across different formats
>>          . The Regression test code has been modified to test anti-aliased text rendering on 7 different pixel formats-
>>                     . IntArgb, IntArgb_Pre, FourByteAbgr, FourByteAbgr_Pre, IntRGB, IntBGR, 3ByteBGR.
>>          . As expected, the test fails without the fix on JDK 8 and JDK 7 versions & passes with JDK 9-internal containing the fix.
>>
>> 3 ) Explanation on Code Changes :
>>         . With multiple reviews and changes, today the code fixes the bug and is well optimized as well.
>>         . For ease of reviewer and effort in review, I 've explained the logic with /* comment statements */ herewith.
>>         . Note: These comments don't figure in the webrev.
>>                        As one cannot guarantee how /* comments */ within macros would be perceived by compiler across different platforms.
>>
>> #define GlyphListAABlend4ByteArgb(DST, GLYPH_PIXELS, PIXEL_INDEX, DST_PTR, \
>>                                                                             FG_PIXEL, PREFIX, SRC_PREFIX) \
>>     do { \
>>         DeclareAlphaVarFor4ByteArgb(resA) \
>>         DeclareCompVarsFor4ByteArgb(res) \
>>         jint mixValSrc = GLYPH_PIXELS[PIXEL_INDEX]; \
>>         if (mixValSrc) { \
>>             /* Evaluating srcColor components */ \
>>             if (mixValSrc != 0xff) { \
>>                 /* No-op. Retained to match reference as SRCOVER_MASK_FILL */ \
>>                 PromoteByteAlphaFor4ByteArgb(mixValSrc); \
>>                 /* Glyph mask determines visibility of the srcColor */ \
>>                 resA = MultiplyAlphaFor4ByteArgb(mixValSrc, SRC_PREFIX ## A); \
>>                 MultiplyAndStore4ByteArgbComps(res, resA, SRC_PREFIX); \
>>             } else { \
>>                 /* If mixValSrc and srcA are maximum, then result color is */ \
>>                 /* opaque. Hence the fast path to store foreground pixel   */ \
>>                 /* color and return. */ \
>>                 if (SRC_PREFIX ## A == MaxValFor4ByteArgb) { \
>>                     Store ## DST ## PixelData(DST_PTR, PIXEL_INDEX, \
>>                                               FG_PIXEL, PREFIX); \
>>                     break; \
>>                 } \
>>                 /* This is understood easier with floating point logic. */ \
>>                 /* 1.0f is max value used in floating point calculation */ \
>>                 /* (1.0f * srcA) gives srcA. Hence we directly consider */ \
>>                 /* srcA without Multiply with glyph mask. */ \
>>                 resA = SRC_PREFIX ## A; \
>>                 MultiplyAndStore4ByteArgbComps(res, \
>>                                                SRC_PREFIX ## A, \
>>                                                SRC_PREFIX); \
>>             } \
>>             /* Evaluating dstColor components */ \
>>             /* This is required only when resA(i.e., srcA) is not maximum */ \
>>             if (resA != MaxValFor4ByteArgb) { \
>>                 DeclareAndInvertAlphaVarFor4ByteArgb(dstF, resA) \
>>                 DeclareAndClearAlphaVarFor4ByteArgb(dstA) \
>>                 DeclareCompVarsFor4ByteArgb(dst) \
>>                 DeclareCompVarsFor4ByteArgb(tmp) \
>>                 /* Based on the pixelFormat we could reduce calculations */ \
>>                 /* done to load the color and alpha components */ \
>>                 if (!(DST ## IsPremultiplied)) { \
>>                     Load ## DST ## To4ByteArgb(DST_PTR, pix, PIXEL_INDEX, \
>>                                                dstA, dstR, dstG, dstB); \
>>                     Store4ByteArgbCompsUsingOp(tmp, =, dst); \
>>                 } else { \
>>                     Declare ## DST ## AlphaLoadData(DstPix) \
>>                     jint pixelOffset = PIXEL_INDEX * (DST ## PixelStride); \
>>                     DST ## DataType *pixelAddress = PtrAddBytes(DST_PTR, \
>>                                                                 pixelOffset); \
>>                     /* The macro's implementation loads color components  */ \
>>                     /* without divide by alpha adjustment as required for */ \
>>                     /* subsequent calculations. Note: This is used only   */ \
>>                     /* with preMultiplied alpha based pixel formats */ \
>>                     LoadAlphaFrom ## DST ## For4ByteArgb(pixelAddress, \
>>                                                          DstPix, \
>>                                                          dst); \
>>                     Postload4ByteArgbFrom ## DST(pixelAddress, \
>>                                                  DstPix, \
>>                                                  tmp); \
>>                 } \
>>                 /* Avoid blending operatons if dst is fully transparent */ \
>>                 if (dstA) { \
>>                     /* dstA would be 0 if either of the following is true. */ \
>>                     /* 1. srcA is max. Parent if condition validates this. */ \
>>                     /* 2. dstA is zero. The current if condition validates */ \
>>                     /* Henceforth, the following Multiply need not be moved*/ \
>>                     /* ahead of the if condition. This also helps to better*/ \
>>                     /* performance */ \
>>                     dstA = MultiplyAlphaFor4ByteArgb(dstF, dstA); \
>>                     resA += dstA; \
>>                     /* Declares blendF variable and assigns appropriate */ \
>>                     /* alpha value. The definitions are contained in the*/ \
>>                     /* header files of respective pixel formats */ \
>>                     DeclareAndInit ## DST ## SrcOverDstBlendFactor(dstF, \
>>                                                                    dstA, \
>>                                                                    blendF); \
>>                     /* This is understood easier with floating point logic.*/ \
>>                     /* 1.0f is max value used in floating point calculation*/ \
>>                     /* (1.0f * tmp) gives tmp. Hence we avoid 3 Multiply   */ \
>>                     /* operations and add the loaded color to result       */ \
>>                     if (blendF != MaxValFor4ByteArgb) { \
>>                         MultiplyAndStore4ByteArgbComps(tmp, \
>>                                                        blendF, \
>>                                                        tmp); \
>>                     } \
>>                     Store4ByteArgbCompsUsingOp(res, +=, tmp); \
>>                 } \
>>             } \
>>             /* In the above calculations, color values are multiplied with */ \
>>             /* with alpha. This is perfectly fine for pre-Multiplied alpha */ \
>>             /* based pixel formats. For non pre-multiplied alpha based     */ \
>>             /* pixel formats, the alpha is removed from color components   */ \
>>             /* and then stored to the resulting color.  */ \
>>             if (!(DST ## IsOpaque) && \
>>                 !(DST ## IsPremultiplied) && resA && \
>>                 resA < MaxValFor4ByteArgb) \
>>             { \
>>                 DivideAndStore4ByteArgbComps(res, res, resA); \
>>             } \
>>             Store ## DST ## From4ByteArgbComps(DST_PTR, pix, \
>>                                                PIXEL_INDEX, res); \
>>         } \
>>     } while (0);
>>
>> My apologies if the above code did not appear on the final webrev email.
>> ( In few instances, the newlines don't appear in plain-text format )
>>
>> Kindly review the changes present in webrev and provide your views.
>> If the changes are good, we could take up for the code check-in.
>>
>> Thank you for your time in review
>> Have a good day
>>
>> Prahalad N.
>>
>>
>>
>> -----Original Message-----
>> From: Jim Graham
>> Sent: Tuesday, April 05, 2016 3:07 AM
>> To: Prahalad Kumar Narayanan; Sergey Bylokhov; Philip Race
>> Cc: Praveen Srivastava
>> Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070:
>> Antialiased text on translucent backgrounds gets bright artifacts
>>
>> Hi Prahalad,
>>
>> Can I see the webrev diffs for the "today's experiment" code with the new macro?
>>
>> Also, Did you test this with opaque destinations?  The most common use
>> of text is on an opaque destination, so that would matter more.  I
>> would
>> suggest: INT_RGB, THREE_BYTE_BGR, INT_ARGB, INT_ARGB_PRE in that order of precedence of importance.  Also, test with translucent colors, though those are less important than opaque colors.
>>
>> I'm still looking at why the non-pre would be slower than the pre.
>> About the only difference is the one line "if (!PRE) { DSTF = DSTA; }".
>>   One suggestion might be to move the test for transparent destination up a couple of lines, and to get rid of the extra "DSTF = dstA"
>> assignement with the following structure:
>>
>> 	dstA = Mult...();
>> 	if (dstA) {
>> 	    resA += dstA;
>> 	    Declare...
>> 	    Postload...
>> 	    if (DST ## IsPremultiplied) {
>> 	        MultiplyAndStore(..., DSTF, ...);
>> 	    } else {
>> 	        MultiplyAndStore(..., dstA, ...);
>> 	    }
>> 	    Store...
>> 	}
>>
>> Basically, dstA == 0 is the actual test for "do we need to even try to blend the destination in here".  If it is zero then there is no need to add dstA to resA and there is no need to adjust the factor we blend by (MultiplyAndStore).  It can be triggered by either a transparent destination pixel or an opaque source pixel, but either way, dstA is the right test, not DSTF.  The second part, eliminating the DSTF=dstA assignment, gets rid of a line that might trip up the optimizer by simply having the macro expand differently for the two types.  To be even more streamlined, we could create a new set of macros:
>>
>> // In the header files for PRE types:
>> #define SRCOVER_DST_BLEND_FACTOR_<TYPE>(dF, dA)   (dF)
>> // In the header files for non-PRE types:
>> #define SRCOVER_DST_BLEND_FACTOR_<TYPE>(dF, dA)   (dA)
>>
>> Then we wouldn't need the test above for "if (DST ## Pre)", we would just expand the macro with:
>> 	MultiplyAndStore(..., SRCOVER_DST_BLEND_FACTOR ## DST(DSTF, dstA), ...) which would hopefully confuse the optimizer even less.
>>
>> If you want to really eliminate the pixel address computations, you could rewrite the calling loop so that it steps along a pixel pointer rather than using indexing.  Have the calling function/macro set up a pRas pointer to the pixel and step that along the row by TYPE##PixelStride as it iterates across the glyph, then hand that into the Glyph blend macro as DST_PTR (and eliminate PIXEL_INDEX as it would be "0" in this case)...
>>
>> 			...jim
>>
>> On 4/4/16 4:37 AM, Prahalad Kumar Narayanan wrote:
>>> Dear Jim
>>>
>>> Good day to you.
>>>
>>> ( Just in-case, you had missed my long Friday 's email )
>>>
>>> Quick Recap of Proceedings
>>>
>>> 1.On Friday, I had profiled two solutions that fix the bug-
>>> JDK-8015070, using Java2D Bench
>>>
>>> 2.Profiling was done for 16 test cases with different combinations of
>>>
>>> a.Font Style: Plain, Bold, Italic, Bold-Italic
>>>
>>> b.Font Size: 12, 20, 36, 72
>>>
>>> 3.Result from Friday 's profiling experiments:
>>>
>>> a.Regular Solution (Un-optimized) : This was observed to be faster
>>> for IntArgb pixel format
>>>
>>> b.Optimized Solution (based on SrcOver_MaskFill with fast path) :
>>> This was observed to be faster for IntArgb_Pre pixel format
>>>
>>> Update from Today's Experiments
>>>
>>> 1.First, I understood that new calculations introduced (pixelAddress
>>> computation) impacted performance of optimized solution in IntArgb format.
>>>
>>> 2.Henceforth, I made the following changes, while loading destination color:
>>>
>>> a.Check if the pixel format is PreMultiplied
>>>
>>> b.If the format is preMultiplied, then > take up new calculations and
>>> use LoadAlphaFrom ## DST ## For4ByteArgb macro that does *Not* cause
>>> divide by alpha adjustment
>>>
>>> c.If the format is regular Argb, then > take up loading of colors
>>> using standard Load ## DST ## To4ByteArgb
>>>
>>> 3.Once the release build was available, Java2D Bench was re-run
>>> (using pre-saved options file)
>>>
>>> Result from Bench metrics:
>>>
>>> a.In most of the test cases, the optimized solution has higher metric :
>>> Avg characters/ second for both IntArgb and IntArgb_Pre formats
>>>
>>> b.In 6 / total-16 test cases, optimized solution was marginally lower
>>> than the metrics of Regular un-optimized algorithm (only for
>>> IntArgb_Pre)
>>>
>>> c.However, J2DAnalyzer reported that even these 6-test cases were
>>> within 10% deviation. Hence the algorithms were categorized to be
>>> 'same' in performance.
>>>
>>> Suggestion Required
>>>
>>> 1.The attached zip file, contains Algorithms.cpp - Which lists down
>>> different algorithms profiled so far.
>>>
>>> 2.I 've introduced comments within the macro to explain the change.
>>>
>>> a.Note: These comments will be removed from the final source code to
>>> be checked in.
>>>
>>> 3.Kindly review the latest algorithm (for any issues/ bugs) and
>>> provide your suggestions.
>>>
>>> a.latest algorithm can be easily traced by searching for
>>> "LoadOptimized Algorithm v3" within the file.
>>>
>>> Thank you for your time in review & detailed feedback that you get
>>> every time.
>>>
>>> Every such review improves the quality of code & the solution
>>>
>>> Prahalad N.
>>>
>>> *From:* Prahalad Kumar Narayanan
>>> *Sent:* Friday, April 01, 2016 5:07 PM
>>> *To:* Jim Graham; Sergey Bylokhov; Philip Race
>>> *Cc:* Praveen Srivastava
>>> *Subject:* RE: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070:
>>> Antialiased text on translucent backgrounds gets bright artifacts
>>>
>>> Dear Jim
>>>
>>> Good day to you.
>>>
>>> Thanks for your suggestions in the reviews that has evolved the fix
>>> to the bug.
>>>
>>> As guided by you, I measured the performance of Text Rendering with
>>> Text Antialiasing Hint on using Java2D Bench Tool.
>>>
>>> Solutions Profiled:
>>>
>>>          We have two solutions - Un optimized solution and Optimized
>>> solution modelled as per SRCOVER_MASKFILL
>>>
>>>          ( Both the solutions were profiled on windows x86_64 with
>>> JDK 9-internal Release build )
>>>
>>> Test Cases Profiled:
>>>
>>>          With Font set to : Lucida sans, different combinations of
>>>
>>>               Font Styles : Plain, Bold, Italic, Bold Italic  &&
>>>
>>>               Font Sizes   : 12, 20, 36, 72 points were tested.
>>>
>>> Attached herewith: JDK8015070-Profiling Data.zip
>>>
>>> The archive contains:
>>>
>>>          1. Algorithms.cpp     : Just to have a quick glance of all the
>>> algorithms that we have tried so far.
>>>
>>>          2. *.txt Files               : For each test, Java2D bench
>>> reports the average metrics/second.
>>>
>>>                                                         The text file
>>> contains collection of all such average metric for nearly 16
>>> different test cases.
>>>
>>>          3. res Output            : .res output from each of the test runs.
>>>
>>> Observation from J2DBench Reports
>>>
>>>          1.  I could not get time to run the Analyzer tool across
>>> each of these *.res files. I shall do them on Monday.
>>>
>>>          2.  From the summary text ( average chars per. Second ) that
>>> J2DBench reported,
>>>
>>>                      Un-optimized solution seems to be better for
>>> IntArgb pixel format and
>>>
>>>                      Optimized solution is better for IntArgb_Pre
>>> pixel format by significant margin.
>>>
>>>          3.  I tried to improve the optimized algorithm (based on
>>> SRCOVER_MASKFILL ) further by adding a if (dstA) { ...
>>>
>>>                      Though there is a marginal improvement, the
>>> optimized solution could not exceed numbers of regular algorithm (for
>>> IntArgb pixel format)
>>>
>>>                      This could be due to the extra calculations that
>>> we do in-order to load color components separately.
>>>
>>>                      However, for IntArgb_Pre pixel format, the
>>> optimized solution is way-ahead and gives better performance.
>>>
>>>            4. In the summary reports, you will find
>>> CompatibleBufferedImage( Translucent ) as well.
>>>
>>>                      In a separate experiment, I found that the pixel
>>> format for compatible buffered image got mapped IntArgb_Pre.
>>>
>>>                      Thus, the performance numbers match with that of
>>> IntArgb_Pre pixel format
>>>
>>> At the present moment, I 'm confused on the solution would be better
>>> to fix the Bug.
>>>
>>> Kindly share your views and thoughts
>>>
>>> Thank you
>>>
>>> Have a good day
>>>
>>> Prahalad N.
>>>
>>> -----Original Message-----
>>> From: Jim Graham
>>> Sent: Thursday, March 31, 2016 1:46 AM
>>> To: Prahalad Kumar Narayanan; 2d-dev at openjdk.java.net
>>> <mailto:2d-dev at openjdk.java.net>; Sergey Bylokhov
>>> Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070:
>>> Antialiased text on translucent backgrounds gets bright artifacts
>>>
>>> Hi Prahalad,
>>>
>>> Benchmarks must run for a significant period of time to be valid.
>>>
>>> Measuring with nanoTime() is fine, but the run times should exceed at
>>> least a couple of seconds, not a few nanoseconds.  You might want to
>>> use Java2DBench instead (in src/demo/share/java2d in the java.desktop
>>> repo), which calibrates test times, does multiple runs, and includes
>>> an analyzer to compare results from multiple test runs...
>>>
>>>                                  ...jim
>>>
>>> On 3/30/2016 4:27 AM, Prahalad Kumar Narayanan wrote:
>>>
>>>> Hello Jim and Everyone on Java2D Group
>>>
>>>>
>>>
>>>> Good day to you.
>>>
>>>>
>>>
>>>> A quick follow-up to Review Request on bug:
>>>
>>>>        Bug          : JDK-8015070
>>>
>>>>        Bug Link :https://bugs.openjdk.java.net/browse/JDK-8015070
>>>
>>>>
>>>
>>>> Thank you Jim for the detailed feedback.
>>>
>>>>
>>>
>>>> It takes a lot of time not only in performing the review, but also in getting the feedback with clear words.
>>>
>>>> In each review, the solution definitely gets better & better. I 'm
>>>
>>>> happy about it...! :)
>>>
>>>>
>>>
>>>> Quick Inferences from previous feedback:
>>>
>>>>
>>>
>>>> Incorporating the fast path into current logic:
>>>
>>>>         1.  I agree with you on this point and I noticed this when we modelled the solution as per the mask fill.
>>>
>>>>         2.  I ignored it initially for two reasons,
>>>
>>>>                       a.  The statement - if (resA != MaxValFor4ByteArgb)...  follows srcColor pre-multiplication step and this will ensure to skip most of the blending calculations pertaining to destination.
>>>
>>>>                       b.  Any addition / tweaks to current logic, might deviate from the SRCOVER_MASKFILL reference model.
>>>
>>>>                                Many a time, managing code with similar logic across implementation helps a lot.
>>>
>>>>        3.   As you said, including the fast path will avoid few multiplications and if checks too.
>>>
>>>>              The changes are available in the current webrev.
>>>
>>>>                       Link:
>>>
>>>> http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.03/
>>>
>>>>
>>>
>>>> Profiling method, and Metrics:
>>>
>>>>        1.  The profiling method that was done yesterday was mere
>>>> execution of the regression test (available in the webrev) and time
>>>> measured with System.currentTimeMillis API
>>>
>>>>                        Since only one iteration was run, the time soared into milli seconds. This could be due to internal glyph caching mechanism.
>>>
>>>>        2.  Today, I evolved the regression test, into a benchmark that does the following:
>>>
>>>>                  a.  Select Font style : {Plain, Bold, Italic, Bold
>>>> Italic}
>>>
>>>>                  b.  Select Font size : {20, 40, 60, 80}
>>>
>>>>                  c.  Trigger drawstring once before benchmark is run. This is to ensure, the font subsystem is done with glyph caching internally.
>>>
>>>>                  d.  Use famous string that has all characters-" The quick brown fox jumps over the lazy dog. 0123456789. "
>>>
>>>>                  e.  For every style-size combination - run the test
>>>> for 20 iterations and take the average. (Note: Font is fixed to
>>>> 'verdana' )
>>>
>>>>                  f.  Modify the precision from System.currentTimeMillis to System.nanoTime() and reduce elapsedTime to micro seconds.
>>>
>>>>
>>>
>>>>        3.  With the above setup in code, my observation on windows is as follows:
>>>
>>>>                  a.  In many cases, The optimized logic is either equal-to (or) better in performance than the un-optimized logic.
>>>
>>>>                               The difference is very minimal - few tens to few hundreds of micro-seconds.
>>>
>>>>                  b.  The optimized algorithm improves performance for Pre-multiplied alpha based destination buffer.
>>>
>>>>                  c.   There are occasional huge deviations where optimized logic seems to take longer time.
>>>
>>>>                               These could be due to external factors
>>>> like- stalls for memory, bus io etc.,
>>>
>>>>                               Since, the deviation is in micro seconds, I believe, it may not be a concern.
>>>
>>>>                  d.  The complete list of time measurement taken up
>>>> on windows x86_64 release build is as-follows-
>>>
>>>>                               I 'm not sure, how the data appears in the final webrev-email.
>>>
>>>>                               Kindly excuse, if the data is un-readable.
>>>
>>>>
>>>
>>>> Platform : Windows x86_64 Release Build Algorithm : Unoptimized.
>>>
>>>> webrev.00
>>>
>>>> ````````````````````````````````````````````````````````````````````
>>>> `
>>>> `
>>>
>>>> ```` Executing Bench For Image Type: IntArgb
>>>
>>>>   -Font Style: Plain /Font Size: 20 /Time consumed (us): 84.742
>>>
>>>>   -Font Style: Plain /Font Size: 40 /Time consumed (us): 318.395
>>>
>>>>   -Font Style: Plain /Font Size: 60 /Time consumed (us): 657.474
>>>
>>>>   -Font Style: Plain /Font Size: 80 /Time consumed (us): 813.079
>>>
>>>>   -Font Style: Bold /Font Size: 20 /Time consumed (us): 386.380
>>>
>>>>   -Font Style: Bold /Font Size: 40 /Time consumed (us): 339.301
>>>
>>>>   -Font Style: Bold /Font Size: 60 /Time consumed (us): 492.631
>>>
>>>>   -Font Style: Bold /Font Size: 80 /Time consumed (us): 625.812
>>>
>>>>   -Font Style: Italic /Font Size: 20 /Time consumed (us): 235.059
>>>
>>>>   -Font Style: Italic /Font Size: 40 /Time consumed (us): 320.180
>>>
>>>>   -Font Style: Italic /Font Size: 60 /Time consumed (us): 474.558
>>>
>>>>   -Font Style: Italic /Font Size: 80 /Time consumed (us): 1188.169
>>>
>>>>   -Font Style: Bold-Italic /Font Size: 20 /Time consumed (us):
>>>> 426.988
>>>
>>>>   -Font Style: Bold-Italic /Font Size: 40 /Time consumed (us):
>>>> 374.064
>>>
>>>>   -Font Style: Bold-Italic /Font Size: 60 /Time consumed (us):
>>>> 732.375
>>>
>>>>   -Font Style: Bold-Italic /Font Size: 80 /Time consumed (us):
>>>> 864.68
>>>
>>>>
>>>
>>>> Executing Bench For Image Type: IntArgb_Pre
>>>
>>>>   -Font Style: Plain /Font Size: 20 /Time consumed (us): 129.768
>>>
>>>>   -Font Style: Plain /Font Size: 40 /Time consumed (us): 206.299
>>>
>>>>   -Font Style: Plain /Font Size: 60 /Time consumed (us): 249.941
>>>
>>>>   -Font Style: Plain /Font Size: 80 /Time consumed (us): 362.372
>>>
>>>>   -Font Style: Bold /Font Size: 20 /Time consumed (us): 145.096
>>>
>>>>   -Font Style: Bold /Font Size: 40 /Time consumed (us): 151.589
>>>
>>>>   -Font Style: Bold /Font Size: 60 /Time consumed (us): 240.972
>>>
>>>>   -Font Style: Bold /Font Size: 80 /Time consumed (us): 331.894
>>>
>>>>   -Font Style: Italic /Font Size: 20 /Time consumed (us): 95.028
>>>
>>>>   -Font Style: Italic /Font Size: 40 /Time consumed (us): 245.300
>>>
>>>>   -Font Style: Italic /Font Size: 60 /Time consumed (us): 270.379
>>>
>>>>   -Font Style: Italic /Font Size: 80 /Time consumed (us): 398.139
>>>
>>>>   -Font Style: Bold-Italic /Font Size: 20 /Time consumed (us):
>>>> 93.243
>>>
>>>>   -Font Style: Bold-Italic /Font Size: 40 /Time consumed (us):
>>>> 475.406
>>>
>>>>   -Font Style: Bold-Italic /Font Size: 60 /Time consumed (us):
>>>> 280.085
>>>
>>>>   -Font Style: Bold-Italic /Font Size: 80 /Time consumed (us):
>>>> 357.486
>>>
>>>>
>>>
>>>> Platform : Windows x86_64 Release Build Algorithm : Optimized.
>>>
>>>> webrev.03
>>>
>>>> ````````````````````````````````````````````````````````````````````
>>>> `
>>>> `
>>>
>>>> ```` Executing Bench For Image Type: IntArgb
>>>
>>>>   -Font Style: Plain /Font Size: 20 /Time consumed (us): 120.954
>>>
>>>>   -Font Style: Plain /Font Size: 40 /Time consumed (us): 364.871
>>>
>>>>   -Font Style: Plain /Font Size: 60 /Time consumed (us): 561.799
>>>
>>>>   -Font Style: Plain /Font Size: 80 /Time consumed (us): 653.390
>>>
>>>>   -Font Style: Bold /Font Size: 20 /Time consumed (us): 261.566
>>>
>>>>   -Font Style: Bold /Font Size: 40 /Time consumed (us): 311.054
>>>
>>>>   -Font Style: Bold /Font Size: 60 /Time consumed (us): 490.735
>>>
>>>>   -Font Style: Bold /Font Size: 80 /Time consumed (us): 656.559
>>>
>>>>   -Font Style: Italic /Font Size: 20 /Time consumed (us): 314.289
>>>
>>>>   -Font Style: Italic /Font Size: 40 /Time consumed (us): 378.750
>>>
>>>>   -Font Style: Italic /Font Size: 60 /Time consumed (us): 491.181
>>>
>>>>   -Font Style: Italic /Font Size: 80 /Time consumed (us): 770.172
>>>
>>>>   -Font Style: Bold-Italic /Font Size: 20 /Time consumed (us):
>>>> 375.336
>>>
>>>>   -Font Style: Bold-Italic /Font Size: 40 /Time consumed (us):
>>>> 571.371
>>>
>>>>   -Font Style: Bold-Italic /Font Size: 60 /Time consumed (us):
>>>> 548.300
>>>
>>>>   -Font Style: Bold-Italic /Font Size: 80 /Time consumed (us):
>>>> 714.526
>>>
>>>>
>>>
>>>> Executing Bench For Image Type: IntArgb_Pre
>>>
>>>>   -Font Style: Plain /Font Size: 20 /Time consumed (us): 45.026
>>>
>>>>   -Font Style: Plain /Font Size: 40 /Time consumed (us): 219.016
>>>
>>>>   -Font Style: Plain /Font Size: 60 /Time consumed (us): 279.617
>>>
>>>>   -Font Style: Plain /Font Size: 80 /Time consumed (us): 282.829
>>>
>>>>   -Font Style: Bold /Font Size: 20 /Time consumed (us): 51.809
>>>
>>>>   -Font Style: Bold /Font Size: 40 /Time consumed (us): 117.563
>>>
>>>>   -Font Style: Bold /Font Size: 60 /Time consumed (us): 508.049
>>>
>>>>   -Font Style: Bold /Font Size: 80 /Time consumed (us): 402.802
>>>
>>>>   -Font Style: Italic /Font Size: 20 /Time consumed (us): 79.320
>>>
>>>>   -Font Style: Italic /Font Size: 40 /Time consumed (us): 227.473
>>>
>>>>   -Font Style: Italic /Font Size: 60 /Time consumed (us): 330.488
>>>
>>>>   -Font Style: Italic /Font Size: 80 /Time consumed (us): 353.782
>>>
>>>>   -Font Style: Bold-Italic /Font Size: 20 /Time consumed (us):
>>>> 54.687
>>>
>>>>   -Font Style: Bold-Italic /Font Size: 40 /Time consumed (us):
>>>> 235.505
>>>
>>>>   -Font Style: Bold-Italic /Font Size: 60 /Time consumed (us):
>>>> 227.205
>>>
>>>>   -Font Style: Bold-Italic /Font Size: 80 /Time consumed (us):
>>>> 324.308
>>>
>>>>
>>>
>>>> Updated webrev with changes for the fast-path :
>>>
>>>> http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.03/
>>>
>>>> Kindly review and provide your suggestions.
>>>
>>>>
>>>
>>>> Thank you once again for detailed review and feedback Have a good
>>>> day
>>>
>>>>
>>>
>>>> Prahalad N.
>>>
>>>>
>>>
>>>>
>>>
>>>>
>>>
>>>> -----Original Message-----
>>>
>>>> From: Jim Graham
>>>
>>>> Sent: Wednesday, March 30, 2016 2:46 AM
>>>
>>>> To: Prahalad Kumar Narayanan;2d-dev at openjdk.java.net
>>>> <mailto:2d-dev at openjdk.java.net>; Sergey Bylokhov
>>>
>>>> Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070:
>>>
>>>> Antialiased text on translucent backgrounds gets bright artifacts
>>>
>>>>
>>>
>>>> Hi Prahalad,
>>>
>>>>
>>>
>>>> This latest version looks like it should produce correct answers.
>>>
>>>>
>>>
>>>> I'd like to see benchmark results on more platforms, particularly Windows since the different compilers may optimize these things differently.
>>>
>>>>
>>>
>>>> Also, you didn't mention what data set you used for benchmarking.
>>>> I'd
>>>
>>>> like to see benchmark results for small, medium and large font
>>>> sizes,
>>>
>>>> and possibly bold and italic fonts as well.  The reason is that the
>>>
>>>> relative ratios of "empty glyph pixels" to "partial glyph pixels" to
>>>
>>>> "fully covered glyph pixels" changes depending on the font type and
>>>
>>>> size so if you made one of those faster and another slower then the
>>>
>>>> results may be seen as a gain in one type of test if you only test
>>>> one
>>>
>>>> font type and size and it happens to match the part of the code that
>>>
>>>> is more streamlined.  Also, for small font sizes the per-glyph
>>>
>>>> overhead might hide per-pixel issues.  Please share which fonts and
>>>
>>>> sizes you used for testing and consider using some different sizes,
>>>
>>>> including something very large like 36 or 48 points (something with
>>>
>>>> large areas of opaque
>>>
>>>> pixels) as well as more normal sizes that appear in GUIs.  Also, bold fonts can have a higher percentage of opaque pixels.
>>>
>>>>
>>>
>>>> In particular...
>>>
>>>>
>>>
>>>> This latest version is missing the "fast path" from the existing code for the case of srcA == 255 and glyphA == 255 where it just stores the FG_PIXEL values directly.  For large fonts/glyphs that case may be as important as detecting empty glyph pixels.
>>>
>>>>
>>>
>>>> On the other hand, an additional "if" may cause the compiler to generate less efficient code as per Sergey's concern.  Since this "if" eliminates some multiplies and possibly some divides, I'd hope it would be a win-win.
>>>
>>>>
>>>
>>>> You could add the fast path back inside the case where mixValSrc is 255 and just test srcA for 255 and do the Store ## DST ## PixelData() macro that used to be at the end of the block in the existing code, and then use "break;" to escape out of the do/while surrounding the whole macro so it moves on to the next pixel.
>>>
>>>>
>>>
>>>> (Arguably, we might want to consider teaching our SRCOVER_MASKFILL
>>>> to
>>>
>>>> do the same thing.  I think that was one of the added values of
>>>> having
>>>
>>>> a separate GLYPH loop, but really both should be optimizing that
>>>
>>>> case...)
>>>
>>>>
>>>
>>>> I can see now that the macro switch to use the same macro set as SRCOVER_MASKFILL required you to compute the pixel address, as you noted in your summary.  It makes the new macro more cumbersome and makes it look like it's doing a bit more work per-pixel, but in reality I think the overall operations end up being the same as long as the compiler optimizes the deliberate multiplications the same way it did for implicit multiplications in the "pRas[foo]" and "pRas[foo*4]" code that was being generated previously.  Benchmarks will tell us if we're good there...
>>>
>>>>
>>>
>>>>                                             ...jim
>>>
>>>>
>>>
>>>>
>>>
>>>> On 3/28/16 5:33 AM, Prahalad Kumar Narayanan wrote:
>>>
>>>>> Hello Everyone on Java2D Group
>>>
>>>>>
>>>
>>>>> Good day to you.
>>>
>>>>>
>>>
>>>>> This is a follow-up to Review Request on bug:
>>>
>>>>>       Bug          : JDK-8015070
>>>
>>>>>       Bug Link :https://bugs.openjdk.java.net/browse/JDK-8015070
>>>
>>>>>
>>>
>>>>> First, Thanks to Jim and Sergey for their feedback on the changes so far.
>>>
>>>>>
>>>
>>>>> Inferences from Jim 's Feedback on Loading destination colors:
>>>
>>>>>       1.  The API or Macro that I had earlier used to Load DST colors, indeed, adjusted destination color components with divide-by-alpha if destination was already pre-multiplied.
>>>
>>>>>                    My apologies.. I should have spotted this error at the first iteration itself.
>>>
>>>>>       2.  Due to the divide-by-alpha adjustment, the remaining
>>>>> logic would become incorrect. ( Especially, the multiplication with
>>>>> dstF based on pre-mulitplication status )
>>>
>>>>>       3.  Correct API is being used now, and the dstColor components are loaded directly without any divide-by-alpha adjustment.
>>>
>>>>>
>>>
>>>>> Inferences from Sergey's Feedback on Performance.
>>>
>>>>>       1.  To set the context for everyone, the logic present in the current webrev.02 is modelled as per SRCOVER_MASKFILL.
>>>
>>>>>                    There are multiple if (...) conditions that remove un-necessary blending calculations. Ideally this should improve performance.
>>>
>>>>>                    However, since some data are not readily available (as present in SRCOVER_MASKFILL), few additional calculations have been added.
>>>
>>>>>                    They are: pre-multiplying srcColor with alpha and assigning to res.
>>>
>>>>>                                        Finding the correct address of Pixel using DST_PTR and PixelStride.
>>>
>>>>>                    Henceforth, as Sergey suggests, Observation on performance will be beneficial.
>>>
>>>>>       2.  The performance of the new logic was measured with
>>>>> linux-x86_64-normal-server-release config and compared with the
>>>>> logic used in un-optimized code in webrev.00
>>>
>>>>>       3.  Result: The newer logic provides a fractional gain (about 15 - 20 ms) over the older logic.
>>>
>>>>>
>>>
>>>>> Other Subtle Changes:
>>>
>>>>>       1.  The test file has been renamed from
>>>>> AADrawStringArtifact.java to more meaningful -
>>>>> AntialiasedTextArtifact.java
>>>
>>>>>       2.  The test file tests for both TYPE_INT_ARGB and TYPE_INT_ARGB_PRE BufferedImage formats.
>>>
>>>>>                   The code has been well commented to explain the logic used in every function.
>>>
>>>>>
>>>
>>>>> Kindly take your time to review the changes in the webrev link mentioned below and provide your suggestions.
>>>
>>>>>            Webrev Link:
>>>
>>>>> http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.02/
>>>
>>>>>
>>>
>>>>>
>>>
>>>>> Thank you for your time in review
>>>
>>>>> Have a good day
>>>
>>>>>
>>>
>>>>> Prahalad N.
>>>
>>>>>
>>>
>>>>>
>>>
>>>>> -----Original Message-----
>>>
>>>>> From: Jim Graham
>>>
>>>>> Sent: Thursday, March 24, 2016 7:57 AM
>>>
>>>>> To: Prahalad Kumar Narayanan;2d-dev at openjdk.java.net
>>>>> <mailto:2d-dev at openjdk.java.net>
>>>
>>>>> Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070:
>>>
>>>>> Antialiased text on translucent backgrounds gets bright artifacts
>>>
>>>>>
>>>
>>>>> Hi Prahalad,
>>>
>>>>>
>>>
>>>>> (On a side note - ouch!  I came up with these macros in the first
>>>
>>>>> place, but 20 years later I'm now realizing just how hard they are
>>>>> to
>>>
>>>>> navigate and review.  My apologies...)
>>>
>>>>>
>>>
>>>>> The macro you are still using to load the destination data is one that loads it to non-premultiplied components, which means it will divide out the alpha if the destination is PRE.  The rest of the logic assumes that the components were loaded without any adjustment of their premultiplication so not only is that division an unnecessary operation, it makes the following math wrong.
>>>
>>>>>
>>>
>>>>> The SRCOVER_MASKFILL macro seems to use "Postload ## STRATEGY ## From ## TYPE" which seems to load them into separate components without any adjustment of their pre-multiplication status.  This is paired with "LoadAlpha>From ## TYPE ## For ## STRATEGY" to load just the destination alpha for computing dstF...
>>>
>>>>>
>>>
>>>>>                                           ...jim
>>>
>>>>>
>>>
>>>>> On 3/22/16 4:35 AM, Prahalad Kumar Narayanan wrote:
>>>
>>>>>> Hello Everyone on Java2D Group
>>>
>>>>>>
>>>
>>>>>> Good day to you.
>>>
>>>>>>
>>>
>>>>>> This is a Follow-up to Review Request on the bug:
>>>
>>>>>>        Bug          : JDK-8015070   Anti-aliased Text on Translucent background gets bright artifacts
>>>
>>>>>>        Bug Link :https://bugs.openjdk.java.net/browse/JDK-8015070
>>>
>>>>>>
>>>
>>>>>> First, Sincere thanks to Jim for his valuable feedback.
>>>
>>>>>>        1.  As Jim correctly mentioned, SRCOVER_MASKFILL has a similar logic to blend two Translucent colors based on an Alpha mask.
>>>
>>>>>>        2.  The calculations are exactly the same as the changes in previous webrev.
>>>
>>>>>>                    However the logic of SRCOVER_MASKFILL is 'Optimized' to reduce the number of computations.
>>>
>>>>>>        3.  This optimization is definitely required because, the logic is executed for every single pixel in a glyph.
>>>
>>>>>>                    Example: If a string is made up of 5 English
>>>>>> characters with each character having 32 x 32 pixels,
>>>
>>>>>>                                       The anti-aliasing logic will be executed for 5 x 32 x 32 iterations.
>>>
>>>>>>                    Reducing computation per pixel will imply a huge benefit for complete drawString operation.
>>>
>>>>>>
>>>
>>>>>> Observation from SRCOVER_MASKFILL
>>>
>>>>>>        1.  The mask fill reduces computations by through multiple if(...) conditions.
>>>
>>>>>>                      Each if condition affirms whether the next set of computations are required.
>>>
>>>>>>        2.  Optimization 1: If mask value is 0- skip entire logic.
>>>
>>>>>>        3.  Optimization 2: If mask value is maximum, say 255, take
>>>>>> srcA directly without multiplying with maskAlpha ( Reason: 1 *
>>>>>> srcAlpha = srcAlpha )
>>>
>>>>>>        4.  Optimization 3: Compute pre-multiplied resColor in two steps. First with srcColor and then with dstColor.
>>>
>>>>>>                      If the resAlpha from 1st step (i.e., srcColor) is fully opaque, avoid blending dstColor altogether.
>>>
>>>>>>
>>>
>>>>>> Changes in Current Webrev.01
>>>
>>>>>>        1.  The fix for the current bug is modelled based on SRCOVER_MASKFILL.
>>>
>>>>>>        2.  The changes have been verified to work on windows, linux and mac operating systems.
>>>
>>>>>>        3.  The automated Test file- AADrawStringArtifact.java runs
>>>>>> as expected
>>>
>>>>>>                   Identifies artifact & throws exception when run on JDK 7 and 8.
>>>
>>>>>>                   With JDK9, the test file returns without error.
>>>
>>>>>>        3.  JPRT build has been run to ensure, changes build on all supported platforms.
>>>
>>>>>>                    JPRT job link :
>>>
>>>>>> http://scaaa637.us.oracle.com//archive/2016/03/2016-03-22-070604.p
>>>>>> ra
>>> <http://scaaa637.us.oracle.com/archive/2016/03/2016-03-22-070604.pra>
>>>
>>>>>> h
>>>
>>>>>> n
>>>
>>>>>> ara-linux.client
>>>
>>>>>>
>>>
>>>>>> Kindly review the changes in the below mentioned link and provide
>>>>>> your views
>>>
>>>>>>        Webrev Link :
>>>
>>>>>> http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.01/
>>>
>>>>>>
>>>
>>>>>>
>>>
>>>>>> Thank you for your time in review
>>>
>>>>>> Have a good day
>>>
>>>>>>
>>>
>>>>>> Prahalad N.
>>>
>>>>>>
>>>
>>>>>>
>>>
>>>>>> -----Original Message-----
>>>
>>>>>> From: Jim Graham
>>>
>>>>>> Sent: Friday, March 18, 2016 6:07 AM
>>>
>>>>>> To: Prahalad Kumar Narayanan;2d-dev at openjdk.java.net
>>>>>> <mailto:2d-dev at openjdk.java.net>
>>>
>>>>>> Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070:
>>>
>>>>>> Antialiased text on translucent backgrounds gets bright artifacts
>>>
>>>>>>
>>>
>>>>>> Hi Prahalad,
>>>
>>>>>>
>>>
>>>>>> This basically boils down to "alpha blending math needs to be performed in premultiplied form" and the existing code was not doing that.
>>>
>>>>>>
>>>
>>>>>> Your changes do add a pre-multiplication step to the math in two
>>>
>>>>>> places
>>>
>>>>>> - when you mix the src alpha and the glyph alpha at the top of the macro, and again when you do the Multiply(dst, dstA, dst) step in the middle.  In that sense, the new math is correct.
>>>
>>>>>>
>>>
>>>>>> However, it is not the optimal way to implement this.  In
>>>
>>>>>> particular, the macro used here to load the data from the
>>>
>>>>>> destination is the one that loads it into 4 ARGB non-premultiplied
>>>
>>>>>> values.  If the destination is non-PRE, then your added multiply
>>>
>>>>>> step is exactly what is needed, but it could be combined with the
>>>
>>>>>> multiply that will happen later in the blending equation, so it is
>>>
>>>>>> an added step rather than a fixed fraction in the existing MultMultAdd parameters.
>>>
>>>>>> Additionally, if the destination is already PRE, then the macro
>>>
>>>>>> being used to load the dst pixel data there will have done a
>>>>>> divide
>>>
>>>>>> step to divide out the alpha from the destination, only to have
>>>>>> you
>>>
>>>>>> reverse that math with your new
>>>
>>>>>> Multiply() step.  That's a lot of math to end up with a NOP.
>>>
>>>>>>
>>>
>>>>>> The MUL8 you added for the srcA and glyph value is needed, that change is good.  Since it is common for glyph pixels to have zero alpha, you might want to test the glyph alpha for 0 and skip the pixel before you do the multiply, though.  This would add one more if, but it would be a common case.
>>>
>>>>>>
>>>
>>>>>> The trickier part is to load the destination components without un-premultiplying them.  Unfortunately there is no "Load...ToArgbPre"
>>>
>>>>>> macro to parallel the Load macro used in the function.  Perhaps there should be, but you'd still end up with an extra multiply step as I mentioned above because you can fold the premultiplication of the dst data into the MultMultAdd by carefully choosing the parameters you use in the math there.
>>>
>>>>>>
>>>
>>>>>> The good news is that the way that the SRCOVER_MASKFILL uses the various type-specific macros works around this a bit and minimizes the number of multiplies that happen.  You could check out DEFINE_SRCOVER_MASKFILL and see how it works in the case where pMask is not null (pMask is an alpha mask with values very similar to the glyphAA data).  Modeling this code on that code would correct the math and minimize it as well...
>>>
>>>>>>
>>>
>>>>>>                                        ...jim
>>>
>>>>>>
>>>
>>>>>> On 3/17/16 3:00 AM, Prahalad Kumar Narayanan wrote:
>>>
>>>>>>> Hello Everyone on Java2D Group
>>>
>>>>>>>
>>>
>>>>>>> Good day to you.
>>>
>>>>>>>
>>>
>>>>>>> Herewith, I 'm sharing the webrev for two identical Java2D Bugs.
>>>
>>>>>>>
>>>
>>>>>>> Bug ID : JDK-8015070
>>>
>>>>>>>
>>>
>>>>>>>            Title     : Antialiased text on translucent backgrounds gets
>>>
>>>>>>> bright artifacts
>>>
>>>>>>>
>>>
>>>>>>>            Link      :https://bugs.openjdk.java.net/browse/JDK-8015070
>>>
>>>>>>>
>>>
>>>>>>> Bug ID : JDK-8013564 ( currently closed as duplicate )
>>>
>>>>>>>
>>>
>>>>>>>            Title     : Font rendering anti-aliasing in translucent
>>>
>>>>>>> BufferedImages broken
>>>
>>>>>>>
>>>
>>>>>>>            Link      :https://bugs.openjdk.java.net/browse/JDK-8013564
>>>
>>>>>>>
>>>
>>>>>>> Webrev Link :
>>>
>>>>>>> http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.00/
>>>
>>>>>>>
>>>
>>>>>>> Quick Summary on Bugs :
>>>
>>>>>>>
>>>
>>>>>>> ````````````````````````````````````````````````
>>>
>>>>>>>
>>>
>>>>>>> 1.  Artifacts appear on Edges of text characters when
>>>>>>> anti-aliased
>>>
>>>>>>> text is drawn on Translucent background
>>>
>>>>>>>
>>>
>>>>>>> 2.  The issue is reproducible on all platforms - windows, linux and mac os.
>>>
>>>>>>>
>>>
>>>>>>> 3.  Besides, the issue is reproduced with the commonly used pixel
>>>
>>>>>>> format- 4ByteArgb.
>>>
>>>>>>>
>>>
>>>>>>> Root Cause & Solution :
>>>
>>>>>>>
>>>
>>>>>>> ````````````````````````````````````````````````
>>>
>>>>>>>
>>>
>>>>>>> 1.  The Macro: GlyphListAABlend4ByteArgb in File: LoopMacros.h
>>>>>>> uses
>>>
>>>>>>> the standard blending algorithm
>>>
>>>>>>>
>>>
>>>>>>>             dstColor = [ srcColor * glyphAlpha + dstColor * (1 -
>>>
>>>>>>> glyphAlpha) ] / dstAlpha
>>>
>>>>>>>
>>>
>>>>>>> 2.  The above equation works only when the srcColor and dstColor are Opaque.
>>>
>>>>>>>
>>>
>>>>>>> 3.  When srcColor and dstColor are Translucent, their
>>>
>>>>>>> alphaComponent will influence the visibility of the color, and
>>>
>>>>>>> visibility of the color below.
>>>
>>>>>>>
>>>
>>>>>>> 4.  The new set of calculations for blending Translucent source
>>>>>>> and
>>>
>>>>>>> destination colors is given as
>>>
>>>>>>>
>>>
>>>>>>>             resAlpha = 1 - ((1 - srcAlpha) * (1 - dstAlpha))
>>>
>>>>>>>
>>>
>>>>>>>             resColor = [ srcColor * srcAlpha + dstColor *
>>>>>>> dstAlpha
>>>
>>>>>>> *
>>>
>>>>>>> (1
>>>
>>>>>>> -
>>>
>>>>>>> srcAlpha) ] / resAlpha
>>>
>>>>>>>
>>>
>>>>>>> 5.  Reference text for the equation:
>>>
>>>>>>> https://en.wikipedia.org/wiki/Alpha_compositing
>>>
>>>>>>>
>>>
>>>>>>> 6.  With the above modification to the blending logic, the
>>>
>>>>>>> artifacts do not appear and issues are fixed.
>>>
>>>>>>>
>>>
>>>>>>> Jtreg & Jprt Results :
>>>
>>>>>>>
>>>
>>>>>>> ````````````````````````````````````````````````
>>>
>>>>>>>
>>>
>>>>>>> 1.  A simple test-file: AADrawStringArtifact.java has been
>>>>>>> created
>>>
>>>>>>> to be a part of jtreg test cases.
>>>
>>>>>>>
>>>
>>>>>>>              The test file is well commented to explain - nature
>>>>>>> of
>>>
>>>>>>> artifact and how the test tries to identify them.
>>>
>>>>>>>
>>>
>>>>>>>              As required, the test case fails with Jdk 7, Jdk 8
>>>>>>> and
>>>
>>>>>>> succeeds with Jdk 9-internal (built with changes for the bug fix)
>>>
>>>>>>>
>>>
>>>>>>> 2.  The changes to blending logic lies within
>>>
>>>>>>> java.desktop/src/share/native/...
>>>
>>>>>>>
>>>
>>>>>>>              Henceforth, JPRT was used to ensure successful build
>>>
>>>>>>> across all supported platforms
>>>
>>>>>>>
>>>
>>>>>>>              Jprt Job Link :
>>>
>>>>>>> http://scaaa637.us.oracle.com//archive/2016/03/2016-03-17-072001.
>>>>>>> pr
>>> <http://scaaa637.us.oracle.com/archive/2016/03/2016-03-17-072001.pr>
>>>
>>>>>>> a
>>>
>>>>>>> h
>>>
>>>>>>> n
>>>
>>>>>>> ara-linux.client
>>>
>>>>>>>
>>>
>>>>>>>              The build succeeds on all platforms.
>>>
>>>>>>>
>>>
>>>>>>> Kindly review the webrev link and provide your views and suggestions.
>>>
>>>>>>>
>>>
>>>>>>> Webrev Link :
>>>
>>>>>>> http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.00/
>>>
>>>>>>>
>>>
>>>>>>> If the changes look good, we could take up the changes for source checkin.
>>>
>>>>>>>
>>>
>>>>>>> Thank you for your time in review
>>>
>>>>>>>
>>>
>>>>>>> Have a good day
>>>
>>>>>>>
>>>
>>>>>>> Prahalad N.
>>>
>>>>>>>
>>>