[OpenJDK 2D-Dev] X11 uniform scaled wide lines and dashed lines; STROKE_CONTROL in Pisces

Mon Oct 25 19:47:02 UTC 2010

Hi Denis,

On 10/25/2010 7:34 AM, Denis Lila wrote:
>> (and I have some ideas on further optimizations to consider if you are still
>> game after this goes in)...
>
> I'd love to hear what they are.

Here are my thoughts:

- Currently Renderer has more stages than we probably should have:
     for (each full pixel row) {
         for (each subpixel row) {
             for (each curve on subpixel row) {
                 convert curve into crossing
                     crossing is (subpixelx:31 + dir:1)
             }
             sort crossings for subpixel row
             apply wind rule and convert crossings into alpha deltas
         }
         convert alpha deltas into run lengths
     }
     for (each tile) {
         convert run lengths into alphas
     }
I'm thinking we should be able to get rid of a couple of those stages...

- One alternate process would be:
     (all work is done in the tail end of the cache itself)
     for (each full pixel row) {
         for (each curve on full pixel row) {
             convert curve into N subpixel crossings
                 (subpixelx:28 + subpixelfracty:3 + dir:1)
         }
     }
     sort all crossings for entire pixel row
     maybe collapse raw crossings into modified accumulated crossings
     record start of row in index array
     for (each tile) {
         convert raw or collapsed crossings directly into alphas
     }
Note that we could simply leave the raw crossings in the cache and then 
process them in the tile generator, but that would require more storage 
in the cache since a given path would tend to have 8 different entries 
as it crossed every subpixel row.  If we collapse them, then we can sum 
the entries for a given x coordinate into a single entry and store:
     (pixelx:25 + coveragedelta:7)
where the coverage delta is a signed quantity up to N_SUBPIX^2 so it 
uses the same storage we needed for the subpixel parts of both X and Y 
plus the direction bit.  It likely needs a paired value in the next x 
pixel location just like our current "alpha deltas" needs as well.  If 
we wanted to optimize that then we could devote one more bit to "the 
next pixel will consume all of the remainder of the N^2 coverage", but 
there would be cases where that would not be true (such as if the pixel 
row has only partial vertical coverage by the path).  It's probably 
simpler to just have deltas for every pixel.

The storage here would likely be similar to the storage used for the 
current cache since the current RLE cache uses 2 full ints to store a 
coverage and a count.  And in cases where we have one pixel taking up 
partial coverage and the following pixel taking up the remainder of the 
full coverage then we have 4 ints, but the "crossing delta" system would 
only have 2 ints.

Other thoughts...

- Create a curve class and store an array of those so you don't have to 
iterate 3 different arrays of values and use array accesses to grab the 
data (each array access is checked for index OOB exceptions).

- Or perform AFD on curves as they come into Renderer, but only save 
line segment edges in the edges array.  This would use more storage, but 
the iterations of the AFD would happen in a tight loop as the data comes 
in rather than having to store all of their AFD data back in the quads 
and curves arrays and then reload the data for every sub-pixel step. 
Renderer still takes curves, it just breaks them down immediately rather 
than on the fly.  If there are only a small number of edges per curve 
then the storage might not be that much worse because the quad and curve 
arrays already store more values than the edge array.

- Convert to native.  Note that we use a native version of the pisces 
that you started with to do some rendering in FX.  I tried porting to 
use your new (Java) renderer in FX and performance went down even though 
you show it to be faster than what was there before.  So, your Java 
renderer compares favorably to the old Java pisces, but both compare 
unfavorably to the old native pisces.  Maybe we should convert your code 
to native and see if that gives us a performance boost.  It's nice to 
use pure Java, but there is a lot of "shoehorning" of data going on here 
that could be done much more easily and naturally in native code.

How is that for "food for thought"?

			...jim