[OpenJDK 2D-Dev] sun.java2D.Pisces renderer Performance and Memory enhancements

Jim Graham james.graham at oracle.com
Wed May 1 00:43:18 UTC 2013


Hi Laurent,

There's a lot here to go over... ;)

On 4/30/13 9:29 AM, Laurent Bourgès wrote:
> Jim,
>
> here is the latest webrev:
> http://jmmc.fr/~bourgesl/share/java2d-pisces/webrev-3/

I'll answer questions first and then have a look...

> I have few questions regarding renderer edge handling and float ceil calls:

I have a personal philosophy for all rendering engines that I've 
produced, but I'm not sure it was adopted for this code.  I like to use 
the same rounding for every operation and have all ranges specified as 
semi-open intervals - for example, x1 <= x < x2.

The "insideness" rule is most naturally expressed by semi-open intervals 
since pixel centers that fall on a vertical boundary are inside if the 
interior region is to their right (i.e. the x1 <= x half of that 
equation above) and they are outside if the space to their right is 
non-interior (i.e. the x < x2 half of that equation).  A similar rule is 
also used vertically.

The rounding operation which computes this would be:

xc = computed crossing value for the current scan line;
ceil(xc - 0.5);

This maps (for some integer k):

k+0.4999  to  k
k+0.5000  to  k
k+0.5001  to  k+1

Which is exactly what you want for the beginning of a region.  If the 
crossing is exactly at the pixel center then that is the furthest right 
that you can cross the scan line and still include pixel k.  If you just 
miss the center of a pixel to the right, then the first pixel to be 
included is k+1.  That function computes it correctly.
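As a tiny standalone sketch (hypothetical names, not the actual Pisces 
code), that rounding rule looks like:

```java
public class InsidenessRounding {
    // Round a scan-line crossing to the first pixel index affected,
    // per the "ceil(xc - 0.5)" rule: a crossing exactly on a pixel
    // center still includes that pixel; just past it moves to the next.
    static int crossingToPixel(float xc) {
        return (int) Math.ceil(xc - 0.5f);
    }

    public static void main(String[] args) {
        System.out.println(crossingToPixel(7.4999f)); // 7
        System.out.println(crossingToPixel(7.5000f)); // 7
        System.out.println(crossingToPixel(7.5001f)); // 8
    }
}
```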

Similarly, if you apply the same formula to the right edge of a region 
then you compute "the first pixel not included in the region". 
Consider that if you miss a pixel center on that right edge by falling 
slightly to the left of it, then that pixel is "outside" and it is the 
first such pixel to be "outside".  If you hit its pixel center exactly, 
then the insideness rule also says that it is "outside".  Only when you 
just miss it by falling slightly to the right of its center would that 
pixel still be included.  The above formula computes that value if you 
take its answer to be the "first pixel not included in the region".

This lets you perform the same rounding on every value and treat every 
pair of "inside trigger" to "outside trigger" as a half-open interval. 
Note that otherwise you'd have to compute every floating point crossing, 
then run the EvenOdd vs. ZeroWinding rule on them, and finally only do 
the rounding after you knew which were left edges and which were right 
(which you don't know until you've computed every crossing on the scan 
line).  Since this rounding rule is symmetric given half-open intervals 
then you can round everything before you've analyzed the results of the 
winding rules.

I then extend that philosophy to every set of coordinates I deal with, 
be it horizontal ranges, vertical ranges, shape rasterized results, or 
clip results, and the math usually works out in the simplest and most 
elegant way possible.

For an AA renderer, it isn't clear that the sub-pixel computations need 
to be quite so sub-sub-pixel accurate, but I'd like to think that the 
philosophy should be kept down to the lowest layers if we can, 
especially since this code can be tuned down to a single sub-pixel per 
real pixel, in which case it definitely should be as accurate as it can 
be with respect to the insideness rule (even though I doubt it will 
ever be set up that way).

Another consideration is the vertices on an enclosing polygon.  You 
compute one segment to where it ends at xe1,ye1 and start computing the 
next segment from where it starts at xs2,ys2, but for adjacent segments 
in an outline, they start and end on a shared common point so xe1==xs2 
and ye1==ys2.  If you compute crossings for both segments at that shared 
point then you have a chance of recording the crossings twice for that 
tiny chunk of the outline.  You could special case the last point in a 
segment, but that gets into lots of tricky tests.  On the other hand, if 
all intervals everywhere are half-open then anything you compute for 
xe1,ye1 would represent "the first line/pixel that segment 1 does not 
cause to be included" and computing the same values for xs2,ys2 would 
naturally generate "the first line/pixel that segment 2 causes to be 
included" and only one of them would induce inclusion on that scan line 
or pixel.  This gets even more valuable when you consider all cases of 
segment 1 going down to xe1,ye1 and segment 2 going back up from that 
point, or up/down or up/up or down/down - and also left/left or 
left/right or right/right or right/left.  All possible cases of how one 
segment arrives at xe1,ye1 and the next segment leaves from that same 
point just naturally fall out from using symmetric rounding and 
half-open intervals.

I wanted to get that out to see if we were on the same page or if there 
was another way of doing the math that you are familiar with that might 
also make things easier.

I'm pretty sure that most of the code in the current version of Pisces 
uses that same philosophy, but I seem to recall that there were a couple 
of places that Denis was more comfortable with fully closed intervals. 
I don't remember the details, but be on the watch for it.  Also, I don't 
recall if it used a half-sub-pixel bias anywhere or simply punted on it 
as not being visible enough to worry about strict "center of sub-pixel" 
comparisons.

Now on to the specific questions:

>      private void addLine(float x1, float y1, float x2, float y2) {
> ...
>          // LBO: why ceil ie upper integer ?
>          final int firstCrossing = Math.max(ceil(y1), boundsMinY); //
> upper integer
>          final int lastCrossing  = Math.min(ceil(y2), boundsMaxY); //
> upper integer

Perhaps lastCrossing is misnamed.  I think it represents the end of the 
interval which is half-open.  Does that match the rest of the operations 
done on it?

If every coordinate has already been biased by the -0.5 then ceil is 
just the tail end of the rounding equation I gave above.

If not, then the half-open considerations still apply if you use the 
same rounding for both the top and the bottom of the segment.  If 
lastCrossing represents the "bottom of the shape" then it should not be 
included.  If the next segment continues on from it, then its y1 will be 
computed in the first equation and will represent the first scanline 
that the next segment participates in.
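To illustrate (a standalone sketch, not the actual addLine code): with 
the same ceil applied to both ends and the result treated as half-open, 
two segments that share an endpoint divide the scanlines between them 
with no overlap and no gap:

```java
import java.util.Arrays;

public class HalfOpenScanlines {
    // Scanlines covered by a segment spanning [y1, y2), half-open:
    // ceil(y1) is the first scanline included, ceil(y2) the first
    // scanline NOT included.
    static int[] coveredScanlines(float y1, float y2) {
        final int first = (int) Math.ceil(y1);
        final int last  = (int) Math.ceil(y2);
        final int[] rows = new int[Math.max(0, last - first)];
        for (int i = 0; i < rows.length; i++) {
            rows[i] = first + i;
        }
        return rows;
    }

    public static void main(String[] args) {
        // Segment 1 ends at y=4.3 exactly where segment 2 starts;
        // scanline 4 is counted once, by segment 1 only.
        System.out.println(Arrays.toString(coveredScanlines(1.2f, 4.3f))); // [2, 3, 4]
        System.out.println(Arrays.toString(coveredScanlines(4.3f, 6.0f))); // [5]
    }
}
```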

> => firstCrossing / lastCrossing should use lower and upper integer
> values (rounding issues) ?

Do they make sense now given my explanation above?  Perhaps they should 
be renamed to indicate their half-openness?

>      boolean endRendering() {
>          // TODO: perform shape clipping to avoid dealing with segments
> out of bounding box
>
>          // Ensure shape edges are within bbox:
>          if (edgeMinX > edgeMaxX || edgeMaxX < 0f) {
>              return false; // undefined X bounds or negative Xmax
>          }
>          if (edgeMinY > edgeMaxY || edgeMaxY < 0f) {
>              return false; // undefined Y bounds or negative Ymax
>          }

I'd use min >= max since if min==max then I think nothing gets generated 
as a result of all edges having both the in and out crossings on the 
same coordinate.  Also, why not test against the clip bounds instead? 
The code after that will clip the edgeMinMaxXY values against the 
boundsMinMax values.  If you do this kind of test after that clipping is 
done (on spminmaxxy) then you can discover if all of the coordinates are 
outside the clip or the region of interest.
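In other words (a sketch reusing the names from your snippet, not a 
drop-in patch): do the clipping first, then a single half-open emptiness 
test covers both the degenerate and the fully-clipped cases:

```java
public class EdgeBoundsCheck {
    // After clipping edge bounds to the region of interest, an empty
    // half-open range on either axis means no work to do - this also
    // rejects the min == max case where in/out crossings coincide.
    static boolean hasWork(int spminX, int spmaxX, int spminY, int spmaxY) {
        return spminX < spmaxX && spminY < spmaxY;
    }

    public static void main(String[] args) {
        System.out.println(hasWork(0, 0, 0, 8)); // false: min == max -> empty
        System.out.println(hasWork(0, 8, 0, 8)); // true
    }
}
```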

>          // why use upper integer for edge min values ?

Because an edge falling between sample points includes only the sample 
point "above" it.  (Or "un-includes" it if it is a right/bottom edge.)

> => Here is the question: why use ceil instead of floor ?
>
>          final int eMinX = ceil(edgeMinX); // upper positive int
>          if (eMinX > boundsMaxX) {
>              return false; // Xmin > bbox maxX
>          }
>
>          final int eMinY = ceil(edgeMinY); // upper positive int
>          if (eMinY > boundsMaxY) {
>              return false; // Ymin > bbox maxY
>          }
>
>          int spminX = Math.max(eMinX, boundsMinX);
>          int spmaxX = Math.min(ceil(edgeMaxX), boundsMaxX); // upper
> positive int
>          int spminY = Math.max(eMinY, boundsMinY);
>          int spmaxY = Math.min(ceil(edgeMaxY), boundsMaxY); // upper
> positive int
>
>          int pminX = spminX >> SUBPIXEL_LG_POSITIONS_X;
>          int pmaxX = (spmaxX + SUBPIXEL_MASK_X) >> SUBPIXEL_LG_POSITIONS_X;
>          int pminY = spminY >> SUBPIXEL_LG_POSITIONS_Y;
>          int pmaxY = (spmaxY + SUBPIXEL_MASK_Y) >> SUBPIXEL_LG_POSITIONS_Y;
>
>          // store BBox to answer ptg.getBBox():
>          this.cache.init(pminX, pminY, pmaxX, pmaxY);
>
> In PiscesCache:
>      void init(int minx, int miny, int maxx, int maxy) {
> ...
>          // LBO: why add +1 ??
>          bboxX1 = maxx /* + 1 */;
>          bboxY1 = maxy /* + 1 */;

I think the values passed in are "the smallest and largest values we 
encountered" and so adding one computes the first coordinate that is 
"not inside the sample region" as per the half-open philosophy.

I tend to use and encourage "min/max" naming for closed-interval values 
and xy0/xy1 or xy1/xy2 for half-open-interval values.

> => piscesCache previously added +1 to maximum (upper loop condition y < y1)
> but the pmaxX already uses ceil (+1) and (spmaxY + SUBPIXEL_MASK_Y)
> ensures the last pixel is reached.

spmaxY + SUBPIXEL_MASK_Y computes the last sub-pixel piece of the last 
pixel.  It is essentially equivalent to "lastrow.9999999".  So, the first 
coordinate not included would be "lastrow+1" which is simply adding +1 
to the sub-pixel "max" value that was given.
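That shift is just a ceiling division by the number of sub-pixel rows; 
as a sketch (assuming 3 bits of sub-pixel precision, which may not 
match the actual constants in the code):

```java
public class SubpixelBounds {
    // Assumed values: 3 bits of sub-pixel precision (8 positions).
    static final int SUBPIXEL_LG_POSITIONS_Y = 3;
    static final int SUBPIXEL_MASK_Y = (1 << SUBPIXEL_LG_POSITIONS_Y) - 1; // 7

    // (sp + mask) >> lg is the integer form of ceil(sp / 8.0): adding
    // the mask pushes any partial pixel ("lastrow.9999999") up to the
    // next whole pixel before the shift truncates.
    static int pixelCeil(int sp) {
        return (sp + SUBPIXEL_MASK_Y) >> SUBPIXEL_LG_POSITIONS_Y;
    }

    public static void main(String[] args) {
        System.out.println(pixelCeil(16)); // exact multiple: 2
        System.out.println(pixelCeil(17)); // partial pixel:  3
    }
}
```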

> I fixed these limits to have correct rendering due to dirty rowAARLE reuse.

If there was a bug before I'd prefer to have it fixed within the 
philosophy of half-open intervals rather than trying to translate into 
closed intervals - it keeps all of the code on the same page.

> 2013/4/24 Jim Graham <james.graham at oracle.com <mailto:james.graham at oracle.com>>
> Do you have any faster implementation for Math.ceil/floor ? I now use
> casting 1 + (int) / (int) but I know it is only correct for positive
> numbers (> 0).

ceil(integer) == integer, but your technique would return "integer+1". 
I don't remember a decent technique for doing ceil with a cast.
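For what it's worth, a branch-based positive-only variant avoids that 
off-by-one on exact integers (a sketch; whether it actually beats 
Math.ceil on a modern VM would need measuring):

```java
public class CeilDemo {
    // The naive "1 + (int) x" is wrong for exact integers:
    // ceil(3.0f) must be 3, not 4.
    static int naiveCeil(float x) {
        return 1 + (int) x;
    }

    // Corrected variant for non-negative inputs (hypothetical helper):
    static int ceilPositive(float x) {
        final int i = (int) x;      // truncates toward zero
        return (x > i) ? i + 1 : i; // bump only if a fraction was dropped
    }

    public static void main(String[] args) {
        System.out.println(naiveCeil(3.0f));    // 4 (wrong)
        System.out.println(ceilPositive(3.0f)); // 3
        System.out.println(ceilPositive(3.2f)); // 4
    }
}
```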

> - clipping issues:
> for a very large dashed rectangle, millions of segments are emitted from
> dasher (x1) to stroker (x4 segments) to renderer (x4 segments).
>
> However, I would like to add a minimal clipping algorithm but the
> rendering pipeline seems to be handling affine transforms between stages:
>          /*
>           * Pipeline seems to be:
>           *    shape.getPathIterator
>           * -> inverseDeltaTransformConsumer
>           * -> Dasher
>           * -> Stroker
>           * -> deltaTransformConsumer OR transformConsumer
>           * -> pc2d = Renderer (bounding box)
>           */

If we could teach the stroker to be able to apply an elliptical pen then 
we could do the transforms once up front and simplify that quite a lot. 
That optimization remains TBD.  We currently at least detect uniform 
scales and those can be handled with a single transform up front and 
multiplying all dash segment values and the line width by the uniform 
scale factor.  But, anything that transforms a circle into an ellipse 
requires us to use the multi-stage silliness above.

> It is complicated to perform clipping in the Renderer and maybe too late
> as a complete stroker's segment must be considered as a whole (4
> segments without caps ...).
>
> Could you give me advices how to hack / add clipping algorithm in the
> rendering pipeline (dasher / stroker / renderer) ?

> Should I try to derive a bounding box for the dasher / stroker depending
> on the Affine transforms ?

Yes, that sounds right so far, but it's pretty high-level and I'm not 
sure where your level of understanding here falls short...?  I guess 
what I would do is:

- compute an expanded "clip box" for the stroker based on the original 
clip, the transform, the line width, and the miter limit.  Note that the 
transform applied to the stroke coordinates varies depending on whether 
we use the "uniform scaling" optimization or not

- test coordinates for being in the expanded clip box in the 
PathConsumer2D methods of Stroker.  If the segment is rejected, be sure 
to update all of the internal computations (normal values, last moveto 
coords, etc.) that might be needed for stroking the following segment

- unfortunately, this means the dasher doesn't get the advantages.  You 
could add similar code to the dasher, but then you have to be sure an 
entire dash is omitted before you can omit passing the data on; 
otherwise the stroker will not get proper incoming coordinates from 
"the previous segment that was out of bounds".  That could be managed 
with proper communication between the dasher and stroker, but then it 
gets more complicated.
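As a rough sketch of the first step above (hypothetical helper; the 
padding constant and the interaction with the transform would need more 
careful reasoning):

```java
import java.awt.geom.Rectangle2D;

public class StrokeClipBox {
    // Grow the device clip by the worst-case stroke spill: half the
    // line width scaled by the miter limit, plus one pixel of slop,
    // so segment rejection in the stroker cannot clip visible joins.
    static Rectangle2D expandClip(Rectangle2D clip,
                                  float lineWidth, float miterLimit) {
        final double margin = 0.5 * lineWidth * Math.max(miterLimit, 1.0f) + 1.0;
        return new Rectangle2D.Double(clip.getX() - margin,
                                      clip.getY() - margin,
                                      clip.getWidth() + 2 * margin,
                                      clip.getHeight() + 2 * margin);
    }

    public static void main(String[] args) {
        Rectangle2D c = expandClip(new Rectangle2D.Double(0, 0, 100, 100), 2f, 10f);
        System.out.println(c.getWidth()); // 122.0
    }
}
```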

Those are the caveats that I would keep in mind while I was designing a 
solution - hope it helps some...

			...jim


