[OpenJDK 2D-Dev] sun.java2D.Pisces renderer Performance and Memory enhancements
Laurent Bourgès
bourges.laurent at gmail.com
Thu May 2 08:48:25 UTC 2013
Jim,
thanks for the long explanation; I agree with your approach of using semi-open
intervals.
I have the feeling that Pisces does not implement it very accurately.
My comments are inline:
I have a few questions regarding the renderer's edge handling and the float ceil calls:
>>
>
> I have a personal philosophy for all rendering engines that I've produced,
> but I'm not sure the philosophy was adopted for this code. I like to use
> the same rounding for every operation and have all ranges be specified as
> semi-open regions - x1 <= x < x2 - for example.
>
> The "insideness" rule is most naturally expressed by semi-open intervals
> since pixel centers that fall on a vertical boundary are inside if the
> interior region is to their right (i.e. the x1 <= x half of that equation
> above) and they are outside if the space to their right is non-interior
> (i.e. the x < x2 half of that equation). A similar rule is also used
> vertically.
>
> The rounding operation which computes this would be:
>
> xc = computed crossing value for the current scan line;
> ceil(xc - 0.5);
>
Agreed; however, the Pisces code never performs such rounding. Note that
ceil(x - 0.5) is not quite (int) Math.round(x): Math.round computes
floor(x + 0.5), which differs exactly at pixel centers (x = k + 0.5), where
ceil(x - 0.5) gives k but Math.round gives k + 1.
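A quick demo of that distinction (my own snippet, not Pisces code): the rule ceil(xc - 0.5) and Math.round(xc) agree everywhere except at exact pixel centers, which is precisely the case the insideness rule cares about.

```java
// Demo (hypothetical helper, not from the Pisces source): Jim's rounding
// rule ceil(xc - 0.5) vs. Math.round(xc). Math.round = floor(xc + 0.5)
// rounds a crossing at an exact pixel center (xc = k + 0.5) up to k + 1,
// while the half-open rule keeps it at k.
class RoundingDemo {
    static int firstIncludedPixel(float xc) {
        return (int) Math.ceil(xc - 0.5f); // half-open "insideness" rounding
    }
}
```

For xc = 3.5f the helper yields 3, whereas Math.round(3.5f) yields 4.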
>
> This maps (for some integer k):
>
> k+0.4999 to k
> k+0.5000 to k
> k+0.5001 to k+1
>
> Which is exactly what you want for the beginning of a region. If the
> crossing is exactly at the pixel center then that is the furthest right
> that you can cross the scan line and still include pixel k. If you just
> miss the center of a pixel to the right, then the first pixel to be
> included is k+1. That function computes it correctly.
>
> Similarly, if you apply the same formula to the right edge of a region
> then you compute "the first pixel to not be included in the region".
> Consider that if you miss a pixel center on that right edge by falling
> slightly to the left of it, then that pixel is "outside" and it is the
> first such pixel to be "outside". If you hit its pixel center exactly,
> then the insideness rule also says that it is "outside". Only when you
> just miss it by falling slightly to the right of its center would that
> pixel still be included. The above formula computes that value if you take
> its answer to be the "first pixel not included in the region".
>
> This lets you perform the same rounding on every value and treat every
> pair of "inside trigger" to "outside trigger" as a half-open interval. Note
> that otherwise you'd have to compute every floating point crossing, then
> run the EvenOdd vs. ZeroWinding rule on them, and finally only do the
> rounding after you knew which were left edges and which were right (which
> you don't know until you've computed every crossing on the scan line).
> Since this rounding rule is symmetric given half-open intervals then you
> can round everything before you've analyzed the results of the winding
> rules.
>
> I then extend that philosophy to every set of coordinates I deal with, be
> it horizontal ranges, vertical ranges, shape rasterized results, or clip
> results, and the math usually works out in the simplest and most elegant
> way possible.
>
> For an AA renderer, it isn't clear that the sub-pixel computations should
> be as sub-sub-pixely accurate, but I'd like to think that the philosophy
> should be kept down to the lowest layers if we can, especially as this code
> can be tuned down to a single sub-pixel per real pixel and then it
> definitely should be as accurate as it can with the insideness rule (even
> though I doubt it will ever be set up that way).
>
> Another consideration is the vertices on an enclosing polygon. You
> compute one segment to where it ends at xe1,ye1 and start computing the
> next segment from where it starts at xs2,ys2, but for adjacent segments in
> an outline, they start and end on a shared common point so xe1==xs2 and
> ye1==ys2. If you compute crossings for both segments at that shared point
> then you have a chance of recording the crossings twice for that tiny chunk
> of the outline. You could special case the last point in a segment, but
> that gets into lots of tricky tests. On the other hand, if all intervals
> everywhere are half-open then anything you compute for xe1,ye1 would
> represent "the first line/pixel that segment 1 does not cause to be
> included" and computing the same values for xs2,ys2 would naturally
> generate "the first line/pixel that segment 2 causes to be included" and
> only one of them would induce inclusion on that scan line or pixel. This
> gets even more valuable when you consider all cases of segment 1 going down
> to xe1,ye1 and segment 2 going back up from that point, or up/down or up/up
> or down/down - and also left/left or left/right or right/right or
> right/left. All possible cases of how one segment arrives at xe1,ye1 and
> the next segment leaves from that same point just naturally fall out from
> using symmetric rounding and half-open intervals.
>
> I wanted to get that out to see if we were on the same page or if there
> was another way of doing the math that you are familiar with that might
> also make things easier.
>
Perfect and very clear. I will then check the Pisces rounding on both pixel
coordinates (px, py) and AA (sub-pixel) coordinates (spx, spy).
> I'm pretty sure that most of the code in the current version of Pisces
> uses that same philosophy, but I seem to recall that there were a couple of
> places that Denis was more comfortable with fully closed intervals. I don't
> remember the details, but be on the watch for it. Also, I don't recall if
> it used a half-sub-pixel bias anywhere or simply punted on it as not being
> visible enough to worry about strict "center of sub-pixel" comparisons.
>
I am perplexed, and I am going to check the Pisces code against the approach
you describe.
> Now on to the specific questions:
>
>
> private void addLine(float x1, float y1, float x2, float y2) {
>> ...
>> // LBO: why ceil ie upper integer ?
>> final int firstCrossing = Math.max(ceil(y1), boundsMinY); //
>> upper integer
>> final int lastCrossing = Math.min(ceil(y2), boundsMaxY); //
>> upper integer
>>
>
> Perhaps lastCrossing is misnamed. I think it represents the end of the
> interval which is half-open. Does that match the rest of the operations
> done on it?
>
> If every coordinate has already been biased by the -0.5 then ceil is just
> the tail end of the rounding equation I gave above.
>
That's not the case, so this looks buggy: x1, y1 and x2, y2 are the raw point
coordinates as float values, with no -0.5 bias applied.
FYI, firstCrossing is used to compute the first X intersection for that Y
coordinate and determine the bucketIdx:
_edges[ptr+CURX] = x1 + (firstCrossing - y1) * slope;
...
final int bucketIdx = firstCrossing - boundsMinY;
...
_edges[ptr+NEXT] = edgeBuckets[bucketIdx];
lastCrossing is used to set the YMax edge coordinate and edgeBucketCounts:
_edges[ptr+YMAX] = lastCrossing;
...
edgeBucketCounts[lastCrossing - boundsMinY] |= 1;
I think such rounding errors can deform the rasterized shape by a pixel...?
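To make the suspected off-by-one concrete, here is a small sketch (method names are mine, not from the Pisces source) comparing the plain ceil that addLine uses today with the pixel-center rule described above:

```java
// Sketch comparing addLine's current rounding with the half-open
// pixel-center rule; hypothetical names, not the Pisces source.
class CrossingDemo {
    // what addLine does today: no -0.5 bias before ceil
    static int firstCrossingPlainCeil(float y1) {
        return (int) Math.ceil(y1);
    }

    // what the half-open philosophy prescribes
    static int firstCrossingCentered(float y1) {
        return (int) Math.ceil(y1 - 0.5f);
    }
}
```

For y1 = 3.2f the plain ceil yields 4 while the centered rule yields 3, i.e. the segment's first scanline shifts by one.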
> If not, then the half-open considerations still apply if you use the same
> rounding for both the top and the bottom of the segment. If lastCrossing
> represents the "bottom of the shape" then it should not be included. If
> the next segment continues on from it, then its y1 will be computed in the
> first equation and will represent the first scanline that the next segment
> participates in.
>
>
> => firstCrossing / lastCrossing should use lower and upper integer
>> values (rounding issues) ?
>>
>
> Do they make sense now given my explanation above? Perhaps they should be
> renamed to indicate their half-openness?
I am a bit hesitant to verify the maths performed in ScanLineIterator.next(),
which uses the edges and edgeBucketCounts arrays ... could you have a look?
Apparently it uses the following for loop, which respects the semi-open
interval philosophy:
for (int i = 0, ecur, j, k; i < count; i++) {
...
However, I am not sure if the edges "linked list" is traversed correctly ...
boolean endRendering() {
>> // TODO: perform shape clipping to avoid dealing with segments
>> out of bounding box
>>
>> // Ensure shape edges are within bbox:
>> if (edgeMinX > edgeMaxX || edgeMaxX < 0f) {
>> return false; // undefined X bounds or negative Xmax
>> }
>> if (edgeMinY > edgeMaxY || edgeMaxY < 0f) {
>> return false; // undefined Y bounds or negative Ymax
>> }
>>
>
> I'd use min >= max since if min==max then I think nothing gets generated
> as a result of all edges having both the in and out crossings on the same
> coordinate. Also, why not test against the clip bounds instead? The code
> after that will clip the edgeMinMaxXY values against the boundsMinMax
> values. If you do this kind of test after that clipping is done (on
> spminmaxxy) then you can discover if all of the coordinates are outside the
> clip or the region of interest.
I tried here to perform a few "fast" checks before doing the float-to-int
conversions (costly because millions of them are performed). I think it can
still be improved: edgeMinX > edgeMaxX only ensures that edgeMinX is defined,
and edgeMaxX < 0f only rejects a negative Xmax!
TODO: improve the boundary checks here.
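As a sketch of what those checks rely on (hypothetical helper; it assumes the edge bounds start as an "empty" interval of +Infinity / -Infinity, as accumulators usually do):

```java
// Hypothetical early-reject helper, not the Pisces source; assumes
// edgeMinX/edgeMaxX start at +Infinity / -Infinity and are narrowed as
// edges are accumulated.
class BoundsCheck {
    static boolean rejectX(float edgeMinX, float edgeMaxX) {
        // true when no edge was accumulated (min > max holds for the
        // empty interval) or the whole X range lies left of x = 0
        return edgeMinX > edgeMaxX || edgeMaxX < 0f;
    }
}
```

Testing against the clip bounds instead of 0, as Jim suggests, would only change the constant in the second comparison.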
> // why use upper integer for edge min values ?
>>
>
> Because an edge falling between sample points includes only the sample
> point "above" it. (Or "un-includes" it if it is a right/bottom edge.)
>
>
> => Here is the question: why use ceil instead of floor ?
>>
>> final int eMinX = ceil(edgeMinX); // upper positive int
>> if (eMinX > boundsMaxX) {
>> return false; // Xmin > bbox maxX
>> }
>>
>> final int eMinY = ceil(edgeMinY); // upper positive int
>> if (eMinY > boundsMaxY) {
>> return false; // Ymin > bbox maxY
>> }
>>
>> int spminX = Math.max(eMinX, boundsMinX);
>> int spmaxX = Math.min(ceil(edgeMaxX), boundsMaxX); // upper
>> positive int
>> int spminY = Math.max(eMinY, boundsMinY);
>> int spmaxY = Math.min(ceil(edgeMaxY), boundsMaxY); // upper
>> positive int
>>
>> int pminX = spminX >> SUBPIXEL_LG_POSITIONS_X;
>> int pmaxX = (spmaxX + SUBPIXEL_MASK_X) >>
>> SUBPIXEL_LG_POSITIONS_X;
>> int pminY = spminY >> SUBPIXEL_LG_POSITIONS_Y;
>> int pmaxY = (spmaxY + SUBPIXEL_MASK_Y) >>
>> SUBPIXEL_LG_POSITIONS_Y;
>>
>> // store BBox to answer ptg.getBBox():
>> this.cache.init(pminX, pminY, pmaxX, pmaxY);
>>
>> In PiscesCache:
>> void init(int minx, int miny, int maxx, int maxy) {
>> ...
>> // LBO: why add +1 ??
>> bboxX1 = maxx /* + 1 */;
>> bboxY1 = maxy /* + 1 */;
>>
>
> I think the values passed in are "the smallest and largest values we
> encountered" and so adding one computes the first coordinate that is "not
> inside the sample region" as per the half-open philosophy.
>
> I tend to use and encourage "min/max" naming for closed-interval values
> and xy0/xy1 or xy1/xy2 for half-open-interval values.
>
>
> => piscesCache previously added +1 to maximum (upper loop condition y <
>> y1)
>> but the pmaxX already uses ceil (+1) and (spmaxY + SUBPIXEL_MASK_Y)
>> ensures the last pixel is reached.
>>
>
> spmaxY + SUBPIXEL_MASK_Y computes the last sub-pixel piece of the last
> pixel. It is essentially equivalent to "lastrow.9999999". So, the first
> coordinate not included would be "lastrow+1" which is simply adding +1 to
> the sub-pixel "max" value that was given.
I understand, but I noticed that pmaxX / pmaxY (pixel coordinates) are already
rounded up to the next pixel:
- spmaxY >> SUBPIXEL_LG_POSITIONS_Y => 6
- (spmaxY + SUBPIXEL_MASK_Y) >> SUBPIXEL_LG_POSITIONS_Y => 7
So adding +1 again in PiscesCache would give maxY = 8!
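With concrete numbers (the constants below assume 8 sub-pixel rows per pixel, which I believe is the Pisces default; spmaxY = 54 is an example value of mine):

```java
// Concrete illustration of the double "+1"; constants assume 8 sub-pixel
// rows per pixel (SUBPIXEL_LG_POSITIONS_Y = 3, mask = 7).
class SubpixelDemo {
    static final int SUBPIXEL_LG_POSITIONS_Y = 3;
    static final int SUBPIXEL_MASK_Y = 7;

    static int pixelMaxY(int spmaxY) {
        // already rounds up: covers the last partially-filled pixel row
        return (spmaxY + SUBPIXEL_MASK_Y) >> SUBPIXEL_LG_POSITIONS_Y;
    }
}
```

pixelMaxY(54) is 7 while the plain shift 54 >> 3 is 6; adding +1 once more in PiscesCache would therefore produce 8.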
> I fixed these limits to have correct rendering due to dirty rowAARLE
>> reuse.
>>
>
> If there was a bug before I'd prefer to have it fixed within the
> philosophy of half-open intervals rather than trying to translate into
> closed intervals - it keeps all of the code on the same page.
>
I think it was not a real bug, as the new rowAARLE[][] array was big enough and
zero-filled, so it only led to larger boundaries and tiles.
>
> 2013/4/24 Jim Graham <james.graham at oracle.com <mailto:
>> james.graham at oracle.com>>
>>
>> Do you have any faster implementation for Math.ceil/floor ? I now use
>> casting 1 + (int) / (int) but I know it is only correct for positive
>> numbers (> 0).
>>
>
> ceil(integer) == integer, but your technique would return "integer+1". I
> don't remember a decent technique for doing ceil with a cast.
I disagree:
- ceil(float) => the smallest (closest to negative infinity) floating-point
value that is greater than or equal to the argument and is equal to a
mathematical integer.
- floor(float) => the largest (closest to positive infinity) floating-point
value that is less than or equal to the argument and is equal to a
mathematical integer.
i.e. floor(x) <= x <= ceil(x), with equality when x is a mathematical integer.
Here are numbers comparing the Math.ceil / floor operations and my casting
implementation (only valid for >0):
floats        = [-134758.4, -131.5, -17.2, -1.9, -0.9, -1.0E-8, -1.0E-23, -0.0, 0.0, 134758.4, 131.5, 17.2, 1.9, 0.9, 1.0E-8, 1.0E-23]
mathCeil      = [-134758, -131, -17, -1, 0, 0, 0, 0, 0, 134759, 132, 18, 2, 1, 1, 1]
fastMathCeil  = [-134757, -130, -16, 0, 1, 1, 1, 1, 1, 134759, 132, 18, 2, 1, 1, 1]
mathFloor     = [-134759, -132, -18, -2, -1, -1, -1, 0, 0, 134758, 131, 17, 1, 0, 0, 0]
fastMathFloor = [-134758, -131, -17, -1, 0, 0, 0, 0, 0, 134758, 131, 17, 1, 0, 0, 0]
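A single branch on the truncated value fixes the negative cases while keeping the cast cheap. This is my own sketch, not the version currently in the patch (which is only valid for x > 0), and it assumes |x| fits in an int:

```java
// Cast-based ceil/floor that also handle negative inputs; a sketch, only
// valid while |x| fits in an int.
class FastMath {
    static int fastCeil(float x) {
        int i = (int) x;            // truncates toward zero
        return (x > i) ? i + 1 : i; // bump up only if x had a positive fraction
    }

    static int fastFloor(float x) {
        int i = (int) x;            // truncates toward zero
        return (x < i) ? i - 1 : i; // bump down only for negative fractions
    }
}
```

For example fastCeil(-1.9f) yields -1 and fastFloor(-1.9f) yields -2, matching Math.ceil / Math.floor on the table above.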
Micro benchmark results:
# FastMathCeil: run duration: 10 000 ms
    4 threads, Tavg = 44.95 ns/op (σ = 0.39 ns/op), Total ops = 890974137
    [44.54 (224799696), 44.76 (223695800), 45.59 (219626233), 44.93 (222852408)]
# MathCeil: run duration: 10 000 ms
    4 threads, Tavg = 125.59 ns/op (σ = 1.71 ns/op), Total ops = 318415673
    [125.81 (79406107), 128.11 (78061562), 125.22 (79862351), 123.33 (81085653)]
>
> - clipping issues:
>> for a very large dashed rectangle, millions of segments are emitted from
>> the dasher (x1) to the stroker (x4 segments) to the renderer (x4 segments).
>>
>> However, I would like to add a minimal clipping algorithm but the
>> rendering pipeline seems to be handling affine transforms between stages:
>> /*
>> * Pipeline seems to be:
>> * shape.getPathIterator
>> * -> inverseDeltaTransformConsumer
>> * -> Dasher
>> * -> Stroker
>> * -> deltaTransformConsumer OR transformConsumer
>> * -> pc2d = Renderer (bounding box)
>> */
>>
>
> If we could teach the stroker to be able to apply an elliptical pen then
> we could do the transforms once up front and simplify that quite a lot.
> That optimization remains TBD. We currently at least detect uniform
> scales and those can be handled with a single transform up front and
> multiplying all dash segment values and the line width by the uniform scale
> factor. But, anything that transforms a circle into an ellipse requires us
> to use the multi-stage silliness above.
Could you give me more details about the "elliptical pen", i.e. stroking under shearing transforms?
Another idea to be discussed: round caps are quite expensive (curves broken
into lines), and an optimization could determine whether the cap is invisible
(out of the clip bounds) or visually useless, i.e. too small because the half
circle spans less than 1 pixel (true pixel or AA sub-pixel): any idea?
Maybe I should only check whether the line width is too small to render caps
in such situations...
> It is complicated to perform clipping in the Renderer and maybe too late
>> as a complete stroker segment must be considered as a whole (4
>> segments without caps ...).
>>
>> Could you give me advices how to hack / add clipping algorithm in the
>> rendering pipeline (dasher / stroker / renderer) ?
>>
>
> Should I try to derive a bounding box for the dasher / stroker depending
>> on the Affine transforms ?
>>
>
> Yes, that sounds right so far, but it's pretty high-level and I'm not sure
> where your level of understanding here falls short...? I guess what I
> would do is:
>
> - compute an expanded "clip box" for the stroker based on the original
> clip, the transform, the line width, and the miter limit. Note that the
> transform applied to the stroke coordinates varies depending on whether we
> use the "uniform scaling" optimization or not
>
> - test coordinates for being in the expanded clip box in the
> PathConsumer2D methods of Stroker. If the segment is rejected, be sure to
> update all of the internal computations (normal values, last moveto coords,
> etc.) that might be needed for stroking the following segment
>
That's the plan, but I would have to do it twice: in the dasher and in the
stroker ... maybe it should be coded once, in a new PiscesClipUtils class or in
Helpers ...
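For the first step Jim lists, something like this could be shared between the two stages (hypothetical sketch in stroker coordinate space; the pad term follows his "line width and miter limit" remark, and all names are mine):

```java
import java.awt.geom.Rectangle2D;

// Hypothetical expanded clip for the stroker: pad the user clip by the
// worst-case stroke growth, i.e. half the line width scaled by the miter
// limit (joins can extend up to miterLimit * lineWidth / 2 from the path).
class ClipExpander {
    static Rectangle2D.Float expand(Rectangle2D clip,
                                    float lineWidth, float miterLimit) {
        float pad = 0.5f * lineWidth * Math.max(miterLimit, 1f);
        return new Rectangle2D.Float(
                (float) clip.getX() - pad,
                (float) clip.getY() - pad,
                (float) clip.getWidth() + 2f * pad,
                (float) clip.getHeight() + 2f * pad);
    }
}
```

Segments entirely outside the returned rectangle could then be rejected in the PathConsumer2D methods, while still updating the internal state needed for the following segment.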
>
> - unfortunately this means that dasher doesn't get the advantages, you
> could add similar code to the dasher, but then you have to be sure an
> entire dash is omitted before you can omit the data going on, otherwise the
> stroker will not get proper incoming coordinates from "the previous segment
> that was out of bounds". That could be managed with proper communication
> between the dasher and stroker, but then it gets more complicated.
>
Agreed, but it should probably be analyzed more deeply to evaluate whether it
is too hard ...
>
> Those are the caveats that I would keep in mind while I was designing a
> solution - hope it helps some...
>
> ...jim
>
PS: I have more free time until Sunday (the children are away), so I would like
to talk to you on Skype if possible.
If yes, could you give me your Skype account and working hours (GMT time
please)?
I am living in Grenoble, France (GMT+1).
Thanks a lot,
Laurent