[OpenJDK 2D-Dev] X11 uniform scaled wide lines and dashed lines; STROKE_CONTROL in Pisces
Denis Lila
dlila at redhat.com
Wed Sep 29 16:55:46 UTC 2010
Hi Jim.
> Then hopefully we can get to something with better memory and CPU costs.
I re-implemented the AA rasterizer twice more, with saving memory in
mind. The first version used a completely different algorithm for computing
crossings: it didn't do it incrementally. It just computed the t value
where B(t)-CurrentScanline=0 using Newton's method, then it computed
the x value at that t, and that's the crossing. Because all our curves
were monotonic in x and y, we could guarantee that there was exactly one
such t per scanline per curve. The initial guess in Newton's method was
t0 = (scanline - y0)/(y1-y0) (where y0 and y1 are the y extrema of the curve).
This could be computed incrementally using just one addition per scanline,
and it worked very nicely in minimizing the number of Newton iterations.
However, there still had to be at least one iteration, costing at least one
multiplication and one division per scanline, which is much more expensive
than what adaptive forward differencing was doing. Even so, it was still a
bit faster than the adaptive forward differencing version, probably because
it didn't need to allocate ridiculous amounts of memory.
Its memory usage was far better than anything we had had until then:
for storing crossings it only needed 4*n bytes, where n is the highest
number of crossings on any scanline, as opposed to the 4*numScanlines*n
I had before. And since we store curves instead of the lines produced by
flattening them, this storage is reduced by a lot as well.
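To make the idea concrete, here is a minimal sketch for a quadratic
curve (the class and names are made up for illustration; this is not the
actual Pisces code, which also handles cubics and maintains the initial
guess incrementally):

    // A quadratic Bezier assumed monotonic in y on [0,1].
    final class QuadCrossing {
        final double x0, y0, cx, cy, x1, y1;

        QuadCrossing(double x0, double y0, double cx, double cy,
                     double x1, double y1) {
            this.x0 = x0; this.y0 = y0; this.cx = cx;
            this.cy = cy; this.x1 = x1; this.y1 = y1;
        }

        double yAt(double t) {
            double u = 1 - t;
            return u * u * y0 + 2 * u * t * cy + t * t * y1;
        }

        double dyAt(double t) { // derivative of yAt
            return 2 * ((1 - t) * (cy - y0) + t * (y1 - cy));
        }

        double xAt(double t) {
            double u = 1 - t;
            return u * u * x0 + 2 * u * t * cx + t * t * x1;
        }

        // x coordinate of the crossing with the scanline y == sl.
        // The initial guess is linear in sl, so across consecutive
        // scanlines it can be maintained with a single addition.
        double crossingX(double sl) {
            double t = (sl - y0) / (y1 - y0);
            for (int i = 0; i < 8 && Math.abs(yAt(t) - sl) > 1e-6; i++) {
                t -= (yAt(t) - sl) / dyAt(t); // Newton step
            }
            return xAt(t);
        }
    }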
But then I found a way to implement adaptive forward differencing AND save
memory. So, what I have now has the same memory usage described above, but
it's also a little faster (even though I haven't optimized it at all yet).
The webrev containing this isn't up yet (but everything else in the last
webrev link I sent is pretty much the same as what I have now on my machine,
so feel free to look at Stroker.java).
> Can the untransform be eliminated in the case of scaling? (Whether just
> for uniform scaling, or maybe even for non-uniform scaling with no
> rotation or shearing?)
I'm glad you brought this up. I thought a bit about this, and the only thing
that causes problems in Stroker is that for some transformations, if we
feed Stroker the transformed curve, the stroke width will not be constant
along the curve. Therefore we can eliminate the untransforming for every
matrix that scales all lengths by the same factor. These turn out to be
exactly the constant multiples of orthogonal matrices. So, if the
transformation is A=[[a,b],[c,d]], all we have to do is check that
a*b==-c*d && a*a+c*c==b*b+d*d. If this is the case, we can just make the
pathIterator of the shape do the transforming, and we can forget all about
it (which is great, since what's hurting us is the transformation of our
output paths, not the untransformation of the input).
So, to answer your question: we can't eliminate the untransforming for
non-uniform scalings, but we can eliminate it for rotations, uniform scales,
and even for rotation-and-scale matrices of the form [[1,b],[-b,1]].
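In code, the check could look something like this (a sketch, not the
actual Pisces code; real code would compare within an epsilon rather
than using ==):

    import java.awt.geom.AffineTransform;

    final class TransformCheck {
        // True if at = [[a,b],[c,d]] is a constant multiple of an
        // orthogonal matrix, i.e. it scales every length by the same
        // factor, so the pen width stays constant and Stroker can be
        // fed pre-transformed input.
        static boolean scalesLengthsUniformly(AffineTransform at) {
            double a = at.getScaleX(), b = at.getShearX();
            double c = at.getShearY(), d = at.getScaleY();
            return a * b == -c * d && a * a + c * c == b * b + d * d;
        }

        public static void main(String[] args) {
            System.out.println(scalesLengthsUniformly(
                AffineTransform.getRotateInstance(0.7)));  // true
            System.out.println(scalesLengthsUniformly(
                new AffineTransform(1, -2, 2, 1, 0, 0)));  // [[1,2],[-2,1]]: true
            System.out.println(scalesLengthsUniformly(
                AffineTransform.getScaleInstance(2, 1)));  // non-uniform: false
        }
    }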
> You rock then! A bug should be filed on closed JDK. Can you file it or
> send me your test case and I'll do it?
I filed it. Bug id: 6987950.
> > Thank you,
>
> Ummm... Thank *you*. You're doing all the good work here, I'm just
> sitting back, throwing out tiny crumbs of past experience and watching
> the ensuing woodchips fly with awe. I've had on my wish list for some
> time to be able to eliminate these last few closed source holdouts, but
> the quality of the Ductus code was so high that I never got motivated to
> try. Who knows now... ;-)
Well, I couldn't have done it without your help, so
Thank you,
Denis.
----- "Jim Graham" <james.graham at oracle.com> wrote:
> Hi Denis,
>
> On 9/27/2010 7:43 AM, Denis Lila wrote:
> > Hi Jim.
> >> How much faster? I'm worried about this, especially given our tiled
> >> approach to requesting the data. What was the bottleneck before?
> >> (It's been a while since I visited the code - we weren't computing the
> >> crossings for every curve in the path for every tile being generated,
> >> were we?)
> >
> > Not much faster. I'm working on something better.
>
> > I'm not sure about the bottleneck, but what we were doing before is:
> > 1. Flatten (by subdividing) every curve so that we deal only with lines.
> > 2. Add each line to a list sorted by y0. When end_rendering was called,
> > for each scanline we found the crossings of the scanline with every line
> > in our line list, which we used to compute the alpha for that scanline's
> > pixel row. All this would be put into RLE encoded temporary storage, and
> > it would be read back and converted into tile form by PiscesTileGenerator.
> >
> > Speaking of which, would it not be better to get rid of PiscesCache and
> > just keep a buffer with the current tile row in Renderer.java? This would
> > be possible because the specification for AATileGenerator says the
> > iteration is like: for (y...) for (x...);.
> > Why is PiscesCache there? It isn't being used as a cache at all. Could it be?
> > Also, why do we output tiles instead of just pixel rows (which I guess
> > would just be nx1 tiles)? Is it because we would like to use
> > getTypicalAlpha to eliminate completely transparent or completely opaque
> > regions as soon as possible (and the longer a tile is, the less of a
> > chance it has of being either of those two)?
>
> That was basically "cramming what we had into the interface's box". The
> cache existed for something that was being done on mobile, but it
> doesn't have much of a place in our APIs so it was just reused for tile
> generation. If we have a much more direct way of doing it then it would
> be great to get rid of it.
>
> I think we can support "ALL1s" and "ALL0s" reasonably without the cache.
>
> >> I can see your points here. I think there are solutions to avoid much
> >> of the untransforming we can consider, but your solution works well so
> >> let's get it in and then we can look at optimizations if we feel they
> >> are causing a measurable problem later.
> >
> > I should say this isn't quite as bad as I might have made it seem.
> > Firstly, this IO handler class I made eliminates transformations when
> > Dasher communicates with Stroker. More importantly, no untransforming
> > is done when the transformation is just a translation, the identity, or
> > singular, and when STROKE_CONTROL is off we only transform the output
> > path. That's because the most important reason for handling transforms
> > the way I do now is that we can't normalize untransformed paths,
> > otherwise coordinate adjustments might be magnified too much. So, we
> > need to transform paths before normalization. But we also can't do the
> > stroking and widening before the normalization. But if normalization is
> > removed we can just pass untransformed paths into Stroker and transform
> > its output (which is still somewhat more expensive than only
> > transforming the input path, since Stroker produces 3-7 curves for each
> > input curve).
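> > Schematically, the decision is something like this (made-up helper
> > names, just to illustrate the ordering; not the actual pipeline code):
> >
> >     if (isIdentityTranslationOrSingular(at)) {
> >         // widths are unaffected, so no untransform is needed
> >         return stroke(normalize(deviceSpacePath));
> >     } else if (!strokeControl) {
> >         // nothing to normalize: stroke in user space and
> >         // transform only Stroker's output
> >         return transform(stroke(userSpacePath), at);
> >     } else {
> >         // normalize in device space, come back to user space to
> >         // widen, then transform the widened output
> >         return transform(stroke(untransform(normalize(deviceSpacePath), at)), at);
> >     }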
>
>
> >> I'm not sure I understand the reasoning of the control point
> >> calculation. I'll have to look at the code to register an opinion.
> >
> > I'm sorry, my explanation wasn't very clear. I attached a picture that
> > will hopefully clarify things.
> > But, in a way, the computation I use is forced on us. Suppose we have a
> > quadratic curve B and we need to compute one of its offsets C. C'(0)
> > and C'(1) will be parallel to B'(0) and B'(1), so we need to make sure
> > our computed offset has this property too (or it would look weird
> > around the endpoints). Now, B'(0) and B'(1) are parallel to p2-p1 and
> > p3-p2, where p1,p2,p3 are the 3 control points that define B, so if the
> > control points of C are q1, q2, q3 then q2-q1 and q3-q2 must be
> > parallel to p2-p1 and p3-p2 respectively. At this point we need more
> > constraints, since our system is underdetermined. We use the
> > constraints that q1 = C(0) and q3 = C(1) (so the endpoints of the
> > computed offset are equal to the endpoints of the ideal offset). All
> > we have left to compute is q2, but we know the direction of q2-q1 and
> > the direction of q3-q2, so q2 must lie on the lines defined by
> > q1+t*(p2-p1) and q3+s*(p3-p2), and therefore q2 must be the
> > intersection of these lines.
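> > In code, computing q2 comes down to a line-line intersection, something
> > like this (just a sketch with made-up names, not the actual Stroker
> > code):
> >
> >     // q2 is where the line through q1 with direction d1 = p2-p1
> >     // meets the line through q3 with direction d3 = p3-p2.
> >     // Solved for t with Cramer's rule; det == 0 means the two
> >     // directions are parallel and the curve is degenerate.
> >     static double[] computeQ2(double q1x, double q1y,
> >                               double d1x, double d1y,
> >                               double q3x, double q3y,
> >                               double d3x, double d3y) {
> >         double det = d1x * d3y - d1y * d3x;
> >         double t = ((q3x - q1x) * d3y - (q3y - q1y) * d3x) / det;
> >         return new double[] { q1x + t * d1x, q1y + t * d1y };
> >     }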
>
> I agree that if you are creating a parallel curve then intersection is
> the way to go. I guess what I was potentially confused about was
> whether there are cases where you need to subdivide at all? Regardless
> of subdivision, when you get down to the final step of creating the
> parallel curves then I believe offsetting and finding the intersection
> is correct (though I reserve the possibility that there might still be
> a simpler way - I haven't done any investigation to know if that is
> true).
>
> >> It sounds like you are correct here. What does the closed source code
> >> draw?
> >
> > I thought closed source java simply didn't draw the round joins in
> > these cases, but I did some more testing and it turns out it does for
> > some curves and it doesn't for others. I've included the results of a
> > test I wrote that tries to draw paths like:
> > p.moveTo(0,0); p.lineTo(cos,sin); p.lineTo(0,0);
> > where cos and sin are coordinates on a circle (source1.png is the
> > output of closed java; source2.png is my output). As you can see, my
> > version draws the round joins in all tested cases, while closed java
> > is inconsistent.
>
> > That sounds good. Hopefully by the end of today I'll have a less
> > memory-hungry AA implementation that is also faster.
>
> Yay!
>
>
> ...jim