[OpenJDK 2D-Dev] X11 uniform scaled wide lines and dashed lines; STROKE_CONTROL in Pisces

Wed Oct 6 20:36:38 UTC 2010

Hello Jim.

In the last week I have been implementing various optimizations, in
particular to anti-aliasing. I tried to remove PiscesCache, and I did,
but the code ended up being a mess, and it was actually slower, so I
scrapped it.
Most of the motivation for removing PiscesCache was so that getTypicalAlpha
would be easier to implement. So, I tried to implement this without
removing PiscesCache. Quite a few changes had to be made, but it is done
and it speeds up rendering of mostly full or mostly empty shapes pretty
dramatically. The curves animation in Java2Ddemo is about 80% faster on my
machine than when using java6. Somewhat disappointingly, its fps is only
about 0.7 times that of proprietary java. When rendering single frames of
very complex paths it's only half as fast as closed java. Still it is
considerably faster than what we have now, so I shouldn't complain.

webrev:
http://icedtea.classpath.org/~dlila/webrevs/noflatten/webrev/

This is how I implemented getTypicalAlpha:
I added an int array A[numHorizTiles][numVertTiles] such that A[i][j]
contains the sum of the alpha values of every pixel in tile (i, j). Every
entry starts at 0, and I increase the entries as RLE pairs are added to
the "cache". I also made the cache store its data in an int[][] where the
run lengths for row i are stored in the ith subarray (before, the run
lengths were all put in the same array, and an offset array was kept along
with another accounting array to make sense of it). I did this for clarity
and because it helped with the implementation of getTypicalAlpha. I am no
longer sure that the latter is true, but it is done, it works, and it's
fast, so I think we should leave it, even if it's not completely necessary.
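
A minimal sketch of the per-tile bookkeeping described above, not the code
in the webrev. The class name, the 32-pixel tile size, the 0..255 alpha
range and the 0x80 "mixed" return value are all assumptions made for
illustration; runs are assumed to lie inside the bounds and not overlap.

```java
/** Hypothetical illustration only; not the PiscesCache from the webrev. */
final class TileAlphaSketch {
    static final int TILE_SIZE = 32;   // assumed tile edge, in pixels
    static final int MAX_ALPHA = 255;  // assumed alpha of a fully covered pixel

    private final int width, height;
    // tileAlphaSum[i][j] = sum of the alphas of every pixel in tile (i, j)
    private final int[][] tileAlphaSum;

    TileAlphaSketch(int width, int height) {
        this.width = width;
        this.height = height;
        int tilesX = (width + TILE_SIZE - 1) / TILE_SIZE;
        int tilesY = (height + TILE_SIZE - 1) / TILE_SIZE;
        tileAlphaSum = new int[tilesX][tilesY]; // starts at all zeros
    }

    /** Record one RLE pair: runLength pixels of the given alpha from (x, y). */
    void addRun(int x, int y, int runLength, int alpha) {
        int ty = y / TILE_SIZE;
        int end = x + runLength;
        while (x < end) {
            // A run may span several tiles; credit each tile with its share.
            int tx = x / TILE_SIZE;
            int tileEnd = Math.min(end, (tx + 1) * TILE_SIZE);
            tileAlphaSum[tx][ty] += (tileEnd - x) * alpha;
            x = tileEnd;
        }
    }

    /** 0x00 for an empty tile, 0xff for a full one, 0x80 for anything else. */
    int getTypicalAlpha(int tx, int ty) {
        int w = Math.min(TILE_SIZE, width - tx * TILE_SIZE);
        int h = Math.min(TILE_SIZE, height - ty * TILE_SIZE);
        int sum = tileAlphaSum[tx][ty];
        if (sum == 0) return 0x00;
        if (sum == w * h * MAX_ALPHA) return 0xff;
        return 0x80;
    }

    public static void main(String[] args) {
        TileAlphaSketch cache = new TileAlphaSketch(64, 32);
        for (int y = 0; y < 32; y++) cache.addRun(0, y, 40, MAX_ALPHA);
        System.out.println(cache.getTypicalAlpha(0, 0) + " " + cache.getTypicalAlpha(1, 0));
    }
}
```

The point of keeping the sums is that an empty or full tile can be
recognized without looking at any of its pixels.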

I also implemented the optimization of skipping all the untransforming when
the transformation is a scalar multiple of an orthogonal transformation.
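
One way such a check could be written, as a sketch rather than the test
used in the webrev: the 2x2 part of the transform is a multiple of an
orthogonal matrix exactly when its two columns are perpendicular and have
the same length. The method name and the tolerance parameter are
hypothetical; the accessors are the standard java.awt.geom.AffineTransform
ones.

```java
import java.awt.geom.AffineTransform;

/** Hypothetical illustration only. */
final class OrthogonalMultipleSketch {
    /**
     * True if the linear part of 'at' is k*Q for some scalar k and some
     * orthogonal Q (rotation or reflection), within tolerance eps.
     * Note that the zero matrix also passes; singular transforms would
     * need separate handling.
     */
    static boolean isMultipleOfOrthogonal(AffineTransform at, double eps) {
        double a = at.getScaleX(), b = at.getShearX(); // first row:  m00 m01
        double c = at.getShearY(), d = at.getScaleY(); // second row: m10 m11
        double dot = a * b + c * d;                    // columns perpendicular?
        double lenDiff = (a * a + c * c) - (b * b + d * d); // same length?
        return Math.abs(dot) <= eps && Math.abs(lenDiff) <= eps;
    }

    public static void main(String[] args) {
        AffineTransform uniform = AffineTransform.getRotateInstance(0.7);
        uniform.scale(3, 3);
        AffineTransform nonUniform = AffineTransform.getScaleInstance(2, 3);
        System.out.println(isMultipleOfOrthogonal(uniform, 1e-9));
        System.out.println(isMultipleOfOrthogonal(nonUniform, 1e-9));
    }
}
```

Such a transform scales lengths by the same factor in every direction, so
a line width can simply be scaled by that factor in device space.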

I implemented another optimization where Renderer's constructor takes a
boolean argument that indicates whether its input is monotonic in x and y, and
I gave Stroker an argument that indicates whether antialiasing is being done.
These allow Stroker to skip the rotation it does of its input curves before
subdividing them, and just subdivide the curves into monotonic x and y pieces.
These are then fed to Renderer, which knows that all its input is monotonic,
so it doesn't try to make its input monotonic (which normally is required,
otherwise it wouldn't be able to keep track of the smallest and largest x and y
coordinates).
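
To illustrate what "subdivide the curves into monotonic x and y pieces"
involves, here is a sketch for a single quadratic; it is not the Stroker
code, and all names are made up. Each coordinate of a quadratic has at
most one interior extremum, at t = (p0 - p1) / (p0 - 2*p1 + p2), so
splitting at those parameters leaves pieces that are monotonic in both
x and y.

```java
import java.util.TreeSet;

/** Hypothetical illustration only. */
final class MonotonicSplitSketch {
    /**
     * Parameter in (0, 1) where one coordinate of the quadratic with
     * control values p0, p1, p2 has zero derivative, or NaN if none.
     */
    static double extremum(double p0, double p1, double p2) {
        double denom = p0 - 2 * p1 + p2;
        if (denom == 0) return Double.NaN; // derivative is constant
        double t = (p0 - p1) / denom;
        return (t > 0 && t < 1) ? t : Double.NaN;
    }

    /** Sorted parameters at which to split so each piece is x- and y-monotonic. */
    static double[] splitParams(double[] xs, double[] ys) {
        TreeSet<Double> ts = new TreeSet<>();
        double tx = extremum(xs[0], xs[1], xs[2]);
        double ty = extremum(ys[0], ys[1], ys[2]);
        if (!Double.isNaN(tx)) ts.add(tx);
        if (!Double.isNaN(ty)) ts.add(ty);
        return ts.stream().mapToDouble(Double::doubleValue).toArray();
    }

    public static void main(String[] args) {
        // x goes out and comes back, y only increases: one split is needed.
        double[] ts = splitParams(new double[] {0, 1, 0}, new double[] {0, 1, 2});
        System.out.println(java.util.Arrays.toString(ts));
    }
}
```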

A minor change: Renderer's constructor no longer requires a cache argument. It
creates its own now.

I think that's it. For the first time, the renderer is at a point where I have
no major concerns about it, and would be pretty comfortable with pushing it,
although, of course, I would appreciate any suggestions, in particular comments
on how to improve performance (especially AA performance - I must admit I was
expecting larger improvements).

Thank you
Denis.

----- "Jim Graham" <james.graham at oracle.com> wrote:

> Hi Denis,
> 
> On 9/27/2010 7:43 AM, Denis Lila wrote:
> > Hi Jim.
> >> How much faster?  I'm worried about this, especially given our tiled
> >> approach to requesting the data.  What was the bottleneck before?
> >> (It's been a while since I visited the code - we weren't computing the
> >> crossings for every curve in the path for every tile being generated
> >> were we?)
> >
> >      Not much faster. I'm working on something better.
> 
> Then hopefully we can get to something with better memory and CPU
> costs.
> 
> >      I'm not sure about the bottleneck, but what we were doing before is:
> > 1. Flatten (by subdividing) every curve so that we deal only with lines.
> > 2. Add each line to a list sorted by y0. When end_rendering was called
> > for each scanline we found the crossings of the scanline and every line
> > in our line list, which we used to compute the alpha for that scanline's
> > pixel row. All this would be put into RLE encoded temporary storage and
> > it would be read back and converted into tile form by
> > PiscesTileGenerator.
> >
> >      Speaking of which, would it not be better to get rid of PiscesCache
> > and just keep a buffer with the current tile row in Renderer.java? This
> > would be possible because the specification for AATileGenerator says the
> > iteration is like: for (y...) for (x...);.
> > Why is PiscesCache there? It isn't being used as a cache at all. Could
> > it be?
> > Also, why do we output tiles, instead of just pixel rows (which I guess
> > would just be nx1 tiles)? Is it because we would like to use
> > getTypicalAlpha to eliminate completely transparent or completely opaque
> > regions as soon as possible (and the longer a tile is the less of a
> > chance it has at being either of those two)?
> 
> That was basically "cramming what we had into the interface's box".  The
> cache existed for something that was being done on mobile, but it
> doesn't have much of a place in our APIs so it was just reused for tile
> generation.  If we have a much more direct way of doing it then it would
> be great to get rid of it.
> 
> I think we can support "ALL1s" and "ALL0s" reasonably without the
> cache.
> 
> >> I can see your points here.  I think there are solutions to avoid much
> >> of the untransforming we can consider, but your solution works well so
> >> let's get it in and then we can look at optimizations if we feel they
> >> are causing a measurable problem later.
> >
> >      I should say this isn't quite as bad as I might have made it seem.
> > Firstly, this IO handler class I made eliminates transformations when
> > Dasher communicates with Stroker. More importantly, no untransforming is
> > done when the transformation is just a translation or is the identity or
> > is singular, and when STROKE_CONTROL is off, we only transform the output
> > path. That's because the most important reason for handling transforms
> > the way I do now is that we can't normalize untransformed paths,
> > otherwise coordinate adjustments might be magnified too much. So, we need
> > to transform paths before normalization. But we also can't do the
> > stroking and widening before the normalization. But if normalization is
> > removed we can just pass untransformed paths into Stroker, and transform
> > its output (which is still somewhat more expensive than only transforming
> > the input path, since Stroker produces many (3-7) curves for each input
> > curve).
> 
> Can the untransform be eliminated in the case of scaling?  (Whether just
> for uniform scaling, or maybe even for non-uniform scaling with no
> rotation or shearing?)
> 
> >> I'm not sure I understand the reasoning of the control point
> >> calculation.  I'll have to look at the code to register an opinion.
> >
> >      I'm sorry, my explanation wasn't very clear. I attached a picture
> > that will hopefully clarify things.
> > But, in a way, the computation I use is forced on us. Suppose we have a
> > quadratic curve B and we need to compute one of its offsets C. C'(0)
> > and C'(1) will be parallel to B'(0) and B'(1) so we need to make sure
> > our computed offset has this property too (or it would look weird around
> > the endpoints). Now, B'(0) and B'(1) are parallel to p2-p1 and p3-p2
> > where p1,p2,p3 are the 3 control points that define B, so if the control
> > points of C are q1, q2, q3 then q2-q1 and q3-q2 must be parallel to p2-p1
> > and p3-p2 respectively. At this point, we need more constraints, since
> > our system is underdetermined. We use the constraints that q1 = C(0)
> > and q3 = C(1) (so, the endpoints of the computed offset are equal to the
> > endpoints of the ideal offset). All we have left to compute is q2, but
> > we know the direction of q2-q1 and the direction of q3-q2, so q2 must
> > lie on the lines defined by q1+t*(p2-p1) and q3+s*(p3-p2), so q2 must
> > be the intersection of these lines.
> 
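
The construction quoted above can be sketched as follows. This is only an
illustration under assumptions, not the Stroker code: the names are
hypothetical, the offset is taken along the left-hand unit normal at each
endpoint, and degenerate inputs (coincident control points or parallel end
tangents) are simply rejected.

```java
/** Hypothetical illustration only. */
final class QuadOffsetSketch {
    /**
     * p = {p1x, p1y, p2x, p2y, p3x, p3y}; w = signed offset distance.
     * Returns {q1x, q1y, q2x, q2y, q3x, q3y}, or null for degenerate input.
     */
    static double[] offsetControlPoints(double[] p, double w) {
        double d1x = p[2] - p[0], d1y = p[3] - p[1]; // p2 - p1, tangent at t=0
        double d2x = p[4] - p[2], d2y = p[5] - p[3]; // p3 - p2, tangent at t=1
        double len1 = Math.hypot(d1x, d1y), len2 = Math.hypot(d2x, d2y);
        double cross = d1x * d2y - d1y * d2x;
        if (len1 == 0 || len2 == 0 || cross == 0) return null;

        // q1 = C(0) and q3 = C(1): endpoints moved by w along the normals.
        double q1x = p[0] - w * d1y / len1, q1y = p[1] + w * d1x / len1;
        double q3x = p[4] - w * d2y / len2, q3y = p[5] + w * d2x / len2;

        // Solve q1 + t*(p2-p1) = q3 + s*(p3-p2) for t; that point is q2.
        double t = ((q3x - q1x) * d2y - (q3y - q1y) * d2x) / cross;
        return new double[] {q1x, q1y, q1x + t * d1x, q1y + t * d1y, q3x, q3y};
    }

    public static void main(String[] args) {
        double[] q = offsetControlPoints(new double[] {0, 0, 1, 0, 1, 1}, 0.5);
        System.out.println(java.util.Arrays.toString(q));
    }
}
```
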
> I agree that if you are creating a parallel curve then intersection is
> the way to go.  I guess what I was potentially confused about was
> whether there are cases where you need to subdivide at all?  Regardless
> of subdivision, when you get down to the final step of creating the
> parallel curves then I believe offsetting and finding the intersection
> is correct (though I reserve the possibility that there might still be a
> simpler way - I haven't done any investigation to know if that is true).
> 
> >> It sounds like you are correct here.  What does the closed source code
> >> draw?
> >
> >      I thought closed source java simply didn't draw the round joins in
> > these cases, but I did some more testing and it turns out it does for
> > some curves and it doesn't for others. I've included the results of a
> > test I wrote that tries to draw paths like:
> > p.moveTo(0,0); p.lineTo(cos,sin); p.lineTo(0,0);
> > where cos and sin are coordinates on a circle (source1.png is the output
> > of closed java; source2.png is my output). As you can see, my
> > version draws the round joins on all tested cases, while closed java
> > is inconsistent.
> 
> You rock then!  A bug should be filed on closed JDK.  Can you file it or
> send me your test case and I'll do it?
> 
> >      That sounds good. Hopefully by the end of today I'll have a
> > less memory hungry AA implementation that is also faster.
> 
> Yay!
> 
> > Thank you,
> 
> Ummm...  Thank *you*.  You're doing all the good work here, I'm just
> sitting back, throwing out tiny crumbs of past experience and watching
> the ensuing woodchips fly with awe.  I've had on my wish list for some
> time to be able to eliminate these last few closed source holdouts, but
> the quality of the Ductus code was so high that I never got motivated to
> try.  Who knows now...  ;-)
> 
> 			...jim