[OpenJDK 2D-Dev] X11 uniform scaled wide lines and dashed lines; STROKE_CONTROL in Pisces

Mon Sep 27 22:58:53 UTC 2010

Hi Denis,

On 9/27/2010 7:43 AM, Denis Lila wrote:
> Hi Jim.
>> How much faster?  I'm worried about this, especially given our tiled
>> approach to requesting the data.  What was the bottleneck before?
>> (It's been a while since I visited the code - we weren't computing the
>> crossings for every curve in the path for every tile being generated
>> were we?)
>
>      Not much faster. I'm working on someting better.

Then hopefully we can get to something with better memory and CPU costs.

>      I'm not sure about the bottleneck, but what we were doing before is:
> 1. Flatten (by subdividing) every curve so that we deal only with lines.
> 2. Add each line to a list sorted by y0. When end_rendering was called
> for each scanline we found the crossings of the scanline and every line
> in our line list, which we used to compute the alpha for that scanline's
> pixel row. All this would be put into RLE encoded temporary storage and it
> would be read back and converted into tile form by PiscesTileGenerator.
>
>      Speaking of which, would it not be better to get rid of PiscesCache and
> just keep a buffer with the current tile row in Renderer.java. This would
> be possible because the specification for AATileGenerator says the iteration
> is like: for (y...) for (x...);.
> Why is PiscesCache there? It isn't being used as a cache at all. Could it be?
> Also, why do we output tiles, instead of just pixel rows (which I guess would
> just be nx1 tiles). Is it because we would like to use getTypicalAlpha to eliminate
> completely transparent or completely opaque regions as soon as possible (and the
> longer a tile is the less of a chance it has at being either of those two)?

That was basically "cramming what we had into the interface's box".  The 
cache existed for something that was being done on mobile, but it 
doesn't have much of a place in our APIs so it was just reused for tile 
generation.  If we have a much more direct way of doing it then it would 
be great to get rid of it.

I think we can support "ALL1s" and "ALL0s" reasonably without the cache.

>> I can see your points here.  I think there are solutions to avoid much
>> of the untransforming we can consider, but your solution works well so
>> let's get it in and then we can look at optimizations if we feel they
>> are causing a measurable problem later.
>
>      I should say this isn't quite as bad as I might have made it seem. Firstly,
> this IO handler class I made elimiinates transformations when Dasher
> communicates with Stroker. More importantly, no untransforming is done
> when the transformation is just a translation or is the identity or is singular
> and when STROKE_CONTROL is off, we only transform the output path. That's
> because the most important reason for handling transforms the way I do now
> is because we can't normalize untransformed paths, otherwise coordinate
> adjustments might be magnified too much. So, we need to transform paths
> before normalization. But we also can't do the stroking and widening
> before the normalization. But if normalization is removed we can just pass
> untransformed paths into Stroker, and transform its output (which is still
> somewhat more expensive than only trasnforming the input path, since
> Stroker produces many 3-7 curves for each input curve).

Can the untransform be eliminated in the case of scaling?  (Whether just 
for uniform scaling, or maybe even for non-uniform scaling with no 
rotation or shearing?)

>> I'm not sure I understand the reasoning of the control point
>> calculation.  I'll have to look at the code to register an opinion.
>
>      I'm sorry, my explanation wasn't very clear. I attached a picture that
> will hopefully clarify things.
> But, in a way, the computation I use is forced on us. Suppose we have a
> quadratic curve B and we need to compute one of its offsets C. C'(0)
> and C'(1) will be parallel to B'(0) and B'(1) so we need to make sure
> our computed offset has this property too (or it would look weird around
> the endpoints). Now, B'(0) and B'(1) are parallel to p2-p1 and p3-p2
> where p1,p2,p3 are the 3 control points that define B, so if the control
> points of C are q1, q2, q3 then q2-q1 and q3-q2 must be parallel to p2-p1
> and p3-p2 respectively. At this point, we need more constraint, since
> our system is underdetermined. We use the constraints that q1 = C(0)
> and q3 = C(1) (so, the endpoints of the computed offset are equal to the
> endpoints of the ideal offset). All we have left to compute is q2, but
> we know the direction of q2-q1 and the direction of q3-q2, so q2 must
> lie on the lines defined by q1+t*(q2-q1) and q3+t*(q3-q2) so q2 must
> be the intersection of these lines.

I agree that if you are creating a parallel curve then intersection is 
the way to go.  I guess what I was potentially confused about was 
whether there are cases where you need to subdivide at all?  Regardless 
of subdivision, when you get down to the final step of creating the 
parallel curves then I believe offsetting and finding the intersection 
is correct (though I reserve the possibility that there might still be a 
simpler way - I haven't done any investigation to know if that is true).

>> It sounds like you are correct here.  What does the closed source code
>> draw?
>
>      I thought closed source java simply didn't draw the round joins in
> these cases, but I did some more testing and it turns out it does for
> some curves and it doesn't for others. I've included the results of a
> test I wrote that tries to draw paths like: moveTo(0,0);p.lineTo(cos,sin);p.lineTo(0,0);
> where cos and sin are coordinates on a circle (source1.png is the output
> of closed java. Source2.png is my output). As you can see, my
> version draws the round joins on all tested cases, while closed java
> is inconsistent.

You rock then!  A bug should be filed on closed JDK.  Can you file it or 
send me your test case and I'll do it?

>      That sounds good. Hopefully by the end of today I'll have a
> less memory hungry AA implementation that is also faster.

Yay!

> Thank you,

Ummm...  Thank *you*.  You're doing all the good work here, I'm just 
sitting back, throwing out tiny crumbs of past experience and watching 
the ensuing woodchips fly with awe.  I've had on my wish list for some 
time to be able to eliminate these last few closed source holdouts, but 
the quality of the Ductus code was so high that I never got motivated to 
try.  Who knows now...  ;-)

			...jim