[OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

Thu Jul 29 03:55:07 UTC 2010

Hi Denis,

Only some minor comments so far:

Stroker.java:

- Should det be precomputed and saved?  (You calculate it in the 
constructor anyway, but don't save it.)
- Should test for uniform scale be precomputed?
- (Is test for uniform scale too strict?  Can a rotated uniform scale 
use the same code as upright uniform scale?)
- Why are m00_2_m01_2 et al no longer precomputed (they only need to be 
precomputed if scale is not uniform which isn't common)?
- lineLength method is "user space length" isn't it? - a more 
descriptive name might help avoid confusion if someone is modifying code 
here later.
- line 614 - missing space

Dasher.java:

- Line 187 - use "leftInThisDashSegment" here?

I still have to look at Renderer.java in depth, but I thought I'd send 
these minor comments along while they are fresh in my email buffer...

			...jim

Denis Lila wrote:
> Hello.
> 
> And, here it is: http://icedtea.classpath.org/~dlila/webrevs/fpBetterAA/webrev/
> 
> Just as I thought, it's way faster than the original. I made a new
> test, which is supposed to be more realistic than drawing 30000 length
> lines. It consists of splitting a 1000x1000 frame in 100 10x10 squares
> and in each square drawing 20 or so curves with random start/end/control
> points in that square. In this case it's at least twice faster (~0.85
> versus ~1.95 seconds).
> 
> Unfortunately I've had to do a lot of the same optimizations that I was
> trying to remove (i.e. making everything a global variable). I tried
> writing a higher level version of it, but it completely negated the
> performance gains of the new algorithm. Anyway, it's still not as bad
> as before because the algorithm is inherently clearer and we only iterate
> through scan lines instead of iterating through strips and then scanlines
> in that strip, and then edges, all in the same method.
> 
> It's also better organized, and logically separate parts of the code don't
> really touch each other's variables much. And it's definitely better
> commented. 
> 
> I also made some minor changes outside of Renderer that did not appear in
> the first version of this patch last week:
> 1. I fixed the issue Jim pointed out in Dasher, where if 
> (origLen == leftInThisDashSegment) the dash index was not being incremented.
> 2. In dasher, I no longer copy the dash array.
> 3. I removed files PiscesMath.java and Transform4.java because they are no
> longer needed.
> 
> Thanks,
> Denis.
> 
> ----- "Jim Graham" <james.graham at oracle.com> wrote:
> 
>> Woohoo, Denis!  I look forward to seeing the new version!
>>
>> 		...jim
>>
>> On 7/28/2010 5:51 AM, Denis Lila wrote:
>>> Hello Jim.
>>>
>>> This one performs almost identically to what is already there
>>> in openjdk6 and 7, since it's exactly what I sent for review
>>> last week, but with all the changes you suggested implemented.
>>>
>>> I would actually like to ask you to not look at either one of
>>> them. First of all, there is an ArrayIndexOutOfBoundsException
>>> possible in emitRow.
>>> And secondly, even if there wasn't, last night I implemented
>>> your algorithm from ShapeSpanIterator.c to iterate through
>>> the edges. I have yet to debug it, but it makes everything much,
>>> much simpler, and it should make it far faster, so we get the best
>>> of both worlds.
>>>
>>> Thanks,
>>> Denis.
>>>
>>> ----- "Jim Graham"<james.graham at oracle.com>  wrote:
>>>
>>>> Hi Denis,
>>>>
>>>> I'll try to get through both versions and see if I can find
>> anything
>>>> that was hurting performance with your EdgeLists.  I'm guessing
>> that
>>>> this version was created because of the performance issues you
>> found
>>>> with the EdgeList version?  Does this perform more closely to the
>>>> existing code than the EdgeList version?
>>>>
>>>> 			...jim
>>>>
>>>> Denis Lila wrote:
>>>>> Hello again.
>>>>>
>>>>> This attachmet is a "can of worms" implementation without all the
>>>> fancy (and slow)
>>>>> iteration. It also includes all of the other suggestions you sent
>> in
>>>> your first
>>>>> review of Dasher and Renderer last week (most importantly, the
>>>> firstOrientation
>>>>> issue, horizontal lines filtering, and adding prefixes to
>> variable
>>>> names to make
>>>>> it clear whether they refer to pixels, or subpixels).
>>>>>
>>>>> Regards,
>>>>> Denis.
>>>>>
>>>>> ----- "Denis Lila"<dlila at redhat.com>  wrote:
>>>>>
>>>>>> Hello Jim.
>>>>>>
>>>>>> I implemented your "can of worms" idea. It works, and it got rid
>>>> of
>>>>>> the biasing.
>>>>>> I wasn't able to send a webrev, but there are many changes and a
>>>> side
>>>>>> by side
>>>>>> comparison would probably be useless, so I just attached the
>> file.
>>>> I
>>>>>> hope this is
>>>>>> ok.
>>>>>>
>>>>>> I also implemented a "better" iterating structure for the lines
>>>> and
>>>>>> the
>>>>>> strips and crossings. I think it is better in every way, except
>>>>>> performance.
>>>>>> The new file is more than 200 lines smaller than the old one.
>> The
>>>> only
>>>>>> members of Renderer are now the AA variables and the position
>>>>>> variables
>>>>>> (sx*, sy*, x*, y*).
>>>>>> What I've done is I added an EdgeList class, which encapsulates
>>>> all
>>>>>> the edge
>>>>>> related variables in the old Renderer. At first, I had an Edge
>>>> class
>>>>>> in addition
>>>>>> to the EdgeList class, and while this was much nicer, it turned
>> out
>>>> to
>>>>>> be too
>>>>>> expensive (see last paragraph).
>>>>>> I've also added a ScanLineIterator, so instead of _endRendering
>>>>>> iterating
>>>>>> through strips, and then calling renderStrip() which iterates
>>>> through
>>>>>> the
>>>>>> scanlines in that strip, and then through the crossings in that
>>>>>> scanline,
>>>>>> what happens now is that _endRendering uses the
>> Iterator<ScanLine>
>>>> to
>>>>>> iterate through each scanline, get get its crossings and iterate
>>>>>> through them
>>>>>> to accumulate the alpha. By the way, a ScanLine is a type
>> defined
>>>> by
>>>>>> an
>>>>>> interface which exports methods for getting the y coord of the
>>>> line,
>>>>>> the
>>>>>> number of crossings in it, the ith crossing, and a method for
>>>> sorting
>>>>>> its crossings.
>>>>>> The class that implements ScanLine is ScanLineIterator itself. I
>>>> made
>>>>>> a
>>>>>> ScanLine class, but I was afraid performance would suffer
>> because
>>>> of
>>>>>> all the
>>>>>> object creations (this turned out not to be an issue, after I
>>>> switched
>>>>>> to the
>>>>>> current way, and remeasured things). I did not switch back
>> because
>>>>>> this is
>>>>>> only slightly worse.
>>>>>>
>>>>>> As for performance: I wrote a simple program that tries to draw
>> a
>>>>>> dashed path
>>>>>> that consists of about 160 dashed lines of width 1 and length
>>>> 30000,
>>>>>> going
>>>>>> from the centre of the frame to some point. On my machine, this
>>>> takes
>>>>>> about 4.9
>>>>>> seconds in openjdk6, and 26 seconds using the attached file.
>> Back
>>>> when
>>>>>> I was using
>>>>>> the Edge class it took about 39 seconds. Everything without
>> hundres
>>>> of
>>>>>> thousands of
>>>>>> edges is not much slower
>>>>>> I have not changed any of the algorithms. ScanLineIterator still
>>>> goes
>>>>>> through
>>>>>> strips of the same size and computes crossings in every strip
>>>> using
>>>>>> the same
>>>>>> method as before, so I don't know why it's so slow. It can't be
>>>>>> because of anything
>>>>>> happening in _endRendering, because there are only about 9000
>>>>>> scanlines and for each
>>>>>> of them I've just added a few calls to one line getters (which
>> used
>>>> to
>>>>>> be direct
>>>>>> accesses into arrays).
>>>>>>
>>>>>> Thanks,
>>>>>> Denis.
>>>>>>
>>>>>> ----- "Jim Graham"<james.graham at oracle.com>  wrote:
>>>>>>
>>>>>>> Denis Lila wrote:
>>>>>>>> Hello Jim.
>>>>>>>>
>>>>>>>> Thank you very much for taking the time to read through this.
>>>>>>>>
>>>>>>>>> 169 - if origLen reaches the end of the dash exactly (the
>> "=="
>>>>>>> case)
>>>>>>>>      You're right, I should. I can't just replace<= with ==
>>>>>> though,
>>>>>>>> because the results will be the same: in the equal case
>> origLen
>>>>>>> will
>>>>>>>> become 0, and on the next iteration, the (origLen<
>>>>>> dash[idx]-phase)
>>>>>>> will
>>>>>>>> be true, and we will do a goTo(x1,y1), which is what we just
>> did
>>>>>> in
>>>>>>> the
>>>>>>>> previous iteration (unless dash[idx] is 0, in which case the
>>>>>> results
>>>>>>>> will be even worse). The best solution to this is to just do a
>>>>>>> nested
>>>>>>>> check for the == case.
>>>>>>> Ah, right - because there is no "break" when origLen becomes
>>>> zero.
>>>>>>> Sounds like you're on it.
>>>>>>>
>>>>>>>>> 171 - Aren't x0,y0 stored as subpix values?  You would then
>> be
>>>>>>> comparing
>>>>>>>>> a subpix value to a non-subpix value.  Perhaps if the subpix
>>>>>> calls
>>>>>>> are
>>>>>>>>> moved to the top of the function I think this should work OK?
>>>>>>>>      That's true, they are. This is very puzzling. If a
>>>> horizontal
>>>>>>> line is
>>>>>>>> added, when the crossings for it are being computed, dxBydy
>>>> should
>>>>>>> be NaN, and
>>>>>>>> wouldn't an error be thrown when we try to cast to an int in
>> the
>>>>>>> call to addCrossing?
>>>>>>>
>>>>>>> I'm not sure - I didn't trace it through very far - I just
>> noted
>>>>>> that
>>>>>>> the values were likely in different "resolutions".
>>>>>>>
>>>>>>>>> 194,197 - Shouldn't these be constants, or based on the
>>>>>>> SUB_POS_XY?
>>>>>>>>      I suppose I should make a biasing constant. I don't think
>>>> they
>>>>>>> should be based
>>>>>>>> on SUB_POS_XY though, because the biasing is done to subpixel
>>>>>>> coordinates so
>>>>>>>> there is no danger that if our supersampling is fine enough
>> the
>>>>>>> biasing will
>>>>>>>> make the coordinates jump over some scan line.
>>>>>>> I'm guessing you punted on my "can of worms" suggestion then.
>>>> ;-)
>>>>>>>>> 216 - if you save the sx0,sy0 before subpix'ing them then you
>>>>>> don't
>>>>>>> have
>>>>>>>>> to "unsubpix" them here.  (Though you still need the subpix
>> sy0
>>>>>> for
>>>>>>> line
>>>>>>>>> 209 - or you could just call subpix on it for that line and
>> at
>>>>>>> least
>>>>>>>>> you'd save 1 call to sub/unsub).
>>>>>>>>      Ok. I'll just save subpixed and unsubpixed versions of
>> sx0,
>>>>>> sy0.
>>>>>>> That should
>>>>>>>> eliminate all sx0,sy0 related calls to tosubpix and topix
>> except
>>>>>> in
>>>>>>> moveTo.
>>>>>>>
>>>>>>> and lineTo.  You may only need 3 of those values, though, if I
>>>>>>> remember
>>>>>>> my code reading well enough.
>>>>>>>
>>>>>>>>> 256,264 - casting to int is problematic.  It truncates
>> towards
>>>> 0
>>>>>>> which
>>>>>>>>> means negatives are ceil'd and positives are floor'd.  It
>> would
>>>>>> be
>>>>>>> best
>>>>>>>>> to use floor here instead.  On the other hand, if negative
>>>>>> numbers
>>>>>>> are
>>>>>>>>> always "off the left side of the drawable" then this is moot.
>>>>>>>>      That's why I left it at int casting. Do you still think I
>>>>>> should
>>>>>>> change it
>>>>>>>> to floor?
>>>>>>> If you mean floor, I think it best to use floor.  Unless you
>> can
>>>>>> prove
>>>>>>> that negatives aren't really an issue and that the strange
>>>>>> truncation
>>>>>>> on
>>>>>>> them won't be numerically a problem - but I don't think it is
>>>> worth
>>>>>> it
>>>>>>> for this code.
>>>>>>>
>>>>>>>> Speaking of which, is there a good way to edit and build
>> openJDK
>>>>>>> from eclipse?
>>>>>>>> Then this sort of embarrassing error could be avoided
>> (including
>>>>>> the
>>>>>>> printStats() call).
>>>>>>>
>>>>>>> I don't use Eclipse, sorry.  :-(
>>>>>>>
>>>>>>>>      As for Arrays.newSize()... I can't find it here:
>>>>>>>>
>> http://download.oracle.com/docs/cd/E17409_01/javase/6/docs/api/java/util/Arrays.html
>>>>>>>> Is this a new function added in 7?
>>>>>>> Sorry, make that Arrays.copyOf(..., newSize).  I tried to type
>>>> the
>>>>>>> name
>>>>>>> from memory and got it wrong.
>>>>>>>
>>>>>>>>> 721 - Arrays.sort()
>>>>>>>>      I thought about using this, but I did some measurements,
>> and
>>>>>> it
>>>>>>> turns out that
>>>>>>>> Arrays.sort() is a bit slower if the portion of the array
>> being
>>>>>>> sorted has fewer
>>>>>>>> than about 70 elements.
>>>>>>> I wonder what the typical number of elements is.  Is this
>> sorting
>>>>>>> crossings per line?  Then simple primitives like circles should
>>>> only
>>>>>>> have 2 per line, right?  Is it worth testing for really small
>>>>>> numbers
>>>>>>> of
>>>>>>> elements (much lower than 70) and doing a manual sort?  Or am I
>>>>>>> misunderstanding what is being sorted there?
>>>>>>>
>>>>>>>>> How comfortable do you feel with that conversion?
>>>>>>>> I'll try to do it and include it in a new patch along with,
>>>>>>> hopefully, a better way
>>>>>>>> to iterate through strips, and especially crossings. Right now
>>>> all
>>>>>>> the iteration
>>>>>>>> state is saved in global variables. This is... not good. I
>> spent
>>>>>> far
>>>>>>> too much time last
>>>>>>>> week on bugs caused by this sort of thing. Ideally, any
>> members
>>>>>> that
>>>>>>> can't be put at
>>>>>>>> the top of a class (like our edge and crossing data) should be
>>>> put
>>>>>>> in classes of their own.
>>>>>>>
>>>>>>> That sounds good, but also consider doing it in separate stages
>>>> to
>>>>>>> reduce churn in the code reviewing (and you then have revs to
>> go
>>>>>> back
>>>>>>> and test which stage caused a probem if we find a bug later):
>>>>>>>
>>>>>>> - first get all on floats
>>>>>>> - then change strip management
>>>>>>> - then change to open coordinate intervals
>>>>>>> - (or vice versa)
>>>>>>>
>>>>>>>> Do you have any ideas about how to iterate through edges in a
>>>>>> strip
>>>>>>> without going through
>>>>>>>> every edge? I was thinking of maybe using some sort of tree to
>>>>>> split
>>>>>>> the drawing surface,
>>>>>>>> but I haven't given it much thought.
>>>>>>> If you look for something like the native code for
>>>>>>> sun/java2d/pipe/ShapeSpanIterator.c you will see the way I
>>>> typically
>>>>>>> like to do edge setup and enumeration.
>>>>>>>
>>>>>>> That code uses the "half open interval" approach for both X and
>> Y
>>>>>>> intervals.
>>>>>>>
>>>>>>> I then sort the edge list by "leading Y" and then move through
>>>> the
>>>>>>> edges
>>>>>>> using the following manner (kind of like an inch worm eating
>> the
>>>>>>> edges,
>>>>>>> maintaining a pointer to the beginning and end of an "active
>>>> list"
>>>>>> of
>>>>>>> edges that are in play at any given time by virtue of having
>> their
>>>> Y
>>>>>>> range intersect the current sampling Y).  Note that I use an
>>>> array
>>>>>> of
>>>>>>> edges and then a parallel array of pointers into the edges so
>> that
>>>> I
>>>>>>> can
>>>>>>> sort just the pointers and avoid moving tons of edge data
>> around.
>>>>>>> Also,
>>>>>>> later when I eliminate edges from the active list I just move
>>>> their
>>>>>>> pointers into and out of view rather than having to copy the
>> edge
>>>>>>> data.
>>>>>>>    It is harder to do an array of pointers in Java, though -
>>>> perhaps
>>>>>> an
>>>>>>> array of indices?  Here is some basic pseudocode:
>>>>>>>
>>>>>>> start with lo and hi pointing at index 0 in edge list.
>>>>>>> until edge list is exhausted {
>>>>>>>       process edges between lo and hi (empty on first pass)
>>>>>>>       scan from hi to lo and toss any edges that are exhausted
>>>>>>>           (compacting remaining pointers towards hi)
>>>>>>>       keep incrementing hi to accept any new edges coming into
>>>> play
>>>>>>>       process edges between lo and hi to increment to the next
>> Y
>>>>>>>           (note that new edges from previous step may not
>>>>>>>            need any processing in this stage if their starting
>>>>>>>            Y equals the next Y to be sampled)
>>>>>>> }
>>>>>>>
>>>>>>> Gradually lo and hi make their way through the list.  The edges
>>>>>> above
>>>>>>> hi
>>>>>>> are always sorted in order of when they may come into play as
>> we
>>>>>> move
>>>>>>> downward in Y.  The edges between lo and hi are also usually
>> kept
>>>>>>> sorted
>>>>>>> by their current X so that the stage that processes them into
>>>> spans
>>>>>>> can
>>>>>>> just run through them.  The edges below lo are usually random
>>>>>> garbage
>>>>>>> because no care was taken during the "pruning" step to worry
>>>> about
>>>>>>> what
>>>>>>> happens to the pointer table down there as lo is incremented
>> (the
>>>>>>> algorithm only ever looks up the array of pointers).
>>>>>>>
>>>>>>> I hope that helps...
>>>>>>>
>>>>>>> 			...jim