[OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

Thu Jul 29 21:16:40 UTC 2010

Hello Jim.

I made the changes you point out, except for your second point
(I don't have time to think about it right now).

I stopped precomputing m00_2_m01_2 because it was only being
used in one place. But I guess it worth it to precompute it
if it's going to be used often (and computeOffset is called
for every line).

The new webrev is at http://icedtea.classpath.org/~dlila/webrevs/fpBetterAAv2/webrev/

Thanks,
Denis.

----- "Jim Graham" <james.graham at oracle.com> wrote:

> Hi Denis,
> 
> Only some minor comments so far:
> 
> Stroker.java:
> 
> - Should det be precomputed and saved?  (You calculate it in the 
> constructor anyway, but don't save it.)
> - Should test for uniform scale be precomputed?
> - (Is test for uniform scale too strict?  Can a rotated uniform scale
> 
> use the same code as upright uniform scale?)
> - Why are m00_2_m01_2 et al no longer precomputed (they only need to
> be 
> precomputed if scale is not uniform which isn't common)?
> - lineLength method is "user space length" isn't it? - a more 
> descriptive name might help avoid confusion if someone is modifying
> code 
> here later.
> - line 614 - missing space
> 
> Dasher.java:
> 
> - Line 187 - use "leftInThisDashSegment" here?
> 
> I still have to look at Renderer.java in depth, but I thought I'd send
> 
> these minor comments along while they are fresh in my email buffer...
> 
> 			...jim
> 
> Denis Lila wrote:
> > Hello.
> > 
> > And, here it is:
> http://icedtea.classpath.org/~dlila/webrevs/fpBetterAA/webrev/
> > 
> > Just as I thought, it's way faster than the original. I made a new
> > test, which is supposed to be more realistic than drawing 30000
> length
> > lines. It consists of splitting a 1000x1000 frame in 100 10x10
> squares
> > and in each square drawing 20 or so curves with random
> start/end/control
> > points in that square. In this case it's at least twice faster
> (~0.85
> > versus ~1.95 seconds).
> > 
> > Unfortunately I've had to do a lot of the same optimizations that I
> was
> > trying to remove (i.e. making everything a global variable). I
> tried
> > writing a higher level version of it, but it completely negated the
> > performance gains of the new algorithm. Anyway, it's still not as
> bad
> > as before because the algorithm is inherently clearer and we only
> iterate
> > through scan lines instead of iterating through strips and then
> scanlines
> > in that strip, and then edges, all in the same method.
> > 
> > It's also better organized, and logically separate parts of the code
> don't
> > really touch each other's variables much. And it's definitely
> better
> > commented. 
> > 
> > I also made some minor changes outside of Renderer that did not
> appear in
> > the first version of this patch last week:
> > 1. I fixed the issue Jim pointed out in Dasher, where if 
> > (origLen == leftInThisDashSegment) the dash index was not being
> incremented.
> > 2. In dasher, I no longer copy the dash array.
> > 3. I removed files PiscesMath.java and Transform4.java because they
> are no
> > longer needed.
> > 
> > Thanks,
> > Denis.
> > 
> > ----- "Jim Graham" <james.graham at oracle.com> wrote:
> > 
> >> Woohoo, Denis!  I look forward to seeing the new version!
> >>
> >> 		...jim
> >>
> >> On 7/28/2010 5:51 AM, Denis Lila wrote:
> >>> Hello Jim.
> >>>
> >>> This one performs almost identically to what is already there
> >>> in openjdk6 and 7, since it's exactly what I sent for review
> >>> last week, but with all the changes you suggested implemented.
> >>>
> >>> I would actually like to ask you to not look at either one of
> >>> them. First of all, there is an ArrayIndexOutOfBoundsException
> >>> possible in emitRow.
> >>> And secondly, even if there wasn't, last night I implemented
> >>> your algorithm from ShapeSpanIterator.c to iterate through
> >>> the edges. I have yet to debug it, but it makes everything much,
> >>> much simpler, and it should make it far faster, so we get the
> best
> >>> of both worlds.
> >>>
> >>> Thanks,
> >>> Denis.
> >>>
> >>> ----- "Jim Graham"<james.graham at oracle.com>  wrote:
> >>>
> >>>> Hi Denis,
> >>>>
> >>>> I'll try to get through both versions and see if I can find
> >> anything
> >>>> that was hurting performance with your EdgeLists.  I'm guessing
> >> that
> >>>> this version was created because of the performance issues you
> >> found
> >>>> with the EdgeList version?  Does this perform more closely to
> the
> >>>> existing code than the EdgeList version?
> >>>>
> >>>> 			...jim
> >>>>
> >>>> Denis Lila wrote:
> >>>>> Hello again.
> >>>>>
> >>>>> This attachmet is a "can of worms" implementation without all
> the
> >>>> fancy (and slow)
> >>>>> iteration. It also includes all of the other suggestions you
> sent
> >> in
> >>>> your first
> >>>>> review of Dasher and Renderer last week (most importantly, the
> >>>> firstOrientation
> >>>>> issue, horizontal lines filtering, and adding prefixes to
> >> variable
> >>>> names to make
> >>>>> it clear whether they refer to pixels, or subpixels).
> >>>>>
> >>>>> Regards,
> >>>>> Denis.
> >>>>>
> >>>>> ----- "Denis Lila"<dlila at redhat.com>  wrote:
> >>>>>
> >>>>>> Hello Jim.
> >>>>>>
> >>>>>> I implemented your "can of worms" idea. It works, and it got
> rid
> >>>> of
> >>>>>> the biasing.
> >>>>>> I wasn't able to send a webrev, but there are many changes and
> a
> >>>> side
> >>>>>> by side
> >>>>>> comparison would probably be useless, so I just attached the
> >> file.
> >>>> I
> >>>>>> hope this is
> >>>>>> ok.
> >>>>>>
> >>>>>> I also implemented a "better" iterating structure for the
> lines
> >>>> and
> >>>>>> the
> >>>>>> strips and crossings. I think it is better in every way,
> except
> >>>>>> performance.
> >>>>>> The new file is more than 200 lines smaller than the old one.
> >> The
> >>>> only
> >>>>>> members of Renderer are now the AA variables and the position
> >>>>>> variables
> >>>>>> (sx*, sy*, x*, y*).
> >>>>>> What I've done is I added an EdgeList class, which
> encapsulates
> >>>> all
> >>>>>> the edge
> >>>>>> related variables in the old Renderer. At first, I had an Edge
> >>>> class
> >>>>>> in addition
> >>>>>> to the EdgeList class, and while this was much nicer, it
> turned
> >> out
> >>>> to
> >>>>>> be too
> >>>>>> expensive (see last paragraph).
> >>>>>> I've also added a ScanLineIterator, so instead of
> _endRendering
> >>>>>> iterating
> >>>>>> through strips, and then calling renderStrip() which iterates
> >>>> through
> >>>>>> the
> >>>>>> scanlines in that strip, and then through the crossings in
> that
> >>>>>> scanline,
> >>>>>> what happens now is that _endRendering uses the
> >> Iterator<ScanLine>
> >>>> to
> >>>>>> iterate through each scanline, get get its crossings and
> iterate
> >>>>>> through them
> >>>>>> to accumulate the alpha. By the way, a ScanLine is a type
> >> defined
> >>>> by
> >>>>>> an
> >>>>>> interface which exports methods for getting the y coord of the
> >>>> line,
> >>>>>> the
> >>>>>> number of crossings in it, the ith crossing, and a method for
> >>>> sorting
> >>>>>> its crossings.
> >>>>>> The class that implements ScanLine is ScanLineIterator itself.
> I
> >>>> made
> >>>>>> a
> >>>>>> ScanLine class, but I was afraid performance would suffer
> >> because
> >>>> of
> >>>>>> all the
> >>>>>> object creations (this turned out not to be an issue, after I
> >>>> switched
> >>>>>> to the
> >>>>>> current way, and remeasured things). I did not switch back
> >> because
> >>>>>> this is
> >>>>>> only slightly worse.
> >>>>>>
> >>>>>> As for performance: I wrote a simple program that tries to
> draw
> >> a
> >>>>>> dashed path
> >>>>>> that consists of about 160 dashed lines of width 1 and length
> >>>> 30000,
> >>>>>> going
> >>>>>> from the centre of the frame to some point. On my machine,
> this
> >>>> takes
> >>>>>> about 4.9
> >>>>>> seconds in openjdk6, and 26 seconds using the attached file.
> >> Back
> >>>> when
> >>>>>> I was using
> >>>>>> the Edge class it took about 39 seconds. Everything without
> >> hundres
> >>>> of
> >>>>>> thousands of
> >>>>>> edges is not much slower
> >>>>>> I have not changed any of the algorithms. ScanLineIterator
> still
> >>>> goes
> >>>>>> through
> >>>>>> strips of the same size and computes crossings in every strip
> >>>> using
> >>>>>> the same
> >>>>>> method as before, so I don't know why it's so slow. It can't
> be
> >>>>>> because of anything
> >>>>>> happening in _endRendering, because there are only about 9000
> >>>>>> scanlines and for each
> >>>>>> of them I've just added a few calls to one line getters (which
> >> used
> >>>> to
> >>>>>> be direct
> >>>>>> accesses into arrays).
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Denis.
> >>>>>>
> >>>>>> ----- "Jim Graham"<james.graham at oracle.com>  wrote:
> >>>>>>
> >>>>>>> Denis Lila wrote:
> >>>>>>>> Hello Jim.
> >>>>>>>>
> >>>>>>>> Thank you very much for taking the time to read through
> this.
> >>>>>>>>
> >>>>>>>>> 169 - if origLen reaches the end of the dash exactly (the
> >> "=="
> >>>>>>> case)
> >>>>>>>>      You're right, I should. I can't just replace<= with ==
> >>>>>> though,
> >>>>>>>> because the results will be the same: in the equal case
> >> origLen
> >>>>>>> will
> >>>>>>>> become 0, and on the next iteration, the (origLen<
> >>>>>> dash[idx]-phase)
> >>>>>>> will
> >>>>>>>> be true, and we will do a goTo(x1,y1), which is what we just
> >> did
> >>>>>> in
> >>>>>>> the
> >>>>>>>> previous iteration (unless dash[idx] is 0, in which case the
> >>>>>> results
> >>>>>>>> will be even worse). The best solution to this is to just do
> a
> >>>>>>> nested
> >>>>>>>> check for the == case.
> >>>>>>> Ah, right - because there is no "break" when origLen becomes
> >>>> zero.
> >>>>>>> Sounds like you're on it.
> >>>>>>>
> >>>>>>>>> 171 - Aren't x0,y0 stored as subpix values?  You would then
> >> be
> >>>>>>> comparing
> >>>>>>>>> a subpix value to a non-subpix value.  Perhaps if the
> subpix
> >>>>>> calls
> >>>>>>> are
> >>>>>>>>> moved to the top of the function I think this should work
> OK?
> >>>>>>>>      That's true, they are. This is very puzzling. If a
> >>>> horizontal
> >>>>>>> line is
> >>>>>>>> added, when the crossings for it are being computed, dxBydy
> >>>> should
> >>>>>>> be NaN, and
> >>>>>>>> wouldn't an error be thrown when we try to cast to an int in
> >> the
> >>>>>>> call to addCrossing?
> >>>>>>>
> >>>>>>> I'm not sure - I didn't trace it through very far - I just
> >> noted
> >>>>>> that
> >>>>>>> the values were likely in different "resolutions".
> >>>>>>>
> >>>>>>>>> 194,197 - Shouldn't these be constants, or based on the
> >>>>>>> SUB_POS_XY?
> >>>>>>>>      I suppose I should make a biasing constant. I don't
> think
> >>>> they
> >>>>>>> should be based
> >>>>>>>> on SUB_POS_XY though, because the biasing is done to
> subpixel
> >>>>>>> coordinates so
> >>>>>>>> there is no danger that if our supersampling is fine enough
> >> the
> >>>>>>> biasing will
> >>>>>>>> make the coordinates jump over some scan line.
> >>>>>>> I'm guessing you punted on my "can of worms" suggestion then.
> >>>> ;-)
> >>>>>>>>> 216 - if you save the sx0,sy0 before subpix'ing them then
> you
> >>>>>> don't
> >>>>>>> have
> >>>>>>>>> to "unsubpix" them here.  (Though you still need the subpix
> >> sy0
> >>>>>> for
> >>>>>>> line
> >>>>>>>>> 209 - or you could just call subpix on it for that line and
> >> at
> >>>>>>> least
> >>>>>>>>> you'd save 1 call to sub/unsub).
> >>>>>>>>      Ok. I'll just save subpixed and unsubpixed versions of
> >> sx0,
> >>>>>> sy0.
> >>>>>>> That should
> >>>>>>>> eliminate all sx0,sy0 related calls to tosubpix and topix
> >> except
> >>>>>> in
> >>>>>>> moveTo.
> >>>>>>>
> >>>>>>> and lineTo.  You may only need 3 of those values, though, if
> I
> >>>>>>> remember
> >>>>>>> my code reading well enough.
> >>>>>>>
> >>>>>>>>> 256,264 - casting to int is problematic.  It truncates
> >> towards
> >>>> 0
> >>>>>>> which
> >>>>>>>>> means negatives are ceil'd and positives are floor'd.  It
> >> would
> >>>>>> be
> >>>>>>> best
> >>>>>>>>> to use floor here instead.  On the other hand, if negative
> >>>>>> numbers
> >>>>>>> are
> >>>>>>>>> always "off the left side of the drawable" then this is
> moot.
> >>>>>>>>      That's why I left it at int casting. Do you still think
> I
> >>>>>> should
> >>>>>>> change it
> >>>>>>>> to floor?
> >>>>>>> If you mean floor, I think it best to use floor.  Unless you
> >> can
> >>>>>> prove
> >>>>>>> that negatives aren't really an issue and that the strange
> >>>>>> truncation
> >>>>>>> on
> >>>>>>> them won't be numerically a problem - but I don't think it is
> >>>> worth
> >>>>>> it
> >>>>>>> for this code.
> >>>>>>>
> >>>>>>>> Speaking of which, is there a good way to edit and build
> >> openJDK
> >>>>>>> from eclipse?
> >>>>>>>> Then this sort of embarrassing error could be avoided
> >> (including
> >>>>>> the
> >>>>>>> printStats() call).
> >>>>>>>
> >>>>>>> I don't use Eclipse, sorry.  :-(
> >>>>>>>
> >>>>>>>>      As for Arrays.newSize()... I can't find it here:
> >>>>>>>>
> >>
> http://download.oracle.com/docs/cd/E17409_01/javase/6/docs/api/java/util/Arrays.html
> >>>>>>>> Is this a new function added in 7?
> >>>>>>> Sorry, make that Arrays.copyOf(..., newSize).  I tried to
> type
> >>>> the
> >>>>>>> name
> >>>>>>> from memory and got it wrong.
> >>>>>>>
> >>>>>>>>> 721 - Arrays.sort()
> >>>>>>>>      I thought about using this, but I did some
> measurements,
> >> and
> >>>>>> it
> >>>>>>> turns out that
> >>>>>>>> Arrays.sort() is a bit slower if the portion of the array
> >> being
> >>>>>>> sorted has fewer
> >>>>>>>> than about 70 elements.
> >>>>>>> I wonder what the typical number of elements is.  Is this
> >> sorting
> >>>>>>> crossings per line?  Then simple primitives like circles
> should
> >>>> only
> >>>>>>> have 2 per line, right?  Is it worth testing for really small
> >>>>>> numbers
> >>>>>>> of
> >>>>>>> elements (much lower than 70) and doing a manual sort?  Or am
> I
> >>>>>>> misunderstanding what is being sorted there?
> >>>>>>>
> >>>>>>>>> How comfortable do you feel with that conversion?
> >>>>>>>> I'll try to do it and include it in a new patch along with,
> >>>>>>> hopefully, a better way
> >>>>>>>> to iterate through strips, and especially crossings. Right
> now
> >>>> all
> >>>>>>> the iteration
> >>>>>>>> state is saved in global variables. This is... not good. I
> >> spent
> >>>>>> far
> >>>>>>> too much time last
> >>>>>>>> week on bugs caused by this sort of thing. Ideally, any
> >> members
> >>>>>> that
> >>>>>>> can't be put at
> >>>>>>>> the top of a class (like our edge and crossing data) should
> be
> >>>> put
> >>>>>>> in classes of their own.
> >>>>>>>
> >>>>>>> That sounds good, but also consider doing it in separate
> stages
> >>>> to
> >>>>>>> reduce churn in the code reviewing (and you then have revs to
> >> go
> >>>>>> back
> >>>>>>> and test which stage caused a probem if we find a bug later):
> >>>>>>>
> >>>>>>> - first get all on floats
> >>>>>>> - then change strip management
> >>>>>>> - then change to open coordinate intervals
> >>>>>>> - (or vice versa)
> >>>>>>>
> >>>>>>>> Do you have any ideas about how to iterate through edges in
> a
> >>>>>> strip
> >>>>>>> without going through
> >>>>>>>> every edge? I was thinking of maybe using some sort of tree
> to
> >>>>>> split
> >>>>>>> the drawing surface,
> >>>>>>>> but I haven't given it much thought.
> >>>>>>> If you look for something like the native code for
> >>>>>>> sun/java2d/pipe/ShapeSpanIterator.c you will see the way I
> >>>> typically
> >>>>>>> like to do edge setup and enumeration.
> >>>>>>>
> >>>>>>> That code uses the "half open interval" approach for both X
> and
> >> Y
> >>>>>>> intervals.
> >>>>>>>
> >>>>>>> I then sort the edge list by "leading Y" and then move
> through
> >>>> the
> >>>>>>> edges
> >>>>>>> using the following manner (kind of like an inch worm eating
> >> the
> >>>>>>> edges,
> >>>>>>> maintaining a pointer to the beginning and end of an "active
> >>>> list"
> >>>>>> of
> >>>>>>> edges that are in play at any given time by virtue of having
> >> their
> >>>> Y
> >>>>>>> range intersect the current sampling Y).  Note that I use an
> >>>> array
> >>>>>> of
> >>>>>>> edges and then a parallel array of pointers into the edges so
> >> that
> >>>> I
> >>>>>>> can
> >>>>>>> sort just the pointers and avoid moving tons of edge data
> >> around.
> >>>>>>> Also,
> >>>>>>> later when I eliminate edges from the active list I just move
> >>>> their
> >>>>>>> pointers into and out of view rather than having to copy the
> >> edge
> >>>>>>> data.
> >>>>>>>    It is harder to do an array of pointers in Java, though -
> >>>> perhaps
> >>>>>> an
> >>>>>>> array of indices?  Here is some basic pseudocode:
> >>>>>>>
> >>>>>>> start with lo and hi pointing at index 0 in edge list.
> >>>>>>> until edge list is exhausted {
> >>>>>>>       process edges between lo and hi (empty on first pass)
> >>>>>>>       scan from hi to lo and toss any edges that are
> exhausted
> >>>>>>>           (compacting remaining pointers towards hi)
> >>>>>>>       keep incrementing hi to accept any new edges coming
> into
> >>>> play
> >>>>>>>       process edges between lo and hi to increment to the
> next
> >> Y
> >>>>>>>           (note that new edges from previous step may not
> >>>>>>>            need any processing in this stage if their
> starting
> >>>>>>>            Y equals the next Y to be sampled)
> >>>>>>> }
> >>>>>>>
> >>>>>>> Gradually lo and hi make their way through the list.  The
> edges
> >>>>>> above
> >>>>>>> hi
> >>>>>>> are always sorted in order of when they may come into play as
> >> we
> >>>>>> move
> >>>>>>> downward in Y.  The edges between lo and hi are also usually
> >> kept
> >>>>>>> sorted
> >>>>>>> by their current X so that the stage that processes them into
> >>>> spans
> >>>>>>> can
> >>>>>>> just run through them.  The edges below lo are usually random
> >>>>>> garbage
> >>>>>>> because no care was taken during the "pruning" step to worry
> >>>> about
> >>>>>>> what
> >>>>>>> happens to the pointer table down there as lo is incremented
> >> (the
> >>>>>>> algorithm only ever looks up the array of pointers).
> >>>>>>>
> >>>>>>> I hope that helps...
> >>>>>>>
> >>>>>>> 			...jim