[OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

Denis Lila dlila at redhat.com
Wed Jul 28 23:50:01 UTC 2010


Hello.

And, here it is: http://icedtea.classpath.org/~dlila/webrevs/fpBetterAA/webrev/

Just as I thought, it's way faster than the original. I made a new
test, which is supposed to be more realistic than drawing 30000 length
lines. It consists of splitting a 1000x1000 frame in 100 10x10 squares
and in each square drawing 20 or so curves with random start/end/control
points in that square. In this case it's at least twice faster (~0.85
versus ~1.95 seconds).

Unfortunately I've had to do a lot of the same optimizations that I was
trying to remove (i.e. making everything a global variable). I tried
writing a higher level version of it, but it completely negated the
performance gains of the new algorithm. Anyway, it's still not as bad
as before because the algorithm is inherently clearer and we only iterate
through scan lines instead of iterating through strips and then scanlines
in that strip, and then edges, all in the same method.

It's also better organized, and logically separate parts of the code don't
really touch each other's variables much. And it's definitely better
commented. 

I also made some minor changes outside of Renderer that did not appear in
the first version of this patch last week:
1. I fixed the issue Jim pointed out in Dasher, where if 
(origLen == leftInThisDashSegment) the dash index was not being incremented.
2. In dasher, I no longer copy the dash array.
3. I removed files PiscesMath.java and Transform4.java because they are no
longer needed.

Thanks,
Denis.

----- "Jim Graham" <james.graham at oracle.com> wrote:

> Woohoo, Denis!  I look forward to seeing the new version!
> 
> 		...jim
> 
> On 7/28/2010 5:51 AM, Denis Lila wrote:
> > Hello Jim.
> >
> > This one performs almost identically to what is already there
> > in openjdk6 and 7, since it's exactly what I sent for review
> > last week, but with all the changes you suggested implemented.
> >
> > I would actually like to ask you to not look at either one of
> > them. First of all, there is an ArrayIndexOutOfBoundsException
> > possible in emitRow.
> > And secondly, even if there wasn't, last night I implemented
> > your algorithm from ShapeSpanIterator.c to iterate through
> > the edges. I have yet to debug it, but it makes everything much,
> > much simpler, and it should make it far faster, so we get the best
> > of both worlds.
> >
> > Thanks,
> > Denis.
> >
> > ----- "Jim Graham"<james.graham at oracle.com>  wrote:
> >
> >> Hi Denis,
> >>
> >> I'll try to get through both versions and see if I can find
> anything
> >> that was hurting performance with your EdgeLists.  I'm guessing
> that
> >> this version was created because of the performance issues you
> found
> >> with the EdgeList version?  Does this perform more closely to the
> >> existing code than the EdgeList version?
> >>
> >> 			...jim
> >>
> >> Denis Lila wrote:
> >>> Hello again.
> >>>
> >>> This attachmet is a "can of worms" implementation without all the
> >> fancy (and slow)
> >>> iteration. It also includes all of the other suggestions you sent
> in
> >> your first
> >>> review of Dasher and Renderer last week (most importantly, the
> >> firstOrientation
> >>> issue, horizontal lines filtering, and adding prefixes to
> variable
> >> names to make
> >>> it clear whether they refer to pixels, or subpixels).
> >>>
> >>> Regards,
> >>> Denis.
> >>>
> >>> ----- "Denis Lila"<dlila at redhat.com>  wrote:
> >>>
> >>>> Hello Jim.
> >>>>
> >>>> I implemented your "can of worms" idea. It works, and it got rid
> >> of
> >>>> the biasing.
> >>>> I wasn't able to send a webrev, but there are many changes and a
> >> side
> >>>> by side
> >>>> comparison would probably be useless, so I just attached the
> file.
> >> I
> >>>> hope this is
> >>>> ok.
> >>>>
> >>>> I also implemented a "better" iterating structure for the lines
> >> and
> >>>> the
> >>>> strips and crossings. I think it is better in every way, except
> >>>> performance.
> >>>> The new file is more than 200 lines smaller than the old one.
> The
> >> only
> >>>> members of Renderer are now the AA variables and the position
> >>>> variables
> >>>> (sx*, sy*, x*, y*).
> >>>> What I've done is I added an EdgeList class, which encapsulates
> >> all
> >>>> the edge
> >>>> related variables in the old Renderer. At first, I had an Edge
> >> class
> >>>> in addition
> >>>> to the EdgeList class, and while this was much nicer, it turned
> out
> >> to
> >>>> be too
> >>>> expensive (see last paragraph).
> >>>> I've also added a ScanLineIterator, so instead of _endRendering
> >>>> iterating
> >>>> through strips, and then calling renderStrip() which iterates
> >> through
> >>>> the
> >>>> scanlines in that strip, and then through the crossings in that
> >>>> scanline,
> >>>> what happens now is that _endRendering uses the
> Iterator<ScanLine>
> >> to
> >>>> iterate through each scanline, get get its crossings and iterate
> >>>> through them
> >>>> to accumulate the alpha. By the way, a ScanLine is a type
> defined
> >> by
> >>>> an
> >>>> interface which exports methods for getting the y coord of the
> >> line,
> >>>> the
> >>>> number of crossings in it, the ith crossing, and a method for
> >> sorting
> >>>> its crossings.
> >>>> The class that implements ScanLine is ScanLineIterator itself. I
> >> made
> >>>> a
> >>>> ScanLine class, but I was afraid performance would suffer
> because
> >> of
> >>>> all the
> >>>> object creations (this turned out not to be an issue, after I
> >> switched
> >>>> to the
> >>>> current way, and remeasured things). I did not switch back
> because
> >>>> this is
> >>>> only slightly worse.
> >>>>
> >>>> As for performance: I wrote a simple program that tries to draw
> a
> >>>> dashed path
> >>>> that consists of about 160 dashed lines of width 1 and length
> >> 30000,
> >>>> going
> >>>> from the centre of the frame to some point. On my machine, this
> >> takes
> >>>> about 4.9
> >>>> seconds in openjdk6, and 26 seconds using the attached file.
> Back
> >> when
> >>>> I was using
> >>>> the Edge class it took about 39 seconds. Everything without
> hundres
> >> of
> >>>> thousands of
> >>>> edges is not much slower
> >>>> I have not changed any of the algorithms. ScanLineIterator still
> >> goes
> >>>> through
> >>>> strips of the same size and computes crossings in every strip
> >> using
> >>>> the same
> >>>> method as before, so I don't know why it's so slow. It can't be
> >>>> because of anything
> >>>> happening in _endRendering, because there are only about 9000
> >>>> scanlines and for each
> >>>> of them I've just added a few calls to one line getters (which
> used
> >> to
> >>>> be direct
> >>>> accesses into arrays).
> >>>>
> >>>> Thanks,
> >>>> Denis.
> >>>>
> >>>> ----- "Jim Graham"<james.graham at oracle.com>  wrote:
> >>>>
> >>>>> Denis Lila wrote:
> >>>>>> Hello Jim.
> >>>>>>
> >>>>>> Thank you very much for taking the time to read through this.
> >>>>>>
> >>>>>>> 169 - if origLen reaches the end of the dash exactly (the
> "=="
> >>>>> case)
> >>>>>>      You're right, I should. I can't just replace<= with ==
> >>>> though,
> >>>>>> because the results will be the same: in the equal case
> origLen
> >>>>> will
> >>>>>> become 0, and on the next iteration, the (origLen<
> >>>> dash[idx]-phase)
> >>>>> will
> >>>>>> be true, and we will do a goTo(x1,y1), which is what we just
> did
> >>>> in
> >>>>> the
> >>>>>> previous iteration (unless dash[idx] is 0, in which case the
> >>>> results
> >>>>>> will be even worse). The best solution to this is to just do a
> >>>>> nested
> >>>>>> check for the == case.
> >>>>> Ah, right - because there is no "break" when origLen becomes
> >> zero.
> >>>>> Sounds like you're on it.
> >>>>>
> >>>>>>> 171 - Aren't x0,y0 stored as subpix values?  You would then
> be
> >>>>> comparing
> >>>>>>> a subpix value to a non-subpix value.  Perhaps if the subpix
> >>>> calls
> >>>>> are
> >>>>>>> moved to the top of the function I think this should work OK?
> >>>>>>      That's true, they are. This is very puzzling. If a
> >> horizontal
> >>>>> line is
> >>>>>> added, when the crossings for it are being computed, dxBydy
> >> should
> >>>>> be NaN, and
> >>>>>> wouldn't an error be thrown when we try to cast to an int in
> the
> >>>>> call to addCrossing?
> >>>>>
> >>>>> I'm not sure - I didn't trace it through very far - I just
> noted
> >>>> that
> >>>>> the values were likely in different "resolutions".
> >>>>>
> >>>>>>> 194,197 - Shouldn't these be constants, or based on the
> >>>>> SUB_POS_XY?
> >>>>>>      I suppose I should make a biasing constant. I don't think
> >> they
> >>>>> should be based
> >>>>>> on SUB_POS_XY though, because the biasing is done to subpixel
> >>>>> coordinates so
> >>>>>> there is no danger that if our supersampling is fine enough
> the
> >>>>> biasing will
> >>>>>> make the coordinates jump over some scan line.
> >>>>> I'm guessing you punted on my "can of worms" suggestion then.
> >> ;-)
> >>>>>
> >>>>>>> 216 - if you save the sx0,sy0 before subpix'ing them then you
> >>>> don't
> >>>>> have
> >>>>>>> to "unsubpix" them here.  (Though you still need the subpix
> sy0
> >>>> for
> >>>>> line
> >>>>>>> 209 - or you could just call subpix on it for that line and
> at
> >>>>> least
> >>>>>>> you'd save 1 call to sub/unsub).
> >>>>>>      Ok. I'll just save subpixed and unsubpixed versions of
> sx0,
> >>>> sy0.
> >>>>> That should
> >>>>>> eliminate all sx0,sy0 related calls to tosubpix and topix
> except
> >>>> in
> >>>>> moveTo.
> >>>>>
> >>>>> and lineTo.  You may only need 3 of those values, though, if I
> >>>>> remember
> >>>>> my code reading well enough.
> >>>>>
> >>>>>>> 256,264 - casting to int is problematic.  It truncates
> towards
> >> 0
> >>>>> which
> >>>>>>> means negatives are ceil'd and positives are floor'd.  It
> would
> >>>> be
> >>>>> best
> >>>>>>> to use floor here instead.  On the other hand, if negative
> >>>> numbers
> >>>>> are
> >>>>>>> always "off the left side of the drawable" then this is moot.
> >>>>>>      That's why I left it at int casting. Do you still think I
> >>>> should
> >>>>> change it
> >>>>>> to floor?
> >>>>> If you mean floor, I think it best to use floor.  Unless you
> can
> >>>> prove
> >>>>> that negatives aren't really an issue and that the strange
> >>>> truncation
> >>>>> on
> >>>>> them won't be numerically a problem - but I don't think it is
> >> worth
> >>>> it
> >>>>> for this code.
> >>>>>
> >>>>>> Speaking of which, is there a good way to edit and build
> openJDK
> >>>>> from eclipse?
> >>>>>> Then this sort of embarrassing error could be avoided
> (including
> >>>> the
> >>>>> printStats() call).
> >>>>>
> >>>>> I don't use Eclipse, sorry.  :-(
> >>>>>
> >>>>>>      As for Arrays.newSize()... I can't find it here:
> >>>>>>
> >>>>
> >>
> http://download.oracle.com/docs/cd/E17409_01/javase/6/docs/api/java/util/Arrays.html
> >>>>>> Is this a new function added in 7?
> >>>>> Sorry, make that Arrays.copyOf(..., newSize).  I tried to type
> >> the
> >>>>> name
> >>>>> from memory and got it wrong.
> >>>>>
> >>>>>>> 721 - Arrays.sort()
> >>>>>>      I thought about using this, but I did some measurements,
> and
> >>>> it
> >>>>> turns out that
> >>>>>> Arrays.sort() is a bit slower if the portion of the array
> being
> >>>>> sorted has fewer
> >>>>>> than about 70 elements.
> >>>>> I wonder what the typical number of elements is.  Is this
> sorting
> >>>>> crossings per line?  Then simple primitives like circles should
> >> only
> >>>>> have 2 per line, right?  Is it worth testing for really small
> >>>> numbers
> >>>>> of
> >>>>> elements (much lower than 70) and doing a manual sort?  Or am I
> >>>>> misunderstanding what is being sorted there?
> >>>>>
> >>>>>>> How comfortable do you feel with that conversion?
> >>>>>> I'll try to do it and include it in a new patch along with,
> >>>>> hopefully, a better way
> >>>>>> to iterate through strips, and especially crossings. Right now
> >> all
> >>>>> the iteration
> >>>>>> state is saved in global variables. This is... not good. I
> spent
> >>>> far
> >>>>> too much time last
> >>>>>> week on bugs caused by this sort of thing. Ideally, any
> members
> >>>> that
> >>>>> can't be put at
> >>>>>> the top of a class (like our edge and crossing data) should be
> >> put
> >>>>> in classes of their own.
> >>>>>
> >>>>> That sounds good, but also consider doing it in separate stages
> >> to
> >>>>> reduce churn in the code reviewing (and you then have revs to
> go
> >>>> back
> >>>>> and test which stage caused a probem if we find a bug later):
> >>>>>
> >>>>> - first get all on floats
> >>>>> - then change strip management
> >>>>> - then change to open coordinate intervals
> >>>>> - (or vice versa)
> >>>>>
> >>>>>> Do you have any ideas about how to iterate through edges in a
> >>>> strip
> >>>>> without going through
> >>>>>> every edge? I was thinking of maybe using some sort of tree to
> >>>> split
> >>>>> the drawing surface,
> >>>>>> but I haven't given it much thought.
> >>>>> If you look for something like the native code for
> >>>>> sun/java2d/pipe/ShapeSpanIterator.c you will see the way I
> >> typically
> >>>>> like to do edge setup and enumeration.
> >>>>>
> >>>>> That code uses the "half open interval" approach for both X and
> Y
> >>>>> intervals.
> >>>>>
> >>>>> I then sort the edge list by "leading Y" and then move through
> >> the
> >>>>> edges
> >>>>> using the following manner (kind of like an inch worm eating
> the
> >>>>> edges,
> >>>>> maintaining a pointer to the beginning and end of an "active
> >> list"
> >>>> of
> >>>>> edges that are in play at any given time by virtue of having
> their
> >> Y
> >>>>> range intersect the current sampling Y).  Note that I use an
> >> array
> >>>> of
> >>>>> edges and then a parallel array of pointers into the edges so
> that
> >> I
> >>>>> can
> >>>>> sort just the pointers and avoid moving tons of edge data
> around.
> >>>>> Also,
> >>>>> later when I eliminate edges from the active list I just move
> >> their
> >>>>> pointers into and out of view rather than having to copy the
> edge
> >>>>> data.
> >>>>>    It is harder to do an array of pointers in Java, though -
> >> perhaps
> >>>> an
> >>>>> array of indices?  Here is some basic pseudocode:
> >>>>>
> >>>>> start with lo and hi pointing at index 0 in edge list.
> >>>>> until edge list is exhausted {
> >>>>>       process edges between lo and hi (empty on first pass)
> >>>>>       scan from hi to lo and toss any edges that are exhausted
> >>>>>           (compacting remaining pointers towards hi)
> >>>>>       keep incrementing hi to accept any new edges coming into
> >> play
> >>>>>       process edges between lo and hi to increment to the next
> Y
> >>>>>           (note that new edges from previous step may not
> >>>>>            need any processing in this stage if their starting
> >>>>>            Y equals the next Y to be sampled)
> >>>>> }
> >>>>>
> >>>>> Gradually lo and hi make their way through the list.  The edges
> >>>> above
> >>>>> hi
> >>>>> are always sorted in order of when they may come into play as
> we
> >>>> move
> >>>>> downward in Y.  The edges between lo and hi are also usually
> kept
> >>>>> sorted
> >>>>> by their current X so that the stage that processes them into
> >> spans
> >>>>> can
> >>>>> just run through them.  The edges below lo are usually random
> >>>> garbage
> >>>>> because no care was taken during the "pruning" step to worry
> >> about
> >>>>> what
> >>>>> happens to the pointer table down there as lo is incremented
> (the
> >>>>> algorithm only ever looks up the array of pointers).
> >>>>>
> >>>>> I hope that helps...
> >>>>>
> >>>>> 			...jim



More information about the 2d-dev mailing list