[OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

Tue Jul 27 23:20:13 UTC 2010

Hi Denis,

I'll try to get through both versions and see if I can find anything 
that was hurting performance with your EdgeLists.  I'm guessing that 
this version was created because of the performance issues you found 
with the EdgeList version?  Does this perform more closely to the 
existing code than the EdgeList version?

			...jim

Denis Lila wrote:
> Hello again.
> 
> This attachmet is a "can of worms" implementation without all the fancy (and slow)
> iteration. It also includes all of the other suggestions you sent in your first
> review of Dasher and Renderer last week (most importantly, the firstOrientation
> issue, horizontal lines filtering, and adding prefixes to variable names to make
> it clear whether they refer to pixels, or subpixels).
> 
> Regards,
> Denis.
> 
> ----- "Denis Lila" <dlila at redhat.com> wrote:
> 
>> Hello Jim.
>>
>> I implemented your "can of worms" idea. It works, and it got rid of
>> the biasing.
>> I wasn't able to send a webrev, but there are many changes and a side
>> by side
>> comparison would probably be useless, so I just attached the file. I
>> hope this is
>> ok.
>>
>> I also implemented a "better" iterating structure for the lines and
>> the
>> strips and crossings. I think it is better in every way, except
>> performance.
>> The new file is more than 200 lines smaller than the old one. The only
>> members of Renderer are now the AA variables and the position
>> variables
>> (sx*, sy*, x*, y*).
>> What I've done is I added an EdgeList class, which encapsulates all
>> the edge
>> related variables in the old Renderer. At first, I had an Edge class
>> in addition
>> to the EdgeList class, and while this was much nicer, it turned out to
>> be too
>> expensive (see last paragraph).
>> I've also added a ScanLineIterator, so instead of _endRendering
>> iterating
>> through strips, and then calling renderStrip() which iterates through
>> the
>> scanlines in that strip, and then through the crossings in that
>> scanline,
>> what happens now is that _endRendering uses the Iterator<ScanLine> to
>> iterate through each scanline, get get its crossings and iterate
>> through them
>> to accumulate the alpha. By the way, a ScanLine is a type defined by
>> an
>> interface which exports methods for getting the y coord of the line,
>> the
>> number of crossings in it, the ith crossing, and a method for sorting
>> its crossings.
>> The class that implements ScanLine is ScanLineIterator itself. I made
>> a
>> ScanLine class, but I was afraid performance would suffer because of
>> all the
>> object creations (this turned out not to be an issue, after I switched
>> to the
>> current way, and remeasured things). I did not switch back because
>> this is
>> only slightly worse.
>>
>> As for performance: I wrote a simple program that tries to draw a
>> dashed path
>> that consists of about 160 dashed lines of width 1 and length 30000,
>> going
>> from the centre of the frame to some point. On my machine, this takes
>> about 4.9
>> seconds in openjdk6, and 26 seconds using the attached file. Back when
>> I was using
>> the Edge class it took about 39 seconds. Everything without hundres of
>> thousands of
>> edges is not much slower
>> I have not changed any of the algorithms. ScanLineIterator still goes
>> through
>> strips of the same size and computes crossings in every strip using
>> the same
>> method as before, so I don't know why it's so slow. It can't be
>> because of anything
>> happening in _endRendering, because there are only about 9000
>> scanlines and for each
>> of them I've just added a few calls to one line getters (which used to
>> be direct
>> accesses into arrays).
>>
>> Thanks,
>> Denis.
>>
>> ----- "Jim Graham" <james.graham at oracle.com> wrote:
>>
>>> Denis Lila wrote:
>>>> Hello Jim.
>>>>
>>>> Thank you very much for taking the time to read through this.
>>>>
>>>>> 169 - if origLen reaches the end of the dash exactly (the "=="
>>> case)
>>>>     You're right, I should. I can't just replace <= with ==
>> though,
>>>> because the results will be the same: in the equal case origLen
>>> will
>>>> become 0, and on the next iteration, the (origLen <
>> dash[idx]-phase)
>>> will
>>>> be true, and we will do a goTo(x1,y1), which is what we just did
>> in
>>> the
>>>> previous iteration (unless dash[idx] is 0, in which case the
>> results
>>>> will be even worse). The best solution to this is to just do a
>>> nested
>>>> check for the == case.
>>> Ah, right - because there is no "break" when origLen becomes zero.
>>> Sounds like you're on it.
>>>
>>>>> 171 - Aren't x0,y0 stored as subpix values?  You would then be
>>> comparing
>>>>> a subpix value to a non-subpix value.  Perhaps if the subpix
>> calls
>>> are
>>>>> moved to the top of the function I think this should work OK?
>>>>     That's true, they are. This is very puzzling. If a horizontal
>>> line is
>>>> added, when the crossings for it are being computed, dxBydy should
>>> be NaN, and
>>>> wouldn't an error be thrown when we try to cast to an int in the
>>> call to addCrossing?
>>>
>>> I'm not sure - I didn't trace it through very far - I just noted
>> that
>>> the values were likely in different "resolutions".
>>>
>>>>> 194,197 - Shouldn't these be constants, or based on the
>>> SUB_POS_XY?
>>>>     I suppose I should make a biasing constant. I don't think they
>>> should be based
>>>> on SUB_POS_XY though, because the biasing is done to subpixel
>>> coordinates so
>>>> there is no danger that if our supersampling is fine enough the
>>> biasing will
>>>> make the coordinates jump over some scan line.
>>> I'm guessing you punted on my "can of worms" suggestion then.  ;-)
>>>
>>>>> 216 - if you save the sx0,sy0 before subpix'ing them then you
>> don't
>>> have
>>>>> to "unsubpix" them here.  (Though you still need the subpix sy0
>> for
>>> line
>>>>> 209 - or you could just call subpix on it for that line and at
>>> least
>>>>> you'd save 1 call to sub/unsub).
>>>>     Ok. I'll just save subpixed and unsubpixed versions of sx0,
>> sy0.
>>> That should
>>>> eliminate all sx0,sy0 related calls to tosubpix and topix except
>> in
>>> moveTo.
>>>
>>> and lineTo.  You may only need 3 of those values, though, if I
>>> remember
>>> my code reading well enough.
>>>
>>>>> 256,264 - casting to int is problematic.  It truncates towards 0
>>> which
>>>>> means negatives are ceil'd and positives are floor'd.  It would
>> be
>>> best
>>>>> to use floor here instead.  On the other hand, if negative
>> numbers
>>> are
>>>>> always "off the left side of the drawable" then this is moot.
>>>>     That's why I left it at int casting. Do you still think I
>> should
>>> change it
>>>> to floor?
>>> If you mean floor, I think it best to use floor.  Unless you can
>> prove
>>> that negatives aren't really an issue and that the strange
>> truncation
>>> on
>>> them won't be numerically a problem - but I don't think it is worth
>> it
>>> for this code.
>>>
>>>> Speaking of which, is there a good way to edit and build openJDK
>>> from eclipse?
>>>> Then this sort of embarrassing error could be avoided (including
>> the
>>> printStats() call).
>>>
>>> I don't use Eclipse, sorry.  :-(
>>>
>>>>     As for Arrays.newSize()... I can't find it here:
>>>>
>> http://download.oracle.com/docs/cd/E17409_01/javase/6/docs/api/java/util/Arrays.html
>>>> Is this a new function added in 7?
>>> Sorry, make that Arrays.copyOf(..., newSize).  I tried to type the
>>> name
>>> from memory and got it wrong.
>>>
>>>>> 721 - Arrays.sort()
>>>>     I thought about using this, but I did some measurements, and
>> it
>>> turns out that
>>>> Arrays.sort() is a bit slower if the portion of the array being
>>> sorted has fewer
>>>> than about 70 elements.
>>> I wonder what the typical number of elements is.  Is this sorting
>>> crossings per line?  Then simple primitives like circles should only
>>> have 2 per line, right?  Is it worth testing for really small
>> numbers
>>> of
>>> elements (much lower than 70) and doing a manual sort?  Or am I
>>> misunderstanding what is being sorted there?
>>>
>>>>> How comfortable do you feel with that conversion?
>>>> I'll try to do it and include it in a new patch along with,
>>> hopefully, a better way
>>>> to iterate through strips, and especially crossings. Right now all
>>> the iteration
>>>> state is saved in global variables. This is... not good. I spent
>> far
>>> too much time last
>>>> week on bugs caused by this sort of thing. Ideally, any members
>> that
>>> can't be put at
>>>> the top of a class (like our edge and crossing data) should be put
>>> in classes of their own.
>>>
>>> That sounds good, but also consider doing it in separate stages to
>>> reduce churn in the code reviewing (and you then have revs to go
>> back
>>> and test which stage caused a probem if we find a bug later):
>>>
>>> - first get all on floats
>>> - then change strip management
>>> - then change to open coordinate intervals
>>> - (or vice versa)
>>>
>>>> Do you have any ideas about how to iterate through edges in a
>> strip
>>> without going through
>>>> every edge? I was thinking of maybe using some sort of tree to
>> split
>>> the drawing surface,
>>>> but I haven't given it much thought.
>>> If you look for something like the native code for
>>> sun/java2d/pipe/ShapeSpanIterator.c you will see the way I typically
>>> like to do edge setup and enumeration.
>>>
>>> That code uses the "half open interval" approach for both X and Y
>>> intervals.
>>>
>>> I then sort the edge list by "leading Y" and then move through the
>>> edges
>>> using the following manner (kind of like an inch worm eating the
>>> edges,
>>> maintaining a pointer to the beginning and end of an "active list"
>> of
>>> edges that are in play at any given time by virtue of having their Y
>>> range intersect the current sampling Y).  Note that I use an array
>> of
>>> edges and then a parallel array of pointers into the edges so that I
>>> can
>>> sort just the pointers and avoid moving tons of edge data around.
>>> Also,
>>> later when I eliminate edges from the active list I just move their
>>> pointers into and out of view rather than having to copy the edge
>>> data.
>>>   It is harder to do an array of pointers in Java, though - perhaps
>> an
>>> array of indices?  Here is some basic pseudocode:
>>>
>>> start with lo and hi pointing at index 0 in edge list.
>>> until edge list is exhausted {
>>>      process edges between lo and hi (empty on first pass)
>>>      scan from hi to lo and toss any edges that are exhausted
>>>          (compacting remaining pointers towards hi)
>>>      keep incrementing hi to accept any new edges coming into play
>>>      process edges between lo and hi to increment to the next Y
>>>          (note that new edges from previous step may not
>>>           need any processing in this stage if their starting
>>>           Y equals the next Y to be sampled)
>>> }
>>>
>>> Gradually lo and hi make their way through the list.  The edges
>> above
>>> hi
>>> are always sorted in order of when they may come into play as we
>> move
>>> downward in Y.  The edges between lo and hi are also usually kept
>>> sorted
>>> by their current X so that the stage that processes them into spans
>>> can
>>> just run through them.  The edges below lo are usually random
>> garbage
>>> because no care was taken during the "pruning" step to worry about
>>> what
>>> happens to the pointer table down there as lo is incremented (the
>>> algorithm only ever looks up the array of pointers).
>>>
>>> I hope that helps...
>>>
>>> 			...jim