[OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

Wed Jul 28 21:19:34 UTC 2010

Woohoo, Denis!  I look forward to seeing the new version!

		...jim

On 7/28/2010 5:51 AM, Denis Lila wrote:
> Hello Jim.
>
> This one performs almost identically to what is already there
> in openjdk6 and 7, since it's exactly what I sent for review
> last week, but with all the changes you suggested implemented.
>
> I would actually like to ask you to not look at either one of
> them. First of all, there is an ArrayIndexOutOfBoundsException
> possible in emitRow.
> And secondly, even if there wasn't, last night I implemented
> your algorithm from ShapeSpanIterator.c to iterate through
> the edges. I have yet to debug it, but it makes everything much,
> much simpler, and it should make it far faster, so we get the best
> of both worlds.
>
> Thanks,
> Denis.
>
> ----- "Jim Graham"<james.graham at oracle.com>  wrote:
>
>> Hi Denis,
>>
>> I'll try to get through both versions and see if I can find anything
>> that was hurting performance with your EdgeLists.  I'm guessing that
>> this version was created because of the performance issues you found
>> with the EdgeList version?  Does this perform more closely to the
>> existing code than the EdgeList version?
>>
>> 			...jim
>>
>> Denis Lila wrote:
>>> Hello again.
>>>
>>> This attachmet is a "can of worms" implementation without all the
>> fancy (and slow)
>>> iteration. It also includes all of the other suggestions you sent in
>> your first
>>> review of Dasher and Renderer last week (most importantly, the
>> firstOrientation
>>> issue, horizontal lines filtering, and adding prefixes to variable
>> names to make
>>> it clear whether they refer to pixels, or subpixels).
>>>
>>> Regards,
>>> Denis.
>>>
>>> ----- "Denis Lila"<dlila at redhat.com>  wrote:
>>>
>>>> Hello Jim.
>>>>
>>>> I implemented your "can of worms" idea. It works, and it got rid
>> of
>>>> the biasing.
>>>> I wasn't able to send a webrev, but there are many changes and a
>> side
>>>> by side
>>>> comparison would probably be useless, so I just attached the file.
>> I
>>>> hope this is
>>>> ok.
>>>>
>>>> I also implemented a "better" iterating structure for the lines
>> and
>>>> the
>>>> strips and crossings. I think it is better in every way, except
>>>> performance.
>>>> The new file is more than 200 lines smaller than the old one. The
>> only
>>>> members of Renderer are now the AA variables and the position
>>>> variables
>>>> (sx*, sy*, x*, y*).
>>>> What I've done is I added an EdgeList class, which encapsulates
>> all
>>>> the edge
>>>> related variables in the old Renderer. At first, I had an Edge
>> class
>>>> in addition
>>>> to the EdgeList class, and while this was much nicer, it turned out
>> to
>>>> be too
>>>> expensive (see last paragraph).
>>>> I've also added a ScanLineIterator, so instead of _endRendering
>>>> iterating
>>>> through strips, and then calling renderStrip() which iterates
>> through
>>>> the
>>>> scanlines in that strip, and then through the crossings in that
>>>> scanline,
>>>> what happens now is that _endRendering uses the Iterator<ScanLine>
>> to
>>>> iterate through each scanline, get get its crossings and iterate
>>>> through them
>>>> to accumulate the alpha. By the way, a ScanLine is a type defined
>> by
>>>> an
>>>> interface which exports methods for getting the y coord of the
>> line,
>>>> the
>>>> number of crossings in it, the ith crossing, and a method for
>> sorting
>>>> its crossings.
>>>> The class that implements ScanLine is ScanLineIterator itself. I
>> made
>>>> a
>>>> ScanLine class, but I was afraid performance would suffer because
>> of
>>>> all the
>>>> object creations (this turned out not to be an issue, after I
>> switched
>>>> to the
>>>> current way, and remeasured things). I did not switch back because
>>>> this is
>>>> only slightly worse.
>>>>
>>>> As for performance: I wrote a simple program that tries to draw a
>>>> dashed path
>>>> that consists of about 160 dashed lines of width 1 and length
>> 30000,
>>>> going
>>>> from the centre of the frame to some point. On my machine, this
>> takes
>>>> about 4.9
>>>> seconds in openjdk6, and 26 seconds using the attached file. Back
>> when
>>>> I was using
>>>> the Edge class it took about 39 seconds. Everything without hundres
>> of
>>>> thousands of
>>>> edges is not much slower
>>>> I have not changed any of the algorithms. ScanLineIterator still
>> goes
>>>> through
>>>> strips of the same size and computes crossings in every strip
>> using
>>>> the same
>>>> method as before, so I don't know why it's so slow. It can't be
>>>> because of anything
>>>> happening in _endRendering, because there are only about 9000
>>>> scanlines and for each
>>>> of them I've just added a few calls to one line getters (which used
>> to
>>>> be direct
>>>> accesses into arrays).
>>>>
>>>> Thanks,
>>>> Denis.
>>>>
>>>> ----- "Jim Graham"<james.graham at oracle.com>  wrote:
>>>>
>>>>> Denis Lila wrote:
>>>>>> Hello Jim.
>>>>>>
>>>>>> Thank you very much for taking the time to read through this.
>>>>>>
>>>>>>> 169 - if origLen reaches the end of the dash exactly (the "=="
>>>>> case)
>>>>>>      You're right, I should. I can't just replace<= with ==
>>>> though,
>>>>>> because the results will be the same: in the equal case origLen
>>>>> will
>>>>>> become 0, and on the next iteration, the (origLen<
>>>> dash[idx]-phase)
>>>>> will
>>>>>> be true, and we will do a goTo(x1,y1), which is what we just did
>>>> in
>>>>> the
>>>>>> previous iteration (unless dash[idx] is 0, in which case the
>>>> results
>>>>>> will be even worse). The best solution to this is to just do a
>>>>> nested
>>>>>> check for the == case.
>>>>> Ah, right - because there is no "break" when origLen becomes
>> zero.
>>>>> Sounds like you're on it.
>>>>>
>>>>>>> 171 - Aren't x0,y0 stored as subpix values?  You would then be
>>>>> comparing
>>>>>>> a subpix value to a non-subpix value.  Perhaps if the subpix
>>>> calls
>>>>> are
>>>>>>> moved to the top of the function I think this should work OK?
>>>>>>      That's true, they are. This is very puzzling. If a
>> horizontal
>>>>> line is
>>>>>> added, when the crossings for it are being computed, dxBydy
>> should
>>>>> be NaN, and
>>>>>> wouldn't an error be thrown when we try to cast to an int in the
>>>>> call to addCrossing?
>>>>>
>>>>> I'm not sure - I didn't trace it through very far - I just noted
>>>> that
>>>>> the values were likely in different "resolutions".
>>>>>
>>>>>>> 194,197 - Shouldn't these be constants, or based on the
>>>>> SUB_POS_XY?
>>>>>>      I suppose I should make a biasing constant. I don't think
>> they
>>>>> should be based
>>>>>> on SUB_POS_XY though, because the biasing is done to subpixel
>>>>> coordinates so
>>>>>> there is no danger that if our supersampling is fine enough the
>>>>> biasing will
>>>>>> make the coordinates jump over some scan line.
>>>>> I'm guessing you punted on my "can of worms" suggestion then.
>> ;-)
>>>>>
>>>>>>> 216 - if you save the sx0,sy0 before subpix'ing them then you
>>>> don't
>>>>> have
>>>>>>> to "unsubpix" them here.  (Though you still need the subpix sy0
>>>> for
>>>>> line
>>>>>>> 209 - or you could just call subpix on it for that line and at
>>>>> least
>>>>>>> you'd save 1 call to sub/unsub).
>>>>>>      Ok. I'll just save subpixed and unsubpixed versions of sx0,
>>>> sy0.
>>>>> That should
>>>>>> eliminate all sx0,sy0 related calls to tosubpix and topix except
>>>> in
>>>>> moveTo.
>>>>>
>>>>> and lineTo.  You may only need 3 of those values, though, if I
>>>>> remember
>>>>> my code reading well enough.
>>>>>
>>>>>>> 256,264 - casting to int is problematic.  It truncates towards
>> 0
>>>>> which
>>>>>>> means negatives are ceil'd and positives are floor'd.  It would
>>>> be
>>>>> best
>>>>>>> to use floor here instead.  On the other hand, if negative
>>>> numbers
>>>>> are
>>>>>>> always "off the left side of the drawable" then this is moot.
>>>>>>      That's why I left it at int casting. Do you still think I
>>>> should
>>>>> change it
>>>>>> to floor?
>>>>> If you mean floor, I think it best to use floor.  Unless you can
>>>> prove
>>>>> that negatives aren't really an issue and that the strange
>>>> truncation
>>>>> on
>>>>> them won't be numerically a problem - but I don't think it is
>> worth
>>>> it
>>>>> for this code.
>>>>>
>>>>>> Speaking of which, is there a good way to edit and build openJDK
>>>>> from eclipse?
>>>>>> Then this sort of embarrassing error could be avoided (including
>>>> the
>>>>> printStats() call).
>>>>>
>>>>> I don't use Eclipse, sorry.  :-(
>>>>>
>>>>>>      As for Arrays.newSize()... I can't find it here:
>>>>>>
>>>>
>> http://download.oracle.com/docs/cd/E17409_01/javase/6/docs/api/java/util/Arrays.html
>>>>>> Is this a new function added in 7?
>>>>> Sorry, make that Arrays.copyOf(..., newSize).  I tried to type
>> the
>>>>> name
>>>>> from memory and got it wrong.
>>>>>
>>>>>>> 721 - Arrays.sort()
>>>>>>      I thought about using this, but I did some measurements, and
>>>> it
>>>>> turns out that
>>>>>> Arrays.sort() is a bit slower if the portion of the array being
>>>>> sorted has fewer
>>>>>> than about 70 elements.
>>>>> I wonder what the typical number of elements is.  Is this sorting
>>>>> crossings per line?  Then simple primitives like circles should
>> only
>>>>> have 2 per line, right?  Is it worth testing for really small
>>>> numbers
>>>>> of
>>>>> elements (much lower than 70) and doing a manual sort?  Or am I
>>>>> misunderstanding what is being sorted there?
>>>>>
>>>>>>> How comfortable do you feel with that conversion?
>>>>>> I'll try to do it and include it in a new patch along with,
>>>>> hopefully, a better way
>>>>>> to iterate through strips, and especially crossings. Right now
>> all
>>>>> the iteration
>>>>>> state is saved in global variables. This is... not good. I spent
>>>> far
>>>>> too much time last
>>>>>> week on bugs caused by this sort of thing. Ideally, any members
>>>> that
>>>>> can't be put at
>>>>>> the top of a class (like our edge and crossing data) should be
>> put
>>>>> in classes of their own.
>>>>>
>>>>> That sounds good, but also consider doing it in separate stages
>> to
>>>>> reduce churn in the code reviewing (and you then have revs to go
>>>> back
>>>>> and test which stage caused a probem if we find a bug later):
>>>>>
>>>>> - first get all on floats
>>>>> - then change strip management
>>>>> - then change to open coordinate intervals
>>>>> - (or vice versa)
>>>>>
>>>>>> Do you have any ideas about how to iterate through edges in a
>>>> strip
>>>>> without going through
>>>>>> every edge? I was thinking of maybe using some sort of tree to
>>>> split
>>>>> the drawing surface,
>>>>>> but I haven't given it much thought.
>>>>> If you look for something like the native code for
>>>>> sun/java2d/pipe/ShapeSpanIterator.c you will see the way I
>> typically
>>>>> like to do edge setup and enumeration.
>>>>>
>>>>> That code uses the "half open interval" approach for both X and Y
>>>>> intervals.
>>>>>
>>>>> I then sort the edge list by "leading Y" and then move through
>> the
>>>>> edges
>>>>> using the following manner (kind of like an inch worm eating the
>>>>> edges,
>>>>> maintaining a pointer to the beginning and end of an "active
>> list"
>>>> of
>>>>> edges that are in play at any given time by virtue of having their
>> Y
>>>>> range intersect the current sampling Y).  Note that I use an
>> array
>>>> of
>>>>> edges and then a parallel array of pointers into the edges so that
>> I
>>>>> can
>>>>> sort just the pointers and avoid moving tons of edge data around.
>>>>> Also,
>>>>> later when I eliminate edges from the active list I just move
>> their
>>>>> pointers into and out of view rather than having to copy the edge
>>>>> data.
>>>>>    It is harder to do an array of pointers in Java, though -
>> perhaps
>>>> an
>>>>> array of indices?  Here is some basic pseudocode:
>>>>>
>>>>> start with lo and hi pointing at index 0 in edge list.
>>>>> until edge list is exhausted {
>>>>>       process edges between lo and hi (empty on first pass)
>>>>>       scan from hi to lo and toss any edges that are exhausted
>>>>>           (compacting remaining pointers towards hi)
>>>>>       keep incrementing hi to accept any new edges coming into
>> play
>>>>>       process edges between lo and hi to increment to the next Y
>>>>>           (note that new edges from previous step may not
>>>>>            need any processing in this stage if their starting
>>>>>            Y equals the next Y to be sampled)
>>>>> }
>>>>>
>>>>> Gradually lo and hi make their way through the list.  The edges
>>>> above
>>>>> hi
>>>>> are always sorted in order of when they may come into play as we
>>>> move
>>>>> downward in Y.  The edges between lo and hi are also usually kept
>>>>> sorted
>>>>> by their current X so that the stage that processes them into
>> spans
>>>>> can
>>>>> just run through them.  The edges below lo are usually random
>>>> garbage
>>>>> because no care was taken during the "pruning" step to worry
>> about
>>>>> what
>>>>> happens to the pointer table down there as lo is incremented (the
>>>>> algorithm only ever looks up the array of pointers).
>>>>>
>>>>> I hope that helps...
>>>>>
>>>>> 			...jim