[OpenJDK 2D-Dev] X11 uniform scaled wide lines and dashed lines; STROKE_CONTROL in Pisces

Tue Oct 26 00:54:31 UTC 2010

Hi Denis,

Just to be certain - you are still planning on putting the existing 
stuff back and we're talking about future work, right?  I'd love to get 
a stake in the ground here.

On 10/25/2010 3:30 PM, Denis Lila wrote:
>> - Create a curve class and store an array of those so you don't have
>> to iterate 3 different arrays of values and use array accesses to grab
>> the data (each array access is checked for index OOB exceptions).
>
> I actually implemented something like this in my first draft of the current
> version of renderer. I didn't stick with it because it was a slower than
> even what we have in openjdk6. Granted, my implementation was a bit more
> high level than simply making 3 classes to represent lines quads and cubics,
> but it seemed pretty hopeless, so I didn't spend any time figuring out
> exactly what it was that made it slower.

Hmmm...  Perhaps object allocation overhead was biting us there.  In 
native code you could cobble this up with batch allocation of space and 
pseudo-object/struct allocation out of the batches.

>> - Or perform AFD on curves as they come into Renderer, but only save
>> line segment edges in the edges array.  This would use more storage,
>> but the iterations of the AFD would happen in a tight loop as the data
>> comes in rather than having to store all of their AFD data back in the quads
>
> I considered this before abandoning the high level version I mention above.
> I didn't go with it because, while I am not worried about the higher memory
> consumption, I was afraid of the impact that having this many edges would
> have on the qsort call and on lines 99-113 and 140-148 in next().

I can see worrying about qsort, but I think one qsort would be 
inherently faster than 3 qsorts which you have anyway so the difference 
would get lost in the noise.  Also, I'm not sure how the 99 to 113 and 
140 to 148 would be affected.  The path will have the same number of 
crossings per sample row regardless of whether the curves have been 
flattened or not.  You might be adding and deleting edges from the 
active list more often, though (in other words, 99 would dump more 
curves and 140 would take in more curves), but the number of edges or 
curves at any given point would not be affected by flattening.  Also, 
the way you've written the loops at 99, they have to copy every 
edge/quad/curve that *doesn't* expire so a skipped curve is actually 
less work for that loop.  The loops at 140 would have to occasionally do 
more processing, but it is made up for in the fact that 99 does less 
work and the "nextY" processing is simpler.

> How about this: we change the format of the edges array to be an array of
> sequences of edges. So, right now the edges array has this format:
> E* where E represents 6 consecutive floats {ymin,ymax,curx,cury,or,slope}.
> I am proposing we change it to {n,or}({ymin,ymax,curx,cury,slope})^n.
> lineTo's would add an edge sequence with n=1 to the edges array. If a call
> to quadTo or curveTo produced N curves, it would simply put N,or at the
> end of the edges array, and then append the data for the N produced edges.
> I think this would give us the best of both worlds, in that we can do all
> the AFD iterations in a tight loop close to the input methods and it
> doesn't present any problems with respect to sorting or managing the
> active list. It can probably be implemented completely transparently with
> respect to ScanlineIterator. The details of
> the implementation involve an interesting speed/memory trade off, but we
> can discuss that later.

I think this might be overkill since sorts tend to have logN behavior so 
doubling the number of edges would not double the time taken in the 
sort.  Also, I would think that the sort would be a small amount of time 
compared to the rest of the processing, wasn't it?

If we are really worried about the y-sort, then how about creating a 
bunch of buckets and doing a bucket sort of the edges?  As they are 
added to the list of segments, we accumulate their indices in a row list 
based on their startY so that each step of the "next()" simply moves to 
the next Y and adds the edges mentioned in the list there.  Some work 
would have to be done on how to manage the storage simply (like a 
"rownext" field in the edge structure so that they are linked in a 
linked list), but then there would be no qsort at all for the cost of 
new int[N_ROWS] and an extra field in every edge.

>> - Convert to native.  Note that we use a native version of the pisces
>> that you started with to do some rendering in FX.  I tried porting to
>> use your new (Java) renderer in FX and performance went down even
>> though you show it to be faster than what was there before.  So, your Java
>> renderer compares favorably to the old Java pisces, but both compare
>> unfavorably to the old native pisces.  Maybe we should convert your
>> code to native and see if that gives us a performance boost.  It's nice to
>> use pure Java, but there is a lot of "shoehorning" of data going on
>> here that could be done much more easily and naturally in native code.
>
> I've been wanting to do this for a very long time. C and C++ are more
> convenient for this type of work. I didn't because I've never used the JNI
> before. I guess this is as good a time to learn as any. I'd still like to
> keep the debugging of native code to a minimum, so we should implement
> as much as possible in java before starting on this.

Perhaps we should work on the algorithms a little more then (I'm talking 
about the numeric stuff, not the memory management stuff since the 
memory management techniques will differ quite a lot in C code, but 
better math helps at either level)?

>> How is that for "food for thought"?
> Delicious :)

<yummy-happy-face>

Here's another idea:

90% (guesstimate) of the time edges do not cross each other, thus if you 
sort the crossings without reordering the active edges then you just end 
up doing the same sorting work (same swaps) on the next scanline.  My 
SpanShapeIterator code actually reordered the edges on every sample line 
to match their current X coordinates in a way that involved 1 compare 
per edge that was processed and only occasionally a swap of 2 edge 
pointers.  It would basically eliminate the sort at line 149 at the cost 
of only doing a compare in the nextY processing for the very very common 
case.

And another non-idea:

I tried inlining all of the work that lineTo did and it ended up slower 
than what you do where you turn it into a generic curve of length 4 in a 
points array.  I scratched my head, but didn't pursue why that would be 
slower.  It might have been because I had a large chunk of nearly 
duplicate code in both halves of an "if (y0 < y1)" conditional rather 
than simply swapping the coordinates and using common code - maybe it 
was an instruction cache thing.

The test case I was using to test performance was only using polygons so 
the curve stuff didn't even come into play...

			...jim