[OpenJDK 2D-Dev] X11 uniform scaled wide lines and dashed lines; STROKE_CONTROL in Pisces

Tue Oct 26 13:58:21 UTC 2010

Hi Jim.

> Just to be certain - you are still planning on putting the existing 
> stuff back and we're talking about future work, right?  I'd love to
> get a stake in the ground here.

Yes, I'll push today.

> If we are really worried about the y-sort, then how about creating a 
> bunch of buckets and doing a bucket sort of the edges?  As they are 
> added to the list of segments, we accumulate their indices in a row
> list based on their startY so that each step of the "next()" simply moves
> to the next Y and adds the edges mentioned in the list there.  Some work
> would have to be done on how to manage the storage simply (like a 
> "rownext" field in the edge structure so that they are linked in a 
> linked list), but then there would be no qsort at all for the cost of
> new int[N_ROWS] and an extra field in every edge.

I like this.

> Perhaps we should work on the algorithms a little more then (I'm
> talking about the numeric stuff, not the memory management stuff since the 
> memory management techniques will differ quite a lot in C code, but 
> better math helps at either level)?

Indeed - especially the dashing.

> 90% (guesstimate) of the time edges do not cross each other, thus if
> you sort the crossings without reordering the active edges then you just
> end up doing the same sorting work (same swaps) on the next scanline.  My
> SpanShapeIterator code actually reordered the edges on every sample
> line to match their current X coordinates in a way that involved 1 compare
> per edge that was processed and only occasionally a swap of 2 edge 
> pointers.  It would basically eliminate the sort at line 149 at the
> cost of only doing a compare in the nextY processing for the very very
> common case.

I also implemented this some time ago. I didn't keep it because it slowed
things down a bit. However, I only tested it with very complex and large
paths, and in hindsight, I shouldn't have based my decision on that, so I
will re-implement it.

Regards,
Denis.

----- "Jim Graham" <james.graham at oracle.com> wrote:

> Hi Denis,
> 
> On 10/25/2010 3:30 PM, Denis Lila wrote:
> >> - Create a curve class and store an array of those so you don't
> have
> >> to iterate 3 different arrays of values and use array accesses to
> grab
> >> the data (each array access is checked for index OOB exceptions).
> >
> > I actually implemented something like this in my first draft of the
> current
> > version of renderer. I didn't stick with it because it was a slower
> than
> > even what we have in openjdk6. Granted, my implementation was a bit
> more
> > high level than simply making 3 classes to represent lines quads and
> cubics,
> > but it seemed pretty hopeless, so I didn't spend any time figuring
> out
> > exactly what it was that made it slower.
> 
> Hmmm...  Perhaps object allocation overhead was biting us there.  In 
> native code you could cobble this up with batch allocation of space
> and 
> pseudo-object/struct allocation out of the batches.
> 
> >> - Or perform AFD on curves as they come into Renderer, but only
> save
> >> line segment edges in the edges array.  This would use more
> storage,
> >> but the iterations of the AFD would happen in a tight loop as the
> data
> >> comes in rather than having to store all of their AFD data back in
> the quads
> >
> > I considered this before abandoning the high level version I mention
> above.
> > I didn't go with it because, while I am not worried about the higher
> memory
> > consumption, I was afraid of the impact that having this many edges
> would
> > have on the qsort call and on lines 99-113 and 140-148 in next().
> 
> I can see worrying about qsort, but I think one qsort would be 
> inherently faster than 3 qsorts which you have anyway so the
> difference 
> would get lost in the noise.  Also, I'm not sure how the 99 to 113 and
> 
> 140 to 148 would be affected.  The path will have the same number of 
> crossings per sample row regardless of whether the curves have been 
> flattened or not.  You might be adding and deleting edges from the 
> active list more often, though (in other words, 99 would dump more 
> curves and 140 would take in more curves), but the number of edges or
> 
> curves at any given point would not be affected by flattening.  Also,
> 
> the way you've written the loops at 99, they have to copy every 
> edge/quad/curve that *doesn't* expire so a skipped curve is actually 
> less work for that loop.  The loops at 140 would have to occasionally
> do 
> more processing, but it is made up for in the fact that 99 does less 
> work and the "nextY" processing is simpler.
> 
> > How about this: we change the format of the edges array to be an
> array of
> > sequences of edges. So, right now the edges array has this format:
> > E* where E represents 6 consecutive floats
> {ymin,ymax,curx,cury,or,slope}.
> > I am proposing we change it to
> {n,or}({ymin,ymax,curx,cury,slope})^n.
> > lineTo's would add an edge sequence with n=1 to the edges array. If
> a call
> > to quadTo or curveTo produced N curves, it would simply put N,or at
> the
> > end of the edges array, and then append the data for the N produced
> edges.
> > I think this would give us the best of both worlds, in that we can
> do all
> > the AFD iterations in a tight loop close to the input methods and
> it
> > doesn't present any problems with respect to sorting or managing
> the
> > active list. It can probably be implemented completely transparently
> with
> > respect to ScanlineIterator. The details of
> > the implementation involve an interesting speed/memory trade off,
> but we
> > can discuss that later.
> 
> I think this might be overkill since sorts tend to have logN behavior
> so 
> doubling the number of edges would not double the time taken in the 
> sort.  Also, I would think that the sort would be a small amount of
> time 
> compared to the rest of the processing, wasn't it?
> 
> >> - Convert to native.  Note that we use a native version of the
> pisces
> >> that you started with to do some rendering in FX.  I tried porting
> to
> >> use your new (Java) renderer in FX and performance went down even
> >> though you show it to be faster than what was there before.  So,
> your Java
> >> renderer compares favorably to the old Java pisces, but both
> compare
> >> unfavorably to the old native pisces.  Maybe we should convert
> your
> >> code to native and see if that gives us a performance boost.  It's
> nice to
> >> use pure Java, but there is a lot of "shoehorning" of data going
> on
> >> here that could be done much more easily and naturally in native
> code.
> >
> > I've been wanting to do this for a very long time. C and C++ are
> more
> > convenient for this type of work. I didn't because I've never used
> the JNI
> > before. I guess this is as good a time to learn as any. I'd still
> like to
> > keep the debugging of native code to a minimum, so we should
> implement
> > as much as possible in java before starting on this.
> 
> >> How is that for "food for thought"?
> > Delicious :)
> 
> <yummy-happy-face>
> 
> Here's another idea:
> 
> And another non-idea:
> 
> I tried inlining all of the work that lineTo did and it ended up
> slower 
> than what you do where you turn it into a generic curve of length 4 in
> a 
> points array.  I scratched my head, but didn't pursue why that would
> be 
> slower.  It might have been because I had a large chunk of nearly 
> duplicate code in both halves of an "if (y0 < y1)" conditional rather
> 
> than simply swapping the coordinates and using common code - maybe it
> 
> was an instruction cache thing.
> 
> The test case I was using to test performance was only using polygons
> so 
> the curve stuff didn't even come into play...
> 
>                         ...jim