[OpenJDK 2D-Dev] X11 uniform scaled wide lines and dashed lines; STROKE_CONTROL in Pisces
Denis Lila
dlila at redhat.com
Tue Oct 26 13:58:21 UTC 2010
Hi Jim.
> Just to be certain - you are still planning on putting the existing
> stuff back and we're talking about future work, right? I'd love to
> get a stake in the ground here.
Yes, I'll push today.
> If we are really worried about the y-sort, then how about creating a
> bunch of buckets and doing a bucket sort of the edges? As they are
> added to the list of segments, we accumulate their indices in a row
> list based on their startY so that each step of the "next()" simply moves
> to the next Y and adds the edges mentioned in the list there. Some work
> would have to be done on how to manage the storage simply (like a
> "rownext" field in the edge structure so that they are linked in a
> linked list), but then there would be no qsort at all for the cost of
> new int[N_ROWS] and an extra field in every edge.
I like this.
> Perhaps we should work on the algorithms a little more then (I'm
> talking about the numeric stuff, not the memory management stuff since the
> memory management techniques will differ quite a lot in C code, but
> better math helps at either level)?
Indeed - especially the dashing.
> 90% (guesstimate) of the time edges do not cross each other, thus if
> you sort the crossings without reordering the active edges then you just
> end up doing the same sorting work (same swaps) on the next scanline. My
> SpanShapeIterator code actually reordered the edges on every sample
> line to match their current X coordinates in a way that involved 1 compare
> per edge that was processed and only occasionally a swap of 2 edge
> pointers. It would basically eliminate the sort at line 149 at the
> cost of only doing a compare in the nextY processing for the very very
> common case.
I also implemented this some time ago. I didn't keep it because it slowed
things down a bit. However, I only tested it with very complex and large
paths, and in hindsight, I shouldn't have based my decision on that, so I
will re-implement it.
Regards,
Denis.
----- "Jim Graham" <james.graham at oracle.com> wrote:
> Hi Denis,
>
> On 10/25/2010 3:30 PM, Denis Lila wrote:
> >> - Create a curve class and store an array of those so you don't
> have
> >> to iterate 3 different arrays of values and use array accesses to
> grab
> >> the data (each array access is checked for index OOB exceptions).
> >
> > I actually implemented something like this in my first draft of the
> current
> > version of renderer. I didn't stick with it because it was a slower
> than
> > even what we have in openjdk6. Granted, my implementation was a bit
> more
> > high level than simply making 3 classes to represent lines quads and
> cubics,
> > but it seemed pretty hopeless, so I didn't spend any time figuring
> out
> > exactly what it was that made it slower.
>
> Hmmm... Perhaps object allocation overhead was biting us there. In
> native code you could cobble this up with batch allocation of space
> and
> pseudo-object/struct allocation out of the batches.
>
> >> - Or perform AFD on curves as they come into Renderer, but only
> save
> >> line segment edges in the edges array. This would use more
> storage,
> >> but the iterations of the AFD would happen in a tight loop as the
> data
> >> comes in rather than having to store all of their AFD data back in
> the quads
> >
> > I considered this before abandoning the high level version I mention
> above.
> > I didn't go with it because, while I am not worried about the higher
> memory
> > consumption, I was afraid of the impact that having this many edges
> would
> > have on the qsort call and on lines 99-113 and 140-148 in next().
>
> I can see worrying about qsort, but I think one qsort would be
> inherently faster than 3 qsorts which you have anyway so the
> difference
> would get lost in the noise. Also, I'm not sure how the 99 to 113 and
>
> 140 to 148 would be affected. The path will have the same number of
> crossings per sample row regardless of whether the curves have been
> flattened or not. You might be adding and deleting edges from the
> active list more often, though (in other words, 99 would dump more
> curves and 140 would take in more curves), but the number of edges or
>
> curves at any given point would not be affected by flattening. Also,
>
> the way you've written the loops at 99, they have to copy every
> edge/quad/curve that *doesn't* expire so a skipped curve is actually
> less work for that loop. The loops at 140 would have to occasionally
> do
> more processing, but it is made up for in the fact that 99 does less
> work and the "nextY" processing is simpler.
>
> > How about this: we change the format of the edges array to be an
> array of
> > sequences of edges. So, right now the edges array has this format:
> > E* where E represents 6 consecutive floats
> {ymin,ymax,curx,cury,or,slope}.
> > I am proposing we change it to
> {n,or}({ymin,ymax,curx,cury,slope})^n.
> > lineTo's would add an edge sequence with n=1 to the edges array. If
> a call
> > to quadTo or curveTo produced N curves, it would simply put N,or at
> the
> > end of the edges array, and then append the data for the N produced
> edges.
> > I think this would give us the best of both worlds, in that we can
> do all
> > the AFD iterations in a tight loop close to the input methods and
> it
> > doesn't present any problems with respect to sorting or managing
> the
> > active list. It can probably be implemented completely transparently
> with
> > respect to ScanlineIterator. The details of
> > the implementation involve an interesting speed/memory trade off,
> but we
> > can discuss that later.
>
> I think this might be overkill since sorts tend to have logN behavior
> so
> doubling the number of edges would not double the time taken in the
> sort. Also, I would think that the sort would be a small amount of
> time
> compared to the rest of the processing, wasn't it?
>
> >> - Convert to native. Note that we use a native version of the
> pisces
> >> that you started with to do some rendering in FX. I tried porting
> to
> >> use your new (Java) renderer in FX and performance went down even
> >> though you show it to be faster than what was there before. So,
> your Java
> >> renderer compares favorably to the old Java pisces, but both
> compare
> >> unfavorably to the old native pisces. Maybe we should convert
> your
> >> code to native and see if that gives us a performance boost. It's
> nice to
> >> use pure Java, but there is a lot of "shoehorning" of data going
> on
> >> here that could be done much more easily and naturally in native
> code.
> >
> > I've been wanting to do this for a very long time. C and C++ are
> more
> > convenient for this type of work. I didn't because I've never used
> the JNI
> > before. I guess this is as good a time to learn as any. I'd still
> like to
> > keep the debugging of native code to a minimum, so we should
> implement
> > as much as possible in java before starting on this.
>
> >> How is that for "food for thought"?
> > Delicious :)
>
> <yummy-happy-face>
>
> Here's another idea:
>
> And another non-idea:
>
> I tried inlining all of the work that lineTo did and it ended up
> slower
> than what you do where you turn it into a generic curve of length 4 in
> a
> points array. I scratched my head, but didn't pursue why that would
> be
> slower. It might have been because I had a large chunk of nearly
> duplicate code in both halves of an "if (y0 < y1)" conditional rather
>
> than simply swapping the coordinates and using common code - maybe it
>
> was an instruction cache thing.
>
> The test case I was using to test performance was only using polygons
> so
> the curve stuff didn't even come into play...
>
> ...jim
More information about the 2d-dev
mailing list