[OpenJDK 2D-Dev] sun.java2D.Pisces renderer Performance and Memory enhancements
Jim Graham
james.graham at oracle.com
Fri May 10 23:13:11 UTC 2013
Hi Laurent,
On 5/9/13 11:50 PM, Laurent Bourgès wrote:
> Jim,
>
> I think that the ScanLineIterator class is no more useful and could be
> merged into Renderer directly: I try to optimize these 2 code paths
> (crossing / crossing -> alpha) but it seems quite difficult as I must
> understand hotspot optimizations (assembler code)...
I was originally skeptical about breaking that part of the process into
an iterator class as well.
> For now I want to keep pisces in Java code as hotspot is efficient
> enough and probably the algorithm can be reworked a bit;
> few questions:
> - should edges be sorted by Ymax ONCE to avoid complete edges traversal
> to count crossings for each Y value:
>
> 156 if ((bucketcount & 0x1) != 0) {
> 157 int newCount = 0;
> 158 for (int i = 0, ecur; i < count; i++) {
> 159 ecur = ptrs[i];
> * 160 if (_edgesInt[ecur + YMAX] > cury) {
> * 161 ptrs[newCount++] = ecur;
> 162 }
> 163 }
> 164 count = newCount;
> 165 }
This does not traverse all edges, just the edges currently "in play" and
it only does it for scanlines that had a recorded ymax on them (count is
multiplied by 2 and then optionally the last bit is set if some edge
ends on that scanline so we know whether or not to do the "remove
expired edges" processing).
> - why multiply x2 and divide /2 the crossings (+ rounding issues) ?
>
> 202 for (int i = 0, ecur, j; i < count; i++) {
> 203 ecur = ptrs[i];
> 204 curx = _edges[ecur /* + CURX */];
> 205 _edges[ecur /* + CURX */] = curx + _edges[ecur + SLOPE];
> 206
> * 207 cross = ((int) curx) << 1;
> * 208 if (_edgesInt[ecur + OR] != 0 /* > 0 */) {
> 209 cross |= 1;
The line above sets the bottom bit if the crossing is one orientation
vs. the other so we know whether to add one or subtract one to the
winding count. The crossings can then be sorted and the orientation
flag is carried along with the values as they are sorted. The cost of
this trick is having to shift the actual crossing coordinates by 1 to
vacate the LSBit.
> - last x pixel processing: could you explain me ?
> 712 int pix_xmax = x1 >> SUBPIXEL_LG_POSITIONS_X;
> 713 int tmp = (x0 & SUBPIXEL_MASK_X);
> 714 alpha[pix_x] += SUBPIXEL_POSITIONS_X - tmp;
> 715 alpha[pix_x + 1] += tmp;
> 716 tmp = (x1 & SUBPIXEL_MASK_X);
> 717 alpha[pix_xmax] -= SUBPIXEL_POSITIONS_X - tmp;
> 718 alpha[pix_xmax + 1] -= tmp;
Are you referring to the 2 += and the 2 -= for each end of the span? If
an edge crosses in a given pixel at 5 subpixel positions after the start
of that pixel, then it contributes a coverage of "SUBPIXEL_POS_X minus
5" in that pixel. But, starting with the following pixel, the total
coverage it adds for those pixels until it reaches the right edge of the
span is "SUBPIXEL_POSITIONS_X". However, we are recording deltas and
the previous pixels only bumped our total coverage by "S_P_X - 5". So,
we now need to bump the accumulated coverage by 5 in the following pixel
so that the total added coverage is "S_P_X".
Basically the pair of += lines adds a total S_P_X to the coverage, but
it splits that sum over two pixels - the one where the left edge first
appeared and its following pixel. Similarly, the two -= statements
subtract a total of S_P_X from the coverage total, and do so spread
across 2 pixels. If the crossing happened right at the left edge of the
pixel then tmp would be 0 and the second += or -= would be wasted, but
that only happens 1 out of S_P_X times and the cost of testing is
probably less than just adding tmp to the second pixel even if it is 0.
Also note that we need to have an array entry for alpha[max_x + 1] so
that the second += and -= don't store off the end of the array. We
don't need to use that value since we will stop our alpha accumulations
at the entry for max_x, but testing to see if the "second pixel delta
value" is needed is more expensive than just accumulating it into an
unused array entry.
> Finally, it seems that hotspot settings (CompileThreshold=1000 and
> -XX:aggressiveopts) are able to compile theses hotspots better ...
What about if we use the default settings as would most non-server apps?
> Thanks; probably the edgeBucket / edgeBucketCount arrays could be merged
> into a single one to improve cache affinity.
Interesting.
> FYI, I can write C/C++ code but I never practised JNI code.
> Does somebody could help us to port only these 2 hotspot methods ?
Port 2 Hotspot methods? I'm not sure what you are referring to here?
> PS: I attend a conference next week (germany) so I will be less
> available to work on code but I will read my emails.
Thanks for the heads up - as long as you don't time out and switch off
of this project permanently - that would be a shame... :(
...jim
More information about the 2d-dev
mailing list