[OpenJDK 2D-Dev] sun.java2D.Pisces renderer Performance and Memory enhancements

Laurent Bourgès bourges.laurent at gmail.com
Fri May 10 06:50:17 UTC 2013


Jim,

FYI, I am working on optimizing the 2 hotspot methods annotated by oprofile
(see the specific emails):
- ScanLineIterator.next() ~ 35%
- Renderer.endRendering(...) ~ 20%

I think that the ScanLineIterator class is no longer useful and could be
merged into Renderer directly. I am trying to optimize these 2 code paths
(crossing / crossing -> alpha), but it seems quite difficult as I must
understand the HotSpot optimizations (assembler code)...

For now I want to keep Pisces in Java code, as HotSpot is efficient enough
and the algorithm can probably be reworked a bit.
A few questions:
- should edges be sorted by Ymax ONCE, to avoid a complete traversal of the
edges to count crossings for each Y value:

    if ((bucketcount & 0x1) != 0) {
        int newCount = 0;
        for (int i = 0, ecur; i < count; i++) {
            ecur = ptrs[i];
            if (_edgesInt[ecur + YMAX] > cury) {
                ptrs[newCount++] = ecur;
            }
        }
        count = newCount;
    }
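To make the question concrete, here is a minimal sketch, not the Pisces code itself. It compares the in-place compaction from the excerpt with one possible "ordered by Ymax" variant that keeps the active Ymax values in a min-heap. The class name, the flat ymaxOf[] array (standing in for _edgesInt[ecur + YMAX]) and the use of java.util.PriorityQueue are assumptions made for illustration only.

```java
import java.util.PriorityQueue;

/** Hypothetical sketch: two ways to drop edges that end at or before the current scanline. */
final class EdgeExpirySketch {

    /** In-place compaction as in the excerpt: keeps edges with ymax > cury, returns the new count. */
    static int compact(int[] ptrs, int count, int[] ymaxOf, int cury) {
        int newCount = 0;
        for (int i = 0; i < count; i++) {
            int ecur = ptrs[i];
            if (ymaxOf[ecur] > cury) {
                ptrs[newCount++] = ecur; // edge is still alive on this scanline
            }
        }
        return newCount;
    }

    /** Assumed alternative: active ymax values in a min-heap, so only expired edges are visited. */
    static int expire(PriorityQueue<Integer> activeYmax, int cury) {
        int removed = 0;
        while (!activeYmax.isEmpty() && activeYmax.peek() <= cury) {
            activeYmax.poll();
            removed++;
        }
        return removed;
    }

    public static void main(String[] args) {
        int[] ymaxOf = {5, 2, 9, 3};
        int[] ptrs = {0, 1, 2, 3};
        System.out.println(compact(ptrs, 4, ymaxOf, 3)); // prints 2 (edges 0 and 2 remain)

        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int y : ymaxOf) {
            heap.add(y);
        }
        System.out.println(expire(heap, 3)); // prints 2 (ymax 2 and 3 expired)
    }
}
```

The crossing loop still has to visit every active edge on each scanline, so an ordered structure would only save the filtering pass; whether that pays for the extra bookkeeping would need to be measured.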

- why are the crossings multiplied by 2 and then divided by 2 (+ rounding issues)?

    for (int i = 0, ecur, j; i < count; i++) {
        ecur = ptrs[i];
        curx = _edges[ecur /* + CURX */];
        _edges[ecur /* + CURX */] = curx + _edges[ecur + SLOPE];

        cross = ((int) curx) << 1;
        if (_edgesInt[ecur + OR] != 0 /* > 0 */) {
            cross |= 1;
        }

    int lowx = crossings[0] >> 1;
    int highx = crossings[numCrossings - 1] >> 1;
    [...]
    for (int i = 0; i < numCrossings; i++) {
        int curxo = crossings[i];
        int curx = curxo >> 1;

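One plausible reading of the two excerpts, offered as a sketch rather than as the author's stated intent: the left shift frees bit 0 to carry the edge orientation, so a plain int sort orders crossings by x while keeping the orientation attached. The class and method names below are hypothetical, and the +1 / -1 decoding is an assumption that does not appear in the quoted lines.

```java
import java.util.Arrays;

/** Hypothetical sketch: pack an integer x and an orientation flag into one sortable int. */
final class CrossingPackingSketch {

    /** x goes into the upper 31 bits, the orientation flag into bit 0. */
    static int pack(int x, boolean positive) {
        int cross = x << 1;
        if (positive) {
            cross |= 1;
        }
        return cross;
    }

    /** Arithmetic shift restores x exactly, including negative values. */
    static int x(int cross) {
        return cross >> 1;
    }

    /** Assumed decoding: bit 0 set gives +1, cleared gives -1. */
    static int orientation(int cross) {
        return ((cross & 1) << 1) - 1;
    }

    public static void main(String[] args) {
        int[] crossings = {pack(7, true), pack(-3, false), pack(2, true)};
        Arrays.sort(crossings); // sorts by x because x occupies the high bits
        for (int c : crossings) {
            System.out.println(x(c) + " " + orientation(c));
        }
    }
}
```

Under this reading the shift pair itself loses nothing as long as x fits in 31 bits; any rounding comes from the earlier (int) cast of curx, which truncates toward zero.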
- last x pixel processing: could you explain it to me?

    int pix_xmax = x1 >> SUBPIXEL_LG_POSITIONS_X;
    int tmp = (x0 & SUBPIXEL_MASK_X);
    alpha[pix_x] += SUBPIXEL_POSITIONS_X - tmp;
    alpha[pix_x + 1] += tmp;
    tmp = (x1 & SUBPIXEL_MASK_X);
    alpha[pix_xmax] -= SUBPIXEL_POSITIONS_X - tmp;
    alpha[pix_xmax + 1] -= tmp;
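A small experiment may help with this question. The sketch below assumes, as the += / -= pattern suggests, that alpha[] stores coverage differences and that a running sum over it yields the per-pixel coverage. The value SUBPIXEL_LG_POSITIONS_X = 3, the same-pixel branch, and the class and method names are assumptions for illustration, not taken from the quoted code.

```java
import java.util.Arrays;

/** Hypothetical sketch: accumulate coverage deltas for a subpixel span, then integrate them. */
final class CoverageDeltaSketch {
    static final int SUBPIXEL_LG_POSITIONS_X = 3; // assumed: 8 subpixels per pixel
    static final int SUBPIXEL_POSITIONS_X = 1 << SUBPIXEL_LG_POSITIONS_X;
    static final int SUBPIXEL_MASK_X = SUBPIXEL_POSITIONS_X - 1;

    /** Adds deltas for the subpixel span [x0, x1); alpha needs (x1 >> LG) + 2 entries. */
    static void addSpan(int[] alpha, int x0, int x1) {
        int pix_x = x0 >> SUBPIXEL_LG_POSITIONS_X;
        int pix_xmax = x1 >> SUBPIXEL_LG_POSITIONS_X;
        if (pix_x == pix_xmax) {
            // Assumed handling when the span starts and ends in the same pixel.
            alpha[pix_x] += x1 - x0;
            alpha[pix_x + 1] -= x1 - x0;
        } else {
            int tmp = x0 & SUBPIXEL_MASK_X;
            alpha[pix_x] += SUBPIXEL_POSITIONS_X - tmp;    // partial first pixel
            alpha[pix_x + 1] += tmp;                       // reach full coverage afterwards
            tmp = x1 & SUBPIXEL_MASK_X;
            alpha[pix_xmax] -= SUBPIXEL_POSITIONS_X - tmp; // last pixel keeps only tmp
            alpha[pix_xmax + 1] -= tmp;                    // back to zero after the span
        }
    }

    /** Running sum of the deltas: coverage per pixel in subpixel units. */
    static int[] coverage(int[] alpha) {
        int[] out = new int[alpha.length];
        int sum = 0;
        for (int i = 0; i < alpha.length; i++) {
            sum += alpha[i];
            out[i] = sum;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] alpha = new int[5];
        addSpan(alpha, 3, 21);
        System.out.println(Arrays.toString(coverage(alpha))); // prints [5, 8, 5, 0, 0]
    }
}
```

With the span [3, 21) the first pixel is covered by 5 of 8 subpixels, the middle one fully, and the last one by 5. Under this reading, the two subtractions at pix_xmax and pix_xmax + 1 are what bring the running sum down to the partial coverage of the last pixel and then to zero.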

Finally, it seems that with the HotSpot settings -XX:CompileThreshold=1000 and
-XX:+AggressiveOpts, the JIT is able to compile these hotspots better...


2013/5/8 Jim Graham <james.graham at oracle.com>

> This is amazing work, Laurent!  I'll look over the code changes soon. Note
> that the "2 edge arrays" issue goes away if we can use native methods and C
> structs.  It may be faster still in that case...


Thanks; the edgeBucket / edgeBucketCount arrays could probably be merged
into a single one to improve cache locality.
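As a rough illustration of such a merge, here is a minimal sketch under assumptions: each bucket takes two adjacent int slots, one for the head edge pointer and one for the count, so both values for a given Y are likely to share a cache line. The class name, the method names and the layout are hypothetical and do not reflect the actual Renderer fields.

```java
/** Hypothetical sketch: bucket head pointer and bucket count interleaved in one int array. */
final class InterleavedBucketsSketch {
    // Layout (assumed): buckets[2 * b] = head edge pointer, buckets[2 * b + 1] = edge count.
    private final int[] buckets;

    InterleavedBucketsSketch(int numBuckets) {
        buckets = new int[2 * numBuckets];
    }

    int head(int b) {
        return buckets[2 * b];
    }

    int count(int b) {
        return buckets[2 * b + 1];
    }

    /** Makes edgePtr the new head of bucket b and bumps its count. */
    void push(int b, int edgePtr) {
        buckets[2 * b] = edgePtr;
        buckets[2 * b + 1]++;
    }

    public static void main(String[] args) {
        InterleavedBucketsSketch s = new InterleavedBucketsSketch(4);
        s.push(2, 40);
        s.push(2, 80);
        System.out.println(s.head(2) + " " + s.count(2)); // prints 80 2
    }
}
```

Whether this helps in practice depends on how the two arrays are accessed in the hot loops, so it would need profiling before and after.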

Let's stay in Java... as HotSpot is so efficient (until proven
otherwise).

FYI, I can write C/C++ code but I have never written JNI code.
Could somebody help us port only these 2 hotspot methods?

PS: I am attending a conference next week (Germany), so I will be less
available to work on the code, but I will read my emails.

Laurent