[OpenJDK Rasterizer] Fwd: Re: Fwd: RFR: Marlin renderer #3

Mon Jul 6 10:28:16 UTC 2015

Jim,

I have run the tests mentioned earlier, which meant modifying the addLine()
and endRendering() methods:

1/ Use proper and consistent ceil(coord - 0.5) rounding, as you did in
OpenPisces (FX):

Renderer.USE_CORRECT_RND=true

The output images differ from the Pisces ones but are now closer to the
Ductus ones, i.e. more accurate.
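
To make the rounding change concrete, here is a minimal sketch (not the Marlin code) comparing plain ceil(coord) with ceil(coord - 0.5). It assumes sample centers sit at i + 0.5 and uses Math.ceil in place of Marlin's FloatMath.ceil; the class and method names are hypothetical.

```java
final class PixelCenterRounding {

    // Plain ceil: first integer i such that i >= coord.
    static int ceilPlain(final float coord) {
        return (int) Math.ceil(coord);
    }

    // ceil(coord - 0.5): first integer i whose center (i + 0.5) is >= coord.
    static int ceilCenter(final float coord) {
        return (int) Math.ceil(coord - 0.5f);
    }

    public static void main(String[] args) {
        final float[] samples = {0.25f, 0.5f, 0.75f, 1.0f, 1.5f};
        for (float c : samples) {
            System.out.println(c + ": ceil=" + ceilPlain(c)
                    + " ceil(c - 0.5)=" + ceilCenter(c));
        }
    }
}
```

With plain ceil, a coordinate of 0.5 (exactly on the center of sample 0) is pushed to sample 1; with ceil(coord - 0.5) it stays on sample 0, which is the half-sample shift the flag above is meant to correct.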

Of course, it is slower, by up to 15% on the most complex map:

REF:

dc_boulder_2013-13-30-06-13-17.ser          1     93    112.996    113.297    112.805    0.507    111.459    113.508     93
dc_shp_alllayers_2013-00-30-07-00-43.ser    1    246     42.791     43.483     42.926    0.283     42.648     43.816    246
dc_shp_alllayers_2013-00-30-07-00-47.ser    1     25    762.882    764.781    763.136    0.715    762.219    765.110     25
test_z_625k.ser                             1     61    168.745    169.238    168.780    0.216    168.423    169.417     61

PROPER_ROUND:

dc_boulder_2013-13-30-06-13-17.ser          1     90    115.722    116.187    115.756    0.196    115.497    116.691     90
dc_shp_alllayers_2013-00-30-07-00-43.ser    1    230     45.598     45.816     45.620    0.105     45.467     46.182    230
dc_shp_alllayers_2013-00-30-07-00-47.ser    1     25    877.221    878.020    877.272    0.558    876.277    878.890     25
test_z_625k.ser                             1     60    173.377    173.729    173.411    0.191    173.108    174.188     60
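
As a rough check of the "up to 15%" figure, the relative slowdown can be computed from the first timing column of the REF and PROPER_ROUND rows. This is only a sketch: it assumes that column is a per-test time where lower is better (the column meaning is not stated in this mail), and the class and method names are hypothetical.

```java
final class SlowdownCheck {

    // Relative slowdown in percent of 'test' versus 'ref' (positive = slower).
    static double slowdownPercent(final double ref, final double test) {
        return 100.0 * (test - ref) / ref;
    }

    public static void main(String[] args) {
        final String[] names = {
            "dc_boulder_2013-13-30-06-13-17.ser",
            "dc_shp_alllayers_2013-00-30-07-00-43.ser",
            "dc_shp_alllayers_2013-00-30-07-00-47.ser",
            "test_z_625k.ser"
        };
        // First timing column, copied from the REF and PROPER_ROUND rows.
        final double[] ref    = {112.996, 42.791, 762.882, 168.745};
        final double[] proper = {115.722, 45.598, 877.221, 173.377};

        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-42s %6.2f%%%n",
                    names[i], slowdownPercent(ref[i], proper[i]));
        }
    }
}
```

The largest value comes from the dc_shp_alllayers_2013-00-30-07-00-47.ser row (762.882 versus 877.221), which is just under 15%.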


2/ Use a fixed-point approach (more work) so that the Marlin rendering loop
(crossings) uses only integer math:

Renderer.USE_CORRECT_RND=true and Renderer.USE_FP=true

I simply ported ShapeSpanIterator.c (bumpx, bumperr, error), as you can see
in the patch below.

It works well and the output images are close to the Ductus ones too (I hope
they are identical to those of the previous test).

=> faster (no float-to-int conversions?)

It is faster than the previous test (float + proper rounding) but not yet as
fast as the current Marlin (float + cast): about 5% slower at most.
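
The DDA idea behind bumpx / bumperr / error can be sketched in isolation as follows. This is an illustration under simplifying assumptions, not the patch itself: the start x is taken as already sampled on the first scanline, the 0.5 pixel-center offsets are left out, and the class and method names are hypothetical.

```java
final class FixedPointDda {

    static final int ERR_STEP_MAX = 0x7fffffff;

    // Returns ceil(x + k * slope) for k = 0..n-1 using only int math in the loop.
    static int[] crossings(final double x, final double slope, final int n) {
        // Integer crossing on the first scanline.
        int curx = (int) Math.ceil(x);
        // Distance from (curx - 1) to x, in (0, 1], scaled to [0, ERR_STEP_MAX].
        int err = (int) ((x - (curx - 1)) * ERR_STEP_MAX);
        // Whole-pixel part of the per-scanline step.
        final double floorSlope = Math.floor(slope);
        final int bumpx = (int) floorSlope;
        // Fractional part of the per-scanline step, scaled the same way.
        final int bumperr = (int) ((slope - floorSlope) * ERR_STEP_MAX);

        final int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[i] = curx;
            curx += bumpx;
            err += bumperr;
            // If err overflowed into the sign bit, (err >> 31) is -1: carry one pixel.
            curx -= (err >> 31);
            // Clear the sign bit to keep only the fractional part.
            err &= ERR_STEP_MAX;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(crossings(0.25, 0.5, 5)));
    }
}
```

Because the fractional step is truncated to an int, the error accumulator drifts slightly over a very long edge, so the result can eventually differ from a float reference by one pixel near exact ties.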

USE_FP:

dc_boulder_2013-13-30-06-13-17.ser          1     89    117.544    117.900    117.564    0.173    117.306    118.287     89
dc_shp_alllayers_2013-00-30-07-00-43.ser    1    231     45.338     45.502     45.347    0.126     45.155     46.458    231
dc_shp_alllayers_2013-00-30-07-00-47.ser    1     25    808.432    809.665    808.602    0.723    807.456    810.553     25
test_z_625k.ser                             1     61    170.808    171.272    170.886    0.231    170.566    171.789     61

However, the performance gap is very small, and the code can be optimized
further by removing the Unsafe usage, which is no longer required:

=> the edge array will then only contain int values (int[]), so Unsafe is no
longer necessary
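
A minimal sketch of what an int[]-only edge store could look like once every field is an int. It is an assumption about a possible follow-up, not code from the patch: the offsets are counted in ints rather than bytes, they simply mirror the OFF_* order used in the patch, and the class and method names are hypothetical.

```java
import java.util.Arrays;

final class IntEdgeStore {

    // Field offsets in ints (hypothetical; same order as the OFF_* constants).
    static final int OFF_CURX     = 0;
    static final int OFF_ERROR    = 1;
    static final int OFF_NEXT     = 2;
    static final int OFF_YMAX_OR  = 3;
    static final int OFF_BUMP_X   = 4;
    static final int OFF_BUMP_ERR = 5;
    static final int SIZEOF_EDGE  = 6;

    static final int ERR_STEP_MAX = 0x7fffffff;

    private int[] edges = new int[4 * SIZEOF_EDGE];
    private int used = 0;

    // Appends one edge and returns its index (the "pointer") in the array.
    int addEdge(int curx, int error, int bumpX, int bumpErr, int ymaxOr) {
        if (edges.length < used + SIZEOF_EDGE) {
            // Doubling is enough because the array is larger than one edge.
            edges = Arrays.copyOf(edges, edges.length << 1);
        }
        final int ptr = used;
        edges[ptr + OFF_CURX]     = curx;
        edges[ptr + OFF_ERROR]    = error;
        edges[ptr + OFF_NEXT]     = -1; // no next edge in the bucket list yet
        edges[ptr + OFF_YMAX_OR]  = ymaxOr;
        edges[ptr + OFF_BUMP_X]   = bumpX;
        edges[ptr + OFF_BUMP_ERR] = bumpErr;
        used += SIZEOF_EDGE;
        return ptr;
    }

    // Returns (x << 1 | orientation bit) and advances the edge by one scanline.
    int nextCrossing(final int ptr) {
        final int[] e = edges;
        int curx = e[ptr + OFF_CURX];
        int err  = e[ptr + OFF_ERROR];
        final int cross = (curx << 1) | (e[ptr + OFF_YMAX_OR] & 0x1);
        curx += e[ptr + OFF_BUMP_X];
        err  += e[ptr + OFF_BUMP_ERR];
        e[ptr + OFF_CURX]  = curx - (err >> 31);
        e[ptr + OFF_ERROR] = err & ERR_STEP_MAX;
        return cross;
    }

    public static void main(String[] args) {
        final IntEdgeStore store = new IntEdgeStore();
        final int ptr = store.addEdge(1, 536870911, 0, 1073741823, 1);
        for (int i = 0; i < 5; i++) {
            System.out.println(store.nextCrossing(ptr));
        }
    }
}
```

Plain array indexing keeps bounds checks, so whether this matches the Unsafe version in speed would have to be measured.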

To conclude, these tests improved the output quality (better rounding), and
the fixed-point approach is promising: it is quite fast and makes it possible
to get rid of Unsafe => simpler and safer code, and the edge array can use
the array caches again (like the other arrays).

I will try to go further during the week ...

Laurent


PS: Here is a (quick and dirty) patch on Renderer to *illustrate* my
changes and show you what I did:

# This patch file was generated by NetBeans IDE
# It uses platform neutral UTF-8 encoding and \n newlines.

--- HEAD
+++ Modified In Working Tree
@@ -37,6 +37,23 @@
 import sun.misc.Unsafe;

 final class Renderer implements PathConsumer2D, MarlinConst {
+
+    final static boolean USE_CORRECT_RND = true;
+
+    final static boolean USE_FP = true && USE_CORRECT_RND;
+
+    /*
+#define ERRSTEP_MAX     (0x7fffffff)
+#define FRACTTOJINT(f)  ((jint) ((f) * (double) ERRSTEP_MAX))
+    */
+    final static int ERR_STEP_MAX = 0x7fffffff;
+    final static double ERR_STEP_MAX_DBL = (double)ERR_STEP_MAX;
+
+    static int fractToInt(final float f) {
+        return (int) (f * ERR_STEP_MAX_DBL);
+    }
+
+
     // unsafe reference
     final static Unsafe unsafe;
     // array offset
@@ -102,9 +119,6 @@
     static final int INITIAL_BUCKET_ARRAY
         = INITIAL_PIXEL_DIM * SUBPIXEL_POSITIONS_Y;

-    // initial edges (16 bytes) = 32K [ints/floats] = 128K
-    static final int INITIAL_EDGES_CAPACITY = INITIAL_ARRAY_16K << 3;
-
     public static final int WIND_EVEN_ODD = 0;
     public static final int WIND_NON_ZERO = 1;

@@ -114,11 +128,17 @@
     public static final int OFF_F_CURX  = 0;
     public static final int OFF_SLOPE   = OFF_F_CURX + SIZE;
     // integer values:
+    public static final int OFF_CURX    = 0;
+    public static final int OFF_ERROR   = OFF_CURX + SIZE;
+
     public static final int OFF_NEXT    = OFF_SLOPE + SIZE;
     public static final int OFF_YMAX_OR = OFF_NEXT + SIZE;

+    public static final int OFF_BUMP_X  = OFF_YMAX_OR + SIZE;
+    public static final int OFF_BUMP_ERR= OFF_BUMP_X + SIZE;
+
     // size of one edge in bytes
-    public static final int SIZEOF_EDGE_BYTES = OFF_YMAX_OR + SIZE;
+    public static final int SIZEOF_EDGE_BYTES = ((USE_FP) ? OFF_BUMP_ERR : OFF_YMAX_OR) + SIZE;

     // curve break into lines
     // cubic bind length (dx or dy) = 20 to decrement step
@@ -175,6 +195,7 @@
     private final int[] edgePtrs_initial  = new int[INITIAL_SMALL_ARRAY + 1]; // 4K
     // merge sort initial arrays (large enough to satisfy most usages) (1024)
     private final int[] aux_crossings_initial = new int[INITIAL_SMALL_ARRAY]; // 4K
+    // +1 to avoid recycling in Helpers.widenArray()
     private final int[] aux_edgePtrs_initial  = new int[INITIAL_SMALL_ARRAY + 1]; // 4K

 //////////////////////////////////////////////////////////////////////////////
@@ -344,14 +365,26 @@

         /* TODO: improve accuracy using correct float rounding to int
         ie use ceil(x - 0.5f) */
+        float y1_cor;
+        int firstCrossing, lastCrossing;
+        if (USE_CORRECT_RND) {
+        // convert subpixel coordinates (float) into pixel positions (int)
+        // upper integer (inclusive)
+        y1_cor = y1 - 0.5f;
+        firstCrossing = Math.max(FloatMath.ceil(y1 - 0.5f), _boundsMinY);

+        // note: use boundsMaxY (last Y exclusive) to compute correct coverage
+        // upper integer (exclusive ?)
+        lastCrossing  = Math.min(FloatMath.ceil(y2 - 0.5f),  boundsMaxY);
+        } else {
         // convert subpixel coordinates (float) into pixel positions (int)
         // upper integer (inclusive)
-        final int firstCrossing = Math.max(FloatMath.ceil(y1), _boundsMinY);
+        firstCrossing = Math.max(FloatMath.ceil(y1), _boundsMinY);

         // note: use boundsMaxY (last Y exclusive) to compute correct coverage
         // upper integer (exclusive ?)
-        final int lastCrossing  = Math.min(FloatMath.ceil(y2),  boundsMaxY);
+        lastCrossing  = Math.min(FloatMath.ceil(y2),  boundsMaxY);
+        }

         /* skip horizontal lines in pixel space and clip edges
            out of y range [boundsMinY; boundsMaxY] */
@@ -399,6 +432,8 @@
         final int edgePtr = _edges.used;

         if (_edges.length < edgePtr + _SIZEOF_EDGE_BYTES) {
+            // suppose _edges.length > _SIZEOF_EDGE_BYTES
+            // so doubling size is enough to add needed bytes
             // double size:
             final int edgeNewSize = edgePtr << 1;
             if (doStats) {
@@ -412,8 +447,54 @@
         final long    addr   = _edges.address + edgePtr;

         // float values:
+        if (USE_CORRECT_RND) {
+            if (USE_FP) {
+            // First, how far does y bump to get to next HPC?
+            // final float ystartbump = firstCrossing - y1 + 0.5f;
+            // Now, bump the float x coordinate to get X sample at that HPC.
+//            x1 += (firstCrossing - y1 + 0.5f) * slope;
+            final float x1_cor = x1 - 0.5f + (firstCrossing - y1_cor) * slope;
+            // Now calculate the integer coordinate that such a span starts at.
+            // NOTE: Span inclusion is based on vertical pixel centers (VPC).
+            // istartx = (jint) ceil(x0 - 0.5f);
+//            final int istartx = FloatMath.ceil(x1_cor - 0.5f);
+            int istartx = FloatMath.ceil(x1_cor);
+            _unsafe.putInt(addr,                istartx);
+
+            // Finally, find out how far the x coordinate can go before next VPC.
+            // error = FRACTTOJINT(x0 - (istartx - 0.5f));
+//            final int error = fractToInt(x1 - (istartx - 0.5f));
+//            final int error = (int) ((x1 - (istartx - 0.5f)) * ERR_STEP_MAX_DBL);
+            istartx -= 1;
+            _unsafe.putInt(addr + OFF_ERROR,
+//                           (int) ((x1 - istartx + 0.5f) * ERR_STEP_MAX_DBL));
+                           (int) ((x1_cor - istartx) * ERR_STEP_MAX_DBL));
+
+            // What is the lower bound of the per-scanline change in the X coord?
+            // bumpx = (jint) floor(slope);
+            final float floor_slope = FloatMath.floor(slope);
+//            final int bumpx = (int)floor_slope;
+            _unsafe.putInt(addr + OFF_BUMP_X,
+                           (int)floor_slope);
+
+            // What is the subpixel amount by which the bumpx is off?
+            // bumperr = FRACTTOJINT(slope - floor(slope));
+//            final int bumperr = fractToInt(slope - floor_slope);
+//            final int bumperr = (int) ((slope - floor_slope) * ERR_STEP_MAX_DBL);
+            _unsafe.putInt(addr + OFF_BUMP_ERR,
+                           (int) ((slope - floor_slope) * ERR_STEP_MAX_DBL));
+
+            } else {
+            // x1 + (firstCrossing + 0.5f - y1) * slope;
+            _unsafe.putFloat(addr,             x1 - 0.5f + (firstCrossing - y1 + 0.5f) * slope);
+            }
+        } else {
         _unsafe.putFloat(addr,             x1 + (firstCrossing - y1) * slope);
+        }
+
+        if (!USE_FP) {
         _unsafe.putFloat(addr + OFF_SLOPE, slope);
+        }


         // each bucket is a linked list. this method adds ptr to the
@@ -687,7 +768,7 @@
     // clean alpha array (zero filled)
     private int[] alphaLine;
     // 2048 (pixelsize) pixel large
-    private final int[] alphaLine_initial = new int[INITIAL_AA_ARRAY]; // 16K
+    private final int[] alphaLine_initial = new int[INITIAL_AA_ARRAY]; // 8K

     private void _endRendering(final int ymin, final int ymax) {

@@ -720,6 +801,12 @@
         final int _OFF_NEXT    = OFF_NEXT;
         final int _OFF_YMAX_OR = OFF_YMAX_OR;

+        final int _OFF_ERROR   = OFF_ERROR;
+        final int _OFF_BUMP_X  = OFF_BUMP_X;
+        final int _OFF_BUMP_ERR= OFF_BUMP_ERR;
+
+        final int _ERR_STEP_MAX= ERR_STEP_MAX;
+
         // unsafe I/O:
         final Unsafe _unsafe = unsafe;
         final long    addr0  = _edges.address;
@@ -754,7 +841,7 @@
         int bucketcount, i, j, ecur, lowx, highx;
         int cross, lastCross;
         float f_curx;
-        int x0, x1, tmp, sum, prev, curx, curxo, crorientation;
+        int x0, x1, tmp, sum, prev, curx, curxo, crorientation, err;
         int pix_x, pix_xmaxm1, pix_xmax;

         int low, high, mid, prevNumCrossings;
@@ -913,22 +1000,53 @@
                         // get the pointer to the edge
                         ecur = _edgePtrs[i];

-                        // random access so use unsafe:
-                        addr = addr0 + ecur; // ecur + OFF_F_CURX
-                        f_curx = _unsafe.getFloat(addr);
-
                         /* convert subpixel coordinates (float) into pixel
                             positions (int) for coming scanline */
                         /* note: it is faster to always update edges even
                            if it is removed from AEL for coming or last scanline */
+
                         // random access so use unsafe:
+                        addr = addr0 + ecur; // ecur + OFF_F_CURX
+
+                        if (USE_FP) {
+                        // get current crossing and error:
+                        curx = _unsafe.getInt(addr);
+                        err  = _unsafe.getInt(addr + _OFF_ERROR);
+
+                        // update crossing with orientation at last bit:
+                        cross = (curx << 1)
+                                | _unsafe.getInt(addr + _OFF_YMAX_OR) & 0x1;
+
+                        // Increment x using DDA (fixed point):
+                        // x0 = seg->curx + seg->bumpx
+                        curx += _unsafe.getInt(addr + _OFF_BUMP_X);
+                        // err = seg->error + seg->bumperr
+                        err  += _unsafe.getInt(addr + _OFF_BUMP_ERR);
+                        // x0 -= (err >> 31);
+//                        curx -= (err >> 31);
+                        _unsafe.putInt(addr, curx - (err >> 31));
+
+                        // err &= ERRSTEP_MAX;
+//                        err &= _ERR_STEP_MAX;
+                        _unsafe.putInt(addr + _OFF_ERROR, err & _ERR_STEP_MAX);
+
+                        } else {
+                        f_curx = _unsafe.getFloat(addr);
+                        // random access so use unsafe:
                         _unsafe.putFloat(addr,
                                          f_curx + _unsafe.getFloat(addr +
                                                                    _OFF_SLOPE)); // ecur + _SLOPE

                         // update crossing ( x-coordinate + last bit = orientation (0 or 1)):
+                        if (USE_CORRECT_RND) {
+                            // ceil(curx - 0.5f) : TODO: push - 0.5 in edge
+                        cross = (FloatMath.ceil(f_curx) << 1)
+                                | _unsafe.getInt(addr + _OFF_YMAX_OR) & 0x1;
+                        } else {
                         cross = (((int) f_curx) << 1)
                                 | _unsafe.getInt(addr + _OFF_YMAX_OR) & 0x1;
+                        }
+                        }

                         if (doStats) {
                             RendererContext.stats.stat_rdr_crossings_updates
@@ -1008,22 +1126,53 @@
                         // get the pointer to the edge
                         ecur = _edgePtrs[i];

-                        // random access so use unsafe:
-                        addr = addr0 + ecur; // ecur + OFF_F_CURX
-                        f_curx = _unsafe.getFloat(addr);
-
                         /* convert subpixel coordinates (float) into pixel
                            positions (int) for coming scanline */
                         /* note: it is faster to always update edges even
                            if it is removed from AEL for coming or last scanline */
+
                         // random access so use unsafe:
+                        addr = addr0 + ecur; // ecur + OFF_F_CURX
+
+                        if (USE_FP) {
+                        // get current crossing and error:
+                        curx = _unsafe.getInt(addr);
+                        err  = _unsafe.getInt(addr + _OFF_ERROR);
+
+                        // update crossing with orientation at last bit:
+                        cross = (curx << 1)
+                                | _unsafe.getInt(addr + _OFF_YMAX_OR) & 0x1;
+
+                        // Increment x using DDA (fixed point):
+                        // x0 = seg->curx + seg->bumpx
+                        curx += _unsafe.getInt(addr + _OFF_BUMP_X);
+                        // err = seg->error + seg->bumperr
+                        err  += _unsafe.getInt(addr + _OFF_BUMP_ERR);
+                        // x0 -= (err >> 31);
+//                        curx -= (err >> 31);
+                        _unsafe.putInt(addr, curx - (err >> 31));
+
+                        // err &= ERRSTEP_MAX;
+//                        err &= _ERR_STEP_MAX;
+                        _unsafe.putInt(addr + _OFF_ERROR, err & _ERR_STEP_MAX);
+
+                        } else {
+                        f_curx = _unsafe.getFloat(addr);
+                        // random access so use unsafe:
                         _unsafe.putFloat(addr,
                                          f_curx + _unsafe.getFloat(addr +
                                                                    _OFF_SLOPE)); // ecur + _SLOPE

                         // update crossing ( x-coordinate + last bit = orientation (0 or 1)):
+                        if (USE_CORRECT_RND) {
+                            // ceil(curx - 0.5f) : TODO: push - 0.5 in edge
+                        cross = (FloatMath.ceil(f_curx) << 1)
+                                | _unsafe.getInt(addr + _OFF_YMAX_OR) & 0x1;
+                        } else {
                         cross = (((int) f_curx) << 1)
                                 | _unsafe.getInt(addr + _OFF_YMAX_OR) & 0x1;
+                        }
+                        }

                         if (doStats) {
                             RendererContext.stats.stat_rdr_crossings_updates
@@ -1250,21 +1399,34 @@
         /* TODO: improve accuracy using correct float rounding to int
            ie use ceil(x - 0.5f) */

+        final int _boundsMinY = boundsMinY;
+        final int _boundsMaxY = boundsMaxY;
+
         // bounds as inclusive intervals
-        final int spminX = Math.max(FloatMath.ceil(edgeMinX), boundsMinX);
-        final int spmaxX = Math.min(FloatMath.ceil(edgeMaxX), boundsMaxX - 1);
+        int spminX, spmaxX, spminY, spmaxY;
+        int maxY;

-        final int _boundsMinY = boundsMinY;
-        final int _boundsMaxYm1 = boundsMaxY - 1;
+        if (USE_CORRECT_RND) {
+        spminX = Math.max(FloatMath.ceil(edgeMinX - 0.5f), boundsMinX);
+        spmaxX = Math.min(FloatMath.ceil(edgeMaxX - 0.5f), boundsMaxX - 1);

-        final int spminY = Math.max(FloatMath.ceil(edgeMinY), _boundsMinY);
-        final int spmaxY;
-        int maxY = FloatMath.ceil(edgeMaxY);
-        if (maxY <= _boundsMaxYm1) {
+        spminY = Math.max(FloatMath.ceil(edgeMinY - 0.5f), _boundsMinY);
+
+        maxY = FloatMath.ceil(edgeMaxY - 0.5f);
+        } else {
+        spminX = Math.max(FloatMath.ceil(edgeMinX), boundsMinX);
+        spmaxX = Math.min(FloatMath.ceil(edgeMaxX), boundsMaxX - 1);
+
+        spminY = Math.max(FloatMath.ceil(edgeMinY), _boundsMinY);
+
+        maxY = FloatMath.ceil(edgeMaxY);
+        }
+
+        if (maxY <= _boundsMaxY - 1) {
             spmaxY = maxY;
         } else {
-            spmaxY = _boundsMaxYm1;
-            maxY   = _boundsMaxYm1 + 1;
+            spmaxY = _boundsMaxY - 1;
+            maxY   = _boundsMaxY;
         }
         buckets_minY = spminY - _boundsMinY;
         buckets_maxY = maxY   - _boundsMinY;