[OpenJDK Rasterizer] Fwd: Re: Fwd: RFR: Marlin renderer #3

Wed Jul 8 01:19:34 UTC 2015

Hi Laurent,

Interesting numbers.  It was hard to read the formatting on the diff 
below, but I got the gist of what was happening.

Were the ceil(coord) measurements taken with the new ceil_int() code? 
For this case it might make sense to call ceil_int() directly since we 
can be pretty sure that the fp coordinate values are all in the integer 
range (since these are drawable-relative numbers).
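
As a rough sketch of what a direct call could look like: the helper below is hypothetical and written only for illustration, it is not the actual ceil_int() from the Marlin sources, and the class and method names are assumptions. It relies on the value being finite and well inside the int range, so it can skip the generic Math.ceil() path:

```java
final class CeilIntSketch {
    // Hypothetical helper (not the Marlin FloatMath code): ceiling of a float
    // that is assumed to be finite and well inside the int range.
    static int ceilInt(final float a) {
        final int intpart = (int) a; // the cast truncates toward zero
        // Truncation already rounds negative values up,
        // so only a positive fractional part needs the +1.
        return (a > intpart) ? intpart + 1 : intpart;
    }

    // Pixel-center rounding as discussed in this thread: ceil(coord - 0.5).
    static int roundToPixel(final float coord) {
        return ceilInt(coord - 0.5f);
    }
}
```

No overflow or NaN check is made here, which is only acceptable under the integer-range assumption stated above.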

Another technique to try would be to use longs, which would involve a 
64-bit shift to get the integer part, but there is already a 32-bit 
shift to add the error overflow anyway.
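
To make the long-based idea concrete, here is a minimal sketch assuming a 32.32 fixed-point layout (integer part in the high 32 bits, fraction in the low 32 bits). The class and method names are made up for illustration and do not come from the patch:

```java
final class LongFixedPointSketch {
    static final int FRAC_BITS = 32;
    static final double ONE = 4294967296.0; // 2^32

    // Convert a double to 32.32 fixed point (assumes |v| < 2^31).
    static long toFixed(final double v) {
        return (long) Math.floor(v * ONE);
    }

    // Integer part (floor) via a single arithmetic 64-bit shift.
    static int intPart(final long fx) {
        return (int) (fx >> FRAC_BITS);
    }

    // Walk an edge over n scanlines and record floor(x) at each one.
    static int[] walk(final double x0, final double slope, final int n) {
        final long bump = toFixed(slope);
        long curx = toFixed(x0);
        final int[] xs = new int[n];
        for (int i = 0; i < n; i++) {
            xs[i] = intPart(curx);
            curx += bump; // one 64-bit add instead of a bumpx/bumperr pair
        }
        return xs;
    }
}
```

The carry from the fractional part into the integer part happens inside the 64-bit add, so no separate error-overflow step is needed. Whether that is actually faster than the two 32-bit adds would have to be measured.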

			...jim

On 7/6/15 3:28 AM, Laurent Bourgès wrote:
> Jim,
>
> I have run the tests we discussed, which meant modifying the addLine()
> and endRendering() methods:
>
> 1/ use a proper and consistent ceil(coord - 0.5), as you did in
> openpisces (FX):
>
> Renderer.USE_CORRECT_RND=true
>
> The output images differ from the Pisces ones but are now closer to the
> Ductus ones, i.e. more accurate.
>
> Of course, it is slower, by up to 15% on the very complex map:
>
> REF:
>
> dc_boulder_2013-13-30-06-13-17.ser        1   93  112.996  113.297  112.805  0.507  111.459  113.508   93
> dc_shp_alllayers_2013-00-30-07-00-43.ser  1  246   42.791   43.483   42.926  0.283   42.648   43.816  246
> dc_shp_alllayers_2013-00-30-07-00-47.ser  1   25  762.882  764.781  763.136  0.715  762.219  765.110   25
> test_z_625k.ser                           1   61  168.745  169.238  168.780  0.216  168.423  169.417   61
>
> PROPER_ROUND:
>
> dc_boulder_2013-13-30-06-13-17.ser        1   90  115.722  116.187  115.756  0.196  115.497  116.691   90
> dc_shp_alllayers_2013-00-30-07-00-43.ser  1  230   45.598   45.816   45.620  0.105   45.467   46.182  230
> dc_shp_alllayers_2013-00-30-07-00-47.ser  1   25  877.221  878.020  877.272  0.558  876.277  878.890   25
> test_z_625k.ser                           1   60  173.377  173.729  173.411  0.191  173.108  174.188   60
>
>
> 2/ use a fixed-point approach (more work) so that only integer math is
> used in the Marlin rendering loop (crossings):
>
> Renderer.USE_CORRECT_RND=true and Renderer.USE_FP=true
>
> I simply ported ShapeSpanIterator.c (bumpx, bumperr, error), as you can
> see in the patch below.
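
For readers without ShapeSpanIterator.c at hand, the stepping scheme can be sketched roughly as follows. This is a simplified standalone illustration and not the patch itself: it tracks floor(x) plus a fractional error, whereas the patch initialises the crossing with ceil() and an offset error. The ERR_STEP_MAX, bumpx, bumperr and err names follow the patch; the class and method names are hypothetical.

```java
final class ErrorDdaSketch {
    static final int ERR_STEP_MAX = 0x7fffffff;

    // Scale a fraction in [0, 1) to the 31-bit error range.
    static int fractToInt(final double f) {
        return (int) (f * (double) ERR_STEP_MAX);
    }

    // Walk n scanlines from x = curx + frac (frac in [0, 1)) and
    // record the integer part of x at each scanline.
    static int[] walk(int curx, final double frac, final double slope, final int n) {
        final double floorSlope = Math.floor(slope);
        final int bumpx = (int) floorSlope;                   // whole-pixel step
        final int bumperr = fractToInt(slope - floorSlope);   // fractional step
        int err = fractToInt(frac);
        final int[] xs = new int[n];
        for (int i = 0; i < n; i++) {
            xs[i] = curx;
            curx += bumpx;
            err += bumperr;       // may carry into the sign bit
            curx -= (err >> 31);  // err >> 31 is -1 after a carry, so this adds 1
            err &= ERR_STEP_MAX;  // drop the carry bit, keep the remainder
        }
        return xs;
    }
}
```

Because the fractions are truncated to 31 bits, a crossing that lands exactly on an integer boundary may resolve one scanline late, so the sketch is best read with inputs that stay away from exact boundaries.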
>
> It works well and the output images are close to the Ductus ones too (I
> hope they are identical to those of the previous test).
>
> => faster (no float to int conversions ?)
>
> It is faster than the previous test (float + proper rounding) but not yet
> faster than the current Marlin (float + cast): ~ 5% slower at most.
>
> USE_FP:
>
> dc_boulder_2013-13-30-06-13-17.ser        1   89  117.544  117.900  117.564  0.173  117.306  118.287   89
> dc_shp_alllayers_2013-00-30-07-00-43.ser  1  231   45.338   45.502   45.347  0.126   45.155   46.458  231
> dc_shp_alllayers_2013-00-30-07-00-47.ser  1   25  808.432  809.665  808.602  0.723  807.456  810.553   25
> test_z_625k.ser                           1   61  170.808  171.272  170.886  0.231  170.566  171.789   61
>
> However, the performance gap is very small and it can be further
> optimized by removing the Unsafe usage, which is no longer required:
>
> => the edge array will then only contain ints (a plain int[]), so Unsafe
> is no longer necessary
>
>
> To conclude, these tests improved the output quality (better rounding)
> and the fixed-point approach is promising: it is quite fast and makes it
> possible to get rid of the Unsafe usage => simpler / safer, and the edge
> array will use the array caches again (like the others).
>
> I will try going further during the week ...
>
> Laurent
>
>
> PS: Here is a (quick and dirty) patch on Renderer to *illustrate* my
> changes and let you see what I did:
>
> # This patch file was generated by NetBeans IDE
> # It uses platform neutral UTF-8 encoding and \n newlines.
> --- HEAD
> +++ Modified In Working Tree
> @@ -37,6 +37,23 @@
>   import sun.misc.Unsafe;
>
>   final class Renderer implements PathConsumer2D, MarlinConst {
> +
> +    final static boolean USE_CORRECT_RND = true;
> +
> +    final static boolean USE_FP = true && USE_CORRECT_RND;
> +
> +    /*
> +#define ERRSTEP_MAX     (0x7fffffff)
> +#define FRACTTOJINT(f)  ((jint) ((f) * (double) ERRSTEP_MAX))
> +    */
> +    final static int ERR_STEP_MAX = 0x7fffffff;
> +    final static double ERR_STEP_MAX_DBL = (double)ERR_STEP_MAX;
> +
> +    static int fractToInt(final float f) {
> +        return (int) (f * ERR_STEP_MAX_DBL);
> +    }
> +
> +
>       // unsafe reference
>       final static Unsafe unsafe;
>       // array offset
> @@ -102,9 +119,6 @@
>       static final int INITIAL_BUCKET_ARRAY
>           = INITIAL_PIXEL_DIM * SUBPIXEL_POSITIONS_Y;
>
> -    // initial edges (16 bytes) = 32K [ints/floats] = 128K
> -    static final int INITIAL_EDGES_CAPACITY = INITIAL_ARRAY_16K << 3;
> -
>       public static final int WIND_EVEN_ODD = 0;
>       public static final int WIND_NON_ZERO = 1;
>
> @@ -114,11 +128,17 @@
>       public static final int OFF_F_CURX  = 0;
>       public static final int OFF_SLOPE   = OFF_F_CURX + SIZE;
>       // integer values:
> +    public static final int OFF_CURX    = 0;
> +    public static final int OFF_ERROR   = OFF_CURX + SIZE;
> +
>       public static final int OFF_NEXT    = OFF_SLOPE + SIZE;
>       public static final int OFF_YMAX_OR = OFF_NEXT + SIZE;
>
> +    public static final int OFF_BUMP_X  = OFF_YMAX_OR + SIZE;
> +    public static final int OFF_BUMP_ERR= OFF_BUMP_X + SIZE;
> +
>       // size of one edge in bytes
> -    public static final int SIZEOF_EDGE_BYTES = OFF_YMAX_OR + SIZE;
> +    public static final int SIZEOF_EDGE_BYTES = ((USE_FP) ? OFF_BUMP_ERR : OFF_YMAX_OR) + SIZE;
>
>       // curve break into lines
>       // cubic bind length (dx or dy) = 20 to decrement step
> @@ -175,6 +195,7 @@
>       private final int[] edgePtrs_initial  = new int[INITIAL_SMALL_ARRAY + 1]; // 4K
>       // merge sort initial arrays (large enough to satisfy most usages) (1024)
>       private final int[] aux_crossings_initial = new int[INITIAL_SMALL_ARRAY]; // 4K
> +    // +1 to avoid recycling in Helpers.widenArray()
>       private final int[] aux_edgePtrs_initial  = new int[INITIAL_SMALL_ARRAY + 1]; // 4K
>
>   //////////////////////////////////////////////////////////////////////////////
> @@ -344,14 +365,26 @@
>
>           /* TODO: improve accuracy using correct float rounding to int
>           ie use ceil(x - 0.5f) */
> +        float y1_cor;
> +        int firstCrossing, lastCrossing;
> +        if (USE_CORRECT_RND) {
> +        // convert subpixel coordinates (float) into pixel positions (int)
> +        // upper integer (inclusive)
> +        y1_cor = y1 - 0.5f;
> +        firstCrossing = Math.max(FloatMath.ceil(y1 - 0.5f), _boundsMinY);
>
> +        // note: use boundsMaxY (last Y exclusive) to compute correct coverage
> +        // upper integer (exclusive ?)
> +        lastCrossing  = Math.min(FloatMath.ceil(y2 - 0.5f),  boundsMaxY);
> +        } else {
>           // convert subpixel coordinates (float) into pixel positions (int)
>           // upper integer (inclusive)
> -        final int firstCrossing = Math.max(FloatMath.ceil(y1), _boundsMinY);
> +        firstCrossing = Math.max(FloatMath.ceil(y1), _boundsMinY);
>
>           // note: use boundsMaxY (last Y exclusive) to compute correct coverage
>           // upper integer (exclusive ?)
> -        final int lastCrossing  = Math.min(FloatMath.ceil(y2), boundsMaxY);
> +        lastCrossing  = Math.min(FloatMath.ceil(y2),  boundsMaxY);
> +        }
>
>           /* skip horizontal lines in pixel space and clip edges
>              out of y range [boundsMinY; boundsMaxY] */
> @@ -399,6 +432,8 @@
>           final int edgePtr = _edges.used;
>
>           if (_edges.length < edgePtr + _SIZEOF_EDGE_BYTES) {
> +            // suppose _edges.length > _SIZEOF_EDGE_BYTES
> +            // so doubling size is enough to add needed bytes
>               // double size:
>               final int edgeNewSize = edgePtr << 1;
>               if (doStats) {
> @@ -412,8 +447,54 @@
>           final long    addr   = _edges.address + edgePtr;
>
>           // float values:
> +        if (USE_CORRECT_RND) {
> +            if (USE_FP) {
> +            // First, how far does y bump to get to next HPC?
> +            // final float ystartbump = firstCrossing - y1 + 0.5f;
> +            // Now, bump the float x coordinate to get X sample at that HPC.
> +//            x1 += (firstCrossing - y1 + 0.5f) * slope;
> +            final float x1_cor = x1 - 0.5f + (firstCrossing - y1_cor) * slope;
> +            // Now calculate the integer coordinate that such a span starts at.
> +            // NOTE: Span inclusion is based on vertical pixel centers (VPC).
> +            // istartx = (jint) ceil(x0 - 0.5f);
> +//            final int istartx = FloatMath.ceil(x1_cor - 0.5f);
> +            int istartx = FloatMath.ceil(x1_cor);
> +            _unsafe.putInt(addr,                istartx);
> +
> +            // Finally, find out how far the x coordinate can go before next VPC.
> +            // error = FRACTTOJINT(x0 - (istartx - 0.5f));
> +//            final int error = fractToInt(x1 - (istartx - 0.5f));
> +//            final int error = (int) ((x1 - (istartx - 0.5f)) * ERR_STEP_MAX_DBL);
> +            istartx -= 1;
> +            _unsafe.putInt(addr + OFF_ERROR,
> +//                           (int) ((x1 - istartx + 0.5f) * ERR_STEP_MAX_DBL));
> +                           (int) ((x1_cor - istartx) * ERR_STEP_MAX_DBL));
> +
> +            // What is the lower bound of the per-scanline change in the X coord?
> +            // bumpx = (jint) floor(slope);
> +            final float floor_slope = FloatMath.floor(slope);
> +//            final int bumpx = (int)floor_slope;
> +            _unsafe.putInt(addr + OFF_BUMP_X,
> +                           (int)floor_slope);
> +
> +            // What is the subpixel amount by which the bumpx is off?
> +            // bumperr = FRACTTOJINT(slope - floor(slope));
> +//            final int bumperr = fractToInt(slope - floor_slope);
> +//            final int bumperr = (int) ((slope - floor_slope) * ERR_STEP_MAX_DBL);
> +            _unsafe.putInt(addr + OFF_BUMP_ERR,
> +                           (int) ((slope - floor_slope) * ERR_STEP_MAX_DBL));
> +
> +            } else {
> +            // x1 + (firstCrossing + 0.5f - y1) * slope;
> +            _unsafe.putFloat(addr,             x1 - 0.5f + (firstCrossing - y1 + 0.5f) * slope);
> +            }
> +        } else {
>           _unsafe.putFloat(addr,             x1 + (firstCrossing - y1) * slope);
> +        }
> +
> +        if (!USE_FP) {
>           _unsafe.putFloat(addr + OFF_SLOPE, slope);
> +        }
>
>
>           // each bucket is a linked list. this method adds ptr to the
> @@ -687,7 +768,7 @@
>       // clean alpha array (zero filled)
>       private int[] alphaLine;
>       // 2048 (pixelsize) pixel large
> -    private final int[] alphaLine_initial = new int[INITIAL_AA_ARRAY]; // 16K
> +    private final int[] alphaLine_initial = new int[INITIAL_AA_ARRAY]; // 8K
>
>       private void _endRendering(final int ymin, final int ymax) {
>
> @@ -720,6 +801,12 @@
>           final int _OFF_NEXT    = OFF_NEXT;
>           final int _OFF_YMAX_OR = OFF_YMAX_OR;
>
> +        final int _OFF_ERROR   = OFF_ERROR;
> +        final int _OFF_BUMP_X  = OFF_BUMP_X;
> +        final int _OFF_BUMP_ERR= OFF_BUMP_ERR;
> +
> +        final int _ERR_STEP_MAX= ERR_STEP_MAX;
> +
>           // unsafe I/O:
>           final Unsafe _unsafe = unsafe;
>           final long    addr0  = _edges.address;
> @@ -754,7 +841,7 @@
>           int bucketcount, i, j, ecur, lowx, highx;
>           int cross, lastCross;
>           float f_curx;
> -        int x0, x1, tmp, sum, prev, curx, curxo, crorientation;
> +        int x0, x1, tmp, sum, prev, curx, curxo, crorientation, err;
>           int pix_x, pix_xmaxm1, pix_xmax;
>
>           int low, high, mid, prevNumCrossings;
> @@ -913,22 +1000,53 @@
>                           // get the pointer to the edge
>                           ecur = _edgePtrs[i];
>
> -                        // random access so use unsafe:
> -                        addr = addr0 + ecur; // ecur + OFF_F_CURX
> -                        f_curx = _unsafe.getFloat(addr);
> -
>                           /* convert subpixel coordinates (float) into pixel
>                               positions (int) for coming scanline */
>                           /* note: it is faster to always update edges even
>                              if it is removed from AEL for coming or
> last scanline */
> +
>                           // random access so use unsafe:
> +                        addr = addr0 + ecur; // ecur + OFF_F_CURX
> +
> +                        if (USE_FP) {
> +                        // get current crossing and error:
> +                        curx = _unsafe.getInt(addr);
> +                        err  = _unsafe.getInt(addr + _OFF_ERROR);
> +
> +                        // update crossing with orientation at last bit:
> +                        cross = (curx << 1)
> +                                | _unsafe.getInt(addr + _OFF_YMAX_OR) & 0x1;
> +
> +                        // Increment x using DDA (fixed point):
> +                        // x0 = seg->curx + seg->bumpx
> +                        curx += _unsafe.getInt(addr + _OFF_BUMP_X);
> +                        // err = seg->error + seg->bumperr
> +                        err  += _unsafe.getInt(addr + _OFF_BUMP_ERR);
> +                        // x0 -= (err >> 31);
> +//                        curx -= (err >> 31);
> +                        _unsafe.putInt(addr, curx - (err >> 31));
> +
> +                        // err &= ERRSTEP_MAX;
> +//                        err &= _ERR_STEP_MAX;
> +                        _unsafe.putInt(addr + _OFF_ERROR, err & _ERR_STEP_MAX);
> +
> +                        } else {
> +                        f_curx = _unsafe.getFloat(addr);
> +                        // random access so use unsafe:
>                           _unsafe.putFloat(addr,
>                                            f_curx + _unsafe.getFloat(addr +
>
> _OFF_SLOPE)); // ecur + _SLOPE
>
>                           // update crossing ( x-coordinate + last bit =
> orientation (0 or 1)):
> +                        if (USE_CORRECT_RND) {
> +                            // ceil(curx - 0.5f) : TODO: push - 0.5 in edge
> +                        cross = (FloatMath.ceil(f_curx) << 1)
> +                                | _unsafe.getInt(addr + _OFF_YMAX_OR) & 0x1;
> +                        } else {
>                           cross = (((int) f_curx) << 1)
>                                   | _unsafe.getInt(addr + _OFF_YMAX_OR) & 0x1;
> +                        }
> +                        }
>
>                           if (doStats) {
>
> RendererContext.stats.stat_rdr_crossings_updates
> @@ -1008,22 +1126,53 @@
>                           // get the pointer to the edge
>                           ecur = _edgePtrs[i];
>
> -                        // random access so use unsafe:
> -                        addr = addr0 + ecur; // ecur + OFF_F_CURX
> -                        f_curx = _unsafe.getFloat(addr);
> -
>                           /* convert subpixel coordinates (float) into pixel
>                              positions (int) for coming scanline */
>                           /* note: it is faster to always update edges even
>                              if it is removed from AEL for coming or
> last scanline */
> +
>                           // random access so use unsafe:
> +                        addr = addr0 + ecur; // ecur + OFF_F_CURX
> +
> +                        if (USE_FP) {
> +                        // get current crossing and error:
> +                        curx = _unsafe.getInt(addr);
> +                        err  = _unsafe.getInt(addr + _OFF_ERROR);
> +
> +                        // update crossing with orientation at last bit:
> +                        cross = (curx << 1)
> +                                | _unsafe.getInt(addr + _OFF_YMAX_OR) & 0x1;
> +
> +                        // Increment x using DDA (fixed point):
> +                        // x0 = seg->curx + seg->bumpx
> +                        curx += _unsafe.getInt(addr + _OFF_BUMP_X);
> +                        // err = seg->error + seg->bumperr
> +                        err  += _unsafe.getInt(addr + _OFF_BUMP_ERR);
> +                        // x0 -= (err >> 31);
> +//                        curx -= (err >> 31);
> +                        _unsafe.putInt(addr, curx - (err >> 31));
> +
> +                        // err &= ERRSTEP_MAX;
> +//                        err &= _ERR_STEP_MAX;
> +                        _unsafe.putInt(addr + _OFF_ERROR, err & _ERR_STEP_MAX);
> +
> +                        } else {
> +                        f_curx = _unsafe.getFloat(addr);
> +                        // random access so use unsafe:
>                           _unsafe.putFloat(addr,
>                                            f_curx + _unsafe.getFloat(addr +
>
> _OFF_SLOPE)); // ecur + _SLOPE
>
>                           // update crossing ( x-coordinate + last bit =
> orientation (0 or 1)):
> +                        if (USE_CORRECT_RND) {
> +                            // ceil(curx - 0.5f) : TODO: push - 0.5 in edge
> +                        cross = (FloatMath.ceil(f_curx) << 1)
> +                                | _unsafe.getInt(addr + _OFF_YMAX_OR) & 0x1;
> +                        } else {
>                           cross = (((int) f_curx) << 1)
>                                   | _unsafe.getInt(addr + _OFF_YMAX_OR) & 0x1;
> +                        }
> +                        }
>
>                           if (doStats) {
>
> RendererContext.stats.stat_rdr_crossings_updates
> @@ -1250,21 +1399,34 @@
>           /* TODO: improve accuracy using correct float rounding to int
>              ie use ceil(x - 0.5f) */
>
> +        final int _boundsMinY = boundsMinY;
> +        final int _boundsMaxY = boundsMaxY;
> +
>           // bounds as inclusive intervals
> -        final int spminX = Math.max(FloatMath.ceil(edgeMinX), boundsMinX);
> -        final int spmaxX = Math.min(FloatMath.ceil(edgeMaxX), boundsMaxX - 1);
> +        int spminX, spmaxX, spminY, spmaxY;
> +        int maxY;
>
> -        final int _boundsMinY = boundsMinY;
> -        final int _boundsMaxYm1 = boundsMaxY - 1;
> +        if (USE_CORRECT_RND) {
> +        spminX = Math.max(FloatMath.ceil(edgeMinX - 0.5f), boundsMinX);
> +        spmaxX = Math.min(FloatMath.ceil(edgeMaxX - 0.5f), boundsMaxX - 1);
>
> -        final int spminY = Math.max(FloatMath.ceil(edgeMinY), _boundsMinY);
> -        final int spmaxY;
> -        int maxY = FloatMath.ceil(edgeMaxY);
> -        if (maxY <= _boundsMaxYm1) {
> +        spminY = Math.max(FloatMath.ceil(edgeMinY - 0.5f), _boundsMinY);
> +
> +        maxY = FloatMath.ceil(edgeMaxY - 0.5f);
> +        } else {
> +        spminX = Math.max(FloatMath.ceil(edgeMinX), boundsMinX);
> +        spmaxX = Math.min(FloatMath.ceil(edgeMaxX), boundsMaxX - 1);
> +
> +        spminY = Math.max(FloatMath.ceil(edgeMinY), _boundsMinY);
> +
> +        maxY = FloatMath.ceil(edgeMaxY);
> +        }
> +
> +        if (maxY <= _boundsMaxY - 1) {
>               spmaxY = maxY;
>           } else {
> -            spmaxY = _boundsMaxYm1;
> -            maxY   = _boundsMaxYm1 + 1;
> +            spmaxY = _boundsMaxY - 1;
> +            maxY   = _boundsMaxY;
>           }
>           buckets_minY = spminY - _boundsMinY;
>           buckets_maxY = maxY   - _boundsMinY;
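
The "crossing with orientation at last bit" encoding used in the rendering loop above can be illustrated in isolation. This is only a sketch of the bit packing; the class and method names are hypothetical and are not part of the Renderer:

```java
final class CrossingPackSketch {
    // Pack an x coordinate and a 1-bit orientation into one int:
    // x goes into the upper 31 bits, the orientation into the lowest bit.
    static int pack(final int x, final int orientation) {
        return (x << 1) | (orientation & 0x1);
    }

    // Recover x; the arithmetic shift preserves the sign of negative x.
    static int x(final int cross) {
        return cross >> 1;
    }

    // Recover the orientation bit (0 or 1).
    static int orientation(final int cross) {
        return cross & 0x1;
    }
}
```

Since x occupies the high bits, sorting the packed ints orders crossings by x first, which is presumably why a single int per crossing is enough for the merge sort.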
>
>