[OpenJDK Rasterizer] Fwd: Re: Fwd: RFR: Marlin renderer #3
Jim Graham
james.graham at oracle.com
Wed Jul 8 01:19:34 UTC 2015
Hi Laurent,
Interesting numbers. It was hard to read the formatting on the diff
below, but I got the gist of what was happening.
Were the ceil(coord) measurements taken with the new ceil_int() code?
For this case it might make sense to call ceil_int() directly since we
can be pretty sure that the fp coordinate values are all in the integer
range (since these are drawable-relative numbers).
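For reference, here is a minimal sketch of what a cast-based ceiling could look like when the inputs are known to fit in the int range. The class and method names are illustrative assumptions, not the actual FloatMath code, and NaN and overflow handling are deliberately left out:

```java
// Hypothetical sketch, not the actual FloatMath.ceil_int() implementation.
final class CeilSketch {
    // Ceiling via int cast; valid only when a is finite and within the int range.
    static int ceilInt(final float a) {
        final int intPart = (int) a; // the cast truncates toward zero
        // Truncation already equals the ceiling for negatives and exact integers.
        return (a <= intPart) ? intPart : intPart + 1;
    }

    // Pixel-center rounding as discussed in this thread: ceil(coord - 0.5).
    static int roundToPixelCenter(final float coord) {
        return ceilInt(coord - 0.5f);
    }
}
```

Skipping the range checks is only safe under the stated assumption that coordinates are drawable-relative and therefore small.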
Another technique to try would be to use longs which would involve a
64-bit shift to get the integer part, but there is already a 32-bit
shift to add the error overflow anyway.
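To make the long-based idea concrete, here is a rough sketch assuming a 32.32 fixed-point layout, where the integer part comes from a single 64-bit shift. All names are hypothetical and nothing here is taken from Marlin or ShapeSpanIterator.c:

```java
// Hypothetical sketch of a 32.32 fixed-point edge stepper using longs.
final class LongFixedPointSketch {
    static final int FRAC_BITS = 32;
    static final double SCALE = 4294967296.0;           // 2^32
    static final long FRAC_MASK = (1L << FRAC_BITS) - 1; // all fractional bits set

    // Convert a coordinate to 32.32 fixed point (rounding toward -infinity).
    static long toFixed(final double v) {
        return (long) Math.floor(v * SCALE);
    }

    // ceil() of a fixed-point value: add (1 - epsilon), then arithmetic shift.
    static int ceilFixed(final long fx) {
        return (int) ((fx + FRAC_MASK) >> FRAC_BITS);
    }

    // Returns ceil(x - 0.5) for n successive scanlines, stepping x by slope.
    static int[] crossings(final double x0, final double slope, final int n) {
        long x = toFixed(x0 - 0.5);    // push the -0.5 into the stored value
        final long dx = toFixed(slope);
        final int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[i] = ceilFixed(x);
            x += dx;                   // a single 64-bit add per scanline
        }
        return out;
    }
}
```

With this layout there is no separate error word to carry, since the fraction lives in the low 32 bits of the same long.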
...jim
On 7/6/15 3:28 AM, Laurent Bourgès wrote:
> Jim,
>
> I ran the tests we discussed, which meant modifying the addLine() and
> endRendering() methods:
>
> 1/ use a proper and consistent ceil(coord - 0.5), as you did in
> OpenPisces (FX):
>
> Renderer.USE_CORRECT_RND=true
>
> The output images differ from the Pisces ones but are now closer to
> the Ductus ones, i.e. more accurate.
>
> Of course, it is slower, by up to 15% on the most complex map:
>
> REF:
>
> dc_boulder_2013-13-30-06-13-17.ser        1   93  112.996  113.297  112.805  0.507  111.459  113.508   93
> dc_shp_alllayers_2013-00-30-07-00-43.ser  1  246   42.791   43.483   42.926  0.283   42.648   43.816  246
> dc_shp_alllayers_2013-00-30-07-00-47.ser  1   25  762.882  764.781  763.136  0.715  762.219  765.110   25
> test_z_625k.ser                           1   61  168.745  169.238  168.780  0.216  168.423  169.417   61
>
> PROPER_ROUND:
>
> dc_boulder_2013-13-30-06-13-17.ser        1   90  115.722  116.187  115.756  0.196  115.497  116.691   90
> dc_shp_alllayers_2013-00-30-07-00-43.ser  1  230   45.598   45.816   45.620  0.105   45.467   46.182  230
> dc_shp_alllayers_2013-00-30-07-00-47.ser  1   25  877.221  878.020  877.272  0.558  876.277  878.890   25
> test_z_625k.ser                           1   60  173.377  173.729  173.411  0.191  173.108  174.188   60
>
>
> 2/ use fixed point approach (longer work) to only use integer maths in
> Marlin rendering loop (crossings):
>
> Renderer.USE_CORRECT_RND=true and Renderer.USE_FP=true
>
> I simply made a port of ShapeSpanIterator.c (bumpx, bumperr, error) as
> you can see below in the given patch.
>
> It works well and the output images are close to Ductus too (hopefully
> identical to those of the previous test).
>
> => faster (no float to int conversions ?)
>
> It is faster than the previous test (float + proper rounding) but not
> yet faster than current Marlin (float + cast): ~5% slower at most.
>
> USE_FP:
>
> dc_boulder_2013-13-30-06-13-17.ser        1   89  117.544  117.900  117.564  0.173  117.306  118.287   89
> dc_shp_alllayers_2013-00-30-07-00-43.ser  1  231   45.338   45.502   45.347  0.126   45.155   46.458  231
> dc_shp_alllayers_2013-00-30-07-00-47.ser  1   25  808.432  809.665  808.602  0.723  807.456  810.553   25
> test_z_625k.ser                           1   61  170.808  171.272  170.886  0.231  170.566  171.789   61
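The bumpx / bumperr / error stepping described for this fixed-point variant can be sketched in isolation as follows. This is a simplified illustration assuming the same ERR_STEP_MAX constant and carry trick as the quoted patch; the class and method names are hypothetical, and doubles are used instead of floats for brevity:

```java
// Hypothetical standalone sketch of the integer DDA (curx, error, bumpx, bumperr).
final class IntDdaSketch {
    static final int ERR_STEP_MAX = 0x7fffffff;

    // Scale a fraction in [0, 1] to the 31-bit error range.
    static int fractToInt(final double f) {
        return (int) (f * (double) ERR_STEP_MAX);
    }

    // Returns ceil(x - 0.5) for n successive scanlines, stepping x by slope.
    static int[] crossings(final double x0, final double slope, final int n) {
        final double xCor = x0 - 0.5;
        int curx = (int) Math.ceil(xCor);
        // Distance from the previous integer, in (0, 1], scaled to 31 bits.
        int err = fractToInt(xCor - (curx - 1));

        final double floorSlope = Math.floor(slope);
        final int bumpx = (int) floorSlope;                  // integer part of the step
        final int bumperr = fractToInt(slope - floorSlope);  // fractional part of the step

        final int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[i] = curx;
            curx += bumpx;
            err += bumperr;       // may overflow into the sign bit
            curx -= (err >> 31);  // (err >> 31) is -1 on overflow, so this adds 1
            err &= ERR_STEP_MAX;  // clear the sign bit, keep the remainder
        }
        return out;
    }
}
```

Because the fraction is truncated to 31 bits, results can differ from an exact ceil() by one pixel when a crossing falls extremely close to an integer boundary.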
>
> However, the performance gap is very small and it can be optimized
> further by removing the Unsafe usage, which is no longer required:
>
> => the edge array will then only contain int values (int[]), so Unsafe
> is no longer necessary
>
>
> To conclude, these tests improved the output quality (better rounding)
> and the fixed-point approach is promising: it is quite fast and makes it
> possible to get rid of Unsafe usage => simpler and safer code, and the
> edge array can use the array caches again (like the other arrays).
>
> I will try going further during the week ...
>
> Laurent
>
>
> PS: Here is a (quick and dirty) patch on Renderer to *illustrate* my
> changes and let you see what I did:
>
> # This patch file was generated by NetBeans IDE
> # It uses platform neutral UTF-8 encoding and \n newlines.
> --- HEAD
> +++ Modified In Working Tree
> @@ -37,6 +37,23 @@
> import sun.misc.Unsafe;
>
> final class Renderer implements PathConsumer2D, MarlinConst {
> +
> + final static boolean USE_CORRECT_RND = true;
> +
> + final static boolean USE_FP = true && USE_CORRECT_RND;
> +
> + /*
> +#define ERRSTEP_MAX (0x7fffffff)
> +#define FRACTTOJINT(f) ((jint) ((f) * (double) ERRSTEP_MAX))
> + */
> + final static int ERR_STEP_MAX = 0x7fffffff;
> + final static double ERR_STEP_MAX_DBL = (double)ERR_STEP_MAX;
> +
> + static int fractToInt(final float f) {
> + return (int) (f * ERR_STEP_MAX_DBL);
> + }
> +
> +
> // unsafe reference
> final static Unsafe unsafe;
> // array offset
> @@ -102,9 +119,6 @@
> static final int INITIAL_BUCKET_ARRAY
> = INITIAL_PIXEL_DIM * SUBPIXEL_POSITIONS_Y;
>
> - // initial edges (16 bytes) = 32K [ints/floats] = 128K
> - static final int INITIAL_EDGES_CAPACITY = INITIAL_ARRAY_16K << 3;
> -
> public static final int WIND_EVEN_ODD = 0;
> public static final int WIND_NON_ZERO = 1;
>
> @@ -114,11 +128,17 @@
> public static final int OFF_F_CURX = 0;
> public static final int OFF_SLOPE = OFF_F_CURX + SIZE;
> // integer values:
> + public static final int OFF_CURX = 0;
> + public static final int OFF_ERROR = OFF_CURX + SIZE;
> +
> public static final int OFF_NEXT = OFF_SLOPE + SIZE;
> public static final int OFF_YMAX_OR = OFF_NEXT + SIZE;
>
> + public static final int OFF_BUMP_X = OFF_YMAX_OR + SIZE;
> + public static final int OFF_BUMP_ERR= OFF_BUMP_X + SIZE;
> +
> // size of one edge in bytes
> - public static final int SIZEOF_EDGE_BYTES = OFF_YMAX_OR + SIZE;
> + public static final int SIZEOF_EDGE_BYTES = ((USE_FP) ?
> OFF_BUMP_ERR : OFF_YMAX_OR) + SIZE;
>
> // curve break into lines
> // cubic bind length (dx or dy) = 20 to decrement step
> @@ -175,6 +195,7 @@
> private final int[] edgePtrs_initial = new
> int[INITIAL_SMALL_ARRAY + 1]; // 4K
> // merge sort initial arrays (large enough to satisfy most usages)
> (1024)
> private final int[] aux_crossings_initial = new
> int[INITIAL_SMALL_ARRAY]; // 4K
> + // +1 to avoid recycling in Helpers.widenArray()
> private final int[] aux_edgePtrs_initial = new
> int[INITIAL_SMALL_ARRAY + 1]; // 4K
>
> //////////////////////////////////////////////////////////////////////////////
> @@ -344,14 +365,26 @@
>
> /* TODO: improve accuracy using correct float rounding to int
> ie use ceil(x - 0.5f) */
> + float y1_cor;
> + int firstCrossing, lastCrossing;
> + if (USE_CORRECT_RND) {
> + // convert subpixel coordinates (float) into pixel positions (int)
> + // upper integer (inclusive)
> + y1_cor = y1 - 0.5f;
> + firstCrossing = Math.max(FloatMath.ceil(y1 - 0.5f), _boundsMinY);
>
> + // note: use boundsMaxY (last Y exclusive) to compute correct
> coverage
> + // upper integer (exclusive ?)
> + lastCrossing = Math.min(FloatMath.ceil(y2 - 0.5f), boundsMaxY);
> + } else {
> // convert subpixel coordinates (float) into pixel positions (int)
> // upper integer (inclusive)
> - final int firstCrossing = Math.max(FloatMath.ceil(y1),
> _boundsMinY);
> + firstCrossing = Math.max(FloatMath.ceil(y1), _boundsMinY);
>
> // note: use boundsMaxY (last Y exclusive) to compute correct
> coverage
> // upper integer (exclusive ?)
> - final int lastCrossing = Math.min(FloatMath.ceil(y2),
> boundsMaxY);
> + lastCrossing = Math.min(FloatMath.ceil(y2), boundsMaxY);
> + }
>
> /* skip horizontal lines in pixel space and clip edges
> out of y range [boundsMinY; boundsMaxY] */
> @@ -399,6 +432,8 @@
> final int edgePtr = _edges.used;
>
> if (_edges.length < edgePtr + _SIZEOF_EDGE_BYTES) {
> + // suppose _edges.length > _SIZEOF_EDGE_BYTES
> + // so doubling size is enough to add needed bytes
> // double size:
> final int edgeNewSize = edgePtr << 1;
> if (doStats) {
> @@ -412,8 +447,54 @@
> final long addr = _edges.address + edgePtr;
>
> // float values:
> + if (USE_CORRECT_RND) {
> + if (USE_FP) {
> + // First, how far does y bump to get to next HPC?
> + // final float ystartbump = firstCrossing - y1 + 0.5f;
> + // Now, bump the float x coordinate to get X sample at that
> HPC.
> +// x1 += (firstCrossing - y1 + 0.5f) * slope;
> + final float x1_cor = x1 - 0.5f + (firstCrossing - y1_cor) *
> slope;
> + // Now calculate the integer coordinate that such a span
> starts at.
> + // NOTE: Span inclusion is based on vertical pixel centers
> (VPC).
> + // istartx = (jint) ceil(x0 - 0.5f);
> +// final int istartx = FloatMath.ceil(x1_cor - 0.5f);
> + int istartx = FloatMath.ceil(x1_cor);
> + _unsafe.putInt(addr, istartx);
> +
> + // Finally, find out how far the x coordinate can go before
> next VPC.
> + // error = FRACTTOJINT(x0 - (istartx - 0.5f));
> +// final int error = fractToInt(x1 - (istartx - 0.5f));
> +// final int error = (int) ((x1 - (istartx - 0.5f)) *
> ERR_STEP_MAX_DBL);
> + istartx -= 1;
> + _unsafe.putInt(addr + OFF_ERROR,
> +// (int) ((x1 - istartx + 0.5f) *
> ERR_STEP_MAX_DBL));
> + (int) ((x1_cor - istartx) * ERR_STEP_MAX_DBL));
> +
> + // What is the lower bound of the per-scanline change in
> the X coord?
> + // bumpx = (jint) floor(slope);
> + final float floor_slope = FloatMath.floor(slope);
> +// final int bumpx = (int)floor_slope;
> + _unsafe.putInt(addr + OFF_BUMP_X,
> + (int)floor_slope);
> +
> + // What is the subpixel amount by which the bumpx is off?
> + // bumperr = FRACTTOJINT(slope - floor(slope));
> +// final int bumperr = fractToInt(slope - floor_slope);
> +// final int bumperr = (int) ((slope - floor_slope) *
> ERR_STEP_MAX_DBL);
> + _unsafe.putInt(addr + OFF_BUMP_ERR,
> + (int) ((slope - floor_slope) *
> ERR_STEP_MAX_DBL));
> +
> + } else {
> + // x1 + (firstCrossing + 0.5f - y1) * slope;
> + _unsafe.putFloat(addr, x1 - 0.5f +
> (firstCrossing - y1 + 0.5f) * slope);
> + }
> + } else {
> _unsafe.putFloat(addr, x1 + (firstCrossing - y1) *
> slope);
> + }
> +
> + if (!USE_FP) {
> _unsafe.putFloat(addr + OFF_SLOPE, slope);
> + }
>
>
> // each bucket is a linked list. this method adds ptr to the
> @@ -687,7 +768,7 @@
> // clean alpha array (zero filled)
> private int[] alphaLine;
> // 2048 (pixelsize) pixel large
> - private final int[] alphaLine_initial = new int[INITIAL_AA_ARRAY];
> // 16K
> + private final int[] alphaLine_initial = new int[INITIAL_AA_ARRAY];
> // 8K
>
> private void _endRendering(final int ymin, final int ymax) {
>
> @@ -720,6 +801,12 @@
> final int _OFF_NEXT = OFF_NEXT;
> final int _OFF_YMAX_OR = OFF_YMAX_OR;
>
> + final int _OFF_ERROR = OFF_ERROR;
> + final int _OFF_BUMP_X = OFF_BUMP_X;
> + final int _OFF_BUMP_ERR= OFF_BUMP_ERR;
> +
> + final int _ERR_STEP_MAX= ERR_STEP_MAX;
> +
> // unsafe I/O:
> final Unsafe _unsafe = unsafe;
> final long addr0 = _edges.address;
> @@ -754,7 +841,7 @@
> int bucketcount, i, j, ecur, lowx, highx;
> int cross, lastCross;
> float f_curx;
> - int x0, x1, tmp, sum, prev, curx, curxo, crorientation;
> + int x0, x1, tmp, sum, prev, curx, curxo, crorientation, err;
> int pix_x, pix_xmaxm1, pix_xmax;
>
> int low, high, mid, prevNumCrossings;
> @@ -913,22 +1000,53 @@
> // get the pointer to the edge
> ecur = _edgePtrs[i];
>
> - // random access so use unsafe:
> - addr = addr0 + ecur; // ecur + OFF_F_CURX
> - f_curx = _unsafe.getFloat(addr);
> -
> /* convert subpixel coordinates (float) into pixel
> positions (int) for coming scanline */
> /* note: it is faster to always update edges even
> if it is removed from AEL for coming or
> last scanline */
> +
> // random access so use unsafe:
> + addr = addr0 + ecur; // ecur + OFF_F_CURX
> +
> + if (USE_FP) {
> + // get current crossing and error:
> + curx = _unsafe.getInt(addr);
> + err = _unsafe.getInt(addr + _OFF_ERROR);
> +
> + // update crossing with orientation at last bit:
> + cross = (curx << 1)
> + | _unsafe.getInt(addr + _OFF_YMAX_OR) &
> 0x1;
> +
> + // Increment x using DDA (fixed point):
> + // x0 = seg->curx + seg->bumpx
> + curx += _unsafe.getInt(addr + _OFF_BUMP_X);
> + // err = seg->error + seg->bumperr
> + err += _unsafe.getInt(addr + _OFF_BUMP_ERR);
> + // x0 -= (err >> 31);
> +// curx -= (err >> 31);
> + _unsafe.putInt(addr, curx - (err >> 31));
> +
> + // err &= ERRSTEP_MAX;
> +// err &= _ERR_STEP_MAX;
> + _unsafe.putInt(addr + _OFF_ERROR, err &
> _ERR_STEP_MAX);
> +
> + } else {
> + f_curx = _unsafe.getFloat(addr);
> + // random access so use unsafe:
> _unsafe.putFloat(addr,
> f_curx + _unsafe.getFloat(addr +
>
> _OFF_SLOPE)); // ecur + _SLOPE
>
> // update crossing ( x-coordinate + last bit =
> orientation (0 or 1)):
> + if (USE_CORRECT_RND) {
> + // ceil(curx - 0.5f) : TODO: push - 0.5 in edge
> + cross = (FloatMath.ceil(f_curx) << 1)
> + | _unsafe.getInt(addr + _OFF_YMAX_OR) &
> 0x1;
> + } else {
> cross = (((int) f_curx) << 1)
> | _unsafe.getInt(addr + _OFF_YMAX_OR)
> & 0x1;
> + }
> + }
>
> if (doStats) {
>
> RendererContext.stats.stat_rdr_crossings_updates
> @@ -1008,22 +1126,53 @@
> // get the pointer to the edge
> ecur = _edgePtrs[i];
>
> - // random access so use unsafe:
> - addr = addr0 + ecur; // ecur + OFF_F_CURX
> - f_curx = _unsafe.getFloat(addr);
> -
> /* convert subpixel coordinates (float) into pixel
> positions (int) for coming scanline */
> /* note: it is faster to always update edges even
> if it is removed from AEL for coming or
> last scanline */
> +
> // random access so use unsafe:
> + addr = addr0 + ecur; // ecur + OFF_F_CURX
> +
> + if (USE_FP) {
> + // get current crossing and error:
> + curx = _unsafe.getInt(addr);
> + err = _unsafe.getInt(addr + _OFF_ERROR);
> +
> + // update crossing with orientation at last bit:
> + cross = (curx << 1)
> + | _unsafe.getInt(addr + _OFF_YMAX_OR) &
> 0x1;
> +
> + // Increment x using DDA (fixed point):
> + // x0 = seg->curx + seg->bumpx
> + curx += _unsafe.getInt(addr + _OFF_BUMP_X);
> + // err = seg->error + seg->bumperr
> + err += _unsafe.getInt(addr + _OFF_BUMP_ERR);
> + // x0 -= (err >> 31);
> +// curx -= (err >> 31);
> + _unsafe.putInt(addr, curx - (err >> 31));
> +
> + // err &= ERRSTEP_MAX;
> +// err &= _ERR_STEP_MAX;
> + _unsafe.putInt(addr + _OFF_ERROR, err &
> _ERR_STEP_MAX);
> +
> + } else {
> + f_curx = _unsafe.getFloat(addr);
> + // random access so use unsafe:
> _unsafe.putFloat(addr,
> f_curx + _unsafe.getFloat(addr +
>
> _OFF_SLOPE)); // ecur + _SLOPE
>
> // update crossing ( x-coordinate + last bit =
> orientation (0 or 1)):
> + if (USE_CORRECT_RND) {
> + // ceil(curx - 0.5f) : TODO: push - 0.5 in edge
> + cross = (FloatMath.ceil(f_curx) << 1)
> + | _unsafe.getInt(addr + _OFF_YMAX_OR) &
> 0x1;
> + } else {
> cross = (((int) f_curx) << 1)
> | _unsafe.getInt(addr + _OFF_YMAX_OR)
> & 0x1;
> + }
> + }
>
> if (doStats) {
>
> RendererContext.stats.stat_rdr_crossings_updates
> @@ -1250,21 +1399,34 @@
> /* TODO: improve accuracy using correct float rounding to int
> ie use ceil(x - 0.5f) */
>
> + final int _boundsMinY = boundsMinY;
> + final int _boundsMaxY = boundsMaxY;
> +
> // bounds as inclusive intervals
> - final int spminX = Math.max(FloatMath.ceil(edgeMinX), boundsMinX);
> - final int spmaxX = Math.min(FloatMath.ceil(edgeMaxX),
> boundsMaxX - 1);
> + int spminX, spmaxX, spminY, spmaxY;
> + int maxY;
>
> - final int _boundsMinY = boundsMinY;
> - final int _boundsMaxYm1 = boundsMaxY - 1;
> + if (USE_CORRECT_RND) {
> + spminX = Math.max(FloatMath.ceil(edgeMinX - 0.5f), boundsMinX);
> + spmaxX = Math.min(FloatMath.ceil(edgeMaxX - 0.5f), boundsMaxX - 1);
>
> - final int spminY = Math.max(FloatMath.ceil(edgeMinY), _boundsMinY);
> - final int spmaxY;
> - int maxY = FloatMath.ceil(edgeMaxY);
> - if (maxY <= _boundsMaxYm1) {
> + spminY = Math.max(FloatMath.ceil(edgeMinY - 0.5f), _boundsMinY);
> +
> + maxY = FloatMath.ceil(edgeMaxY - 0.5f);
> + } else {
> + spminX = Math.max(FloatMath.ceil(edgeMinX), boundsMinX);
> + spmaxX = Math.min(FloatMath.ceil(edgeMaxX), boundsMaxX - 1);
> +
> + spminY = Math.max(FloatMath.ceil(edgeMinY), _boundsMinY);
> +
> + maxY = FloatMath.ceil(edgeMaxY);
> + }
> +
> + if (maxY <= _boundsMaxY - 1) {
> spmaxY = maxY;
> } else {
> - spmaxY = _boundsMaxYm1;
> - maxY = _boundsMaxYm1 + 1;
> + spmaxY = _boundsMaxY - 1;
> + maxY = _boundsMaxY;
> }
> buckets_minY = spminY - _boundsMinY;
> buckets_maxY = maxY - _boundsMinY;
>
>