[OpenJDK Rasterizer] Fwd: Re: Fwd: RFR: Marlin renderer #3

Wed Jul 8 01:23:58 UTC 2015

Hi Laurent,

I feel as if this much effort put into creating fast alternatives for 
these operations is an interesting academic pursuit, but we might be 
better served by analyzing how we use floor/ceil and finding ways to 
reduce those uses, or finding more targeted algorithms on a case-by-case 
basis if they are in an inner loop.  The foo_int() methods are 
the ones that I'm mainly interested in, as they pertain to the inner loop of 
the rasterizer - on the other hand, we might be able to avoid them with 
fixed point arithmetic instead.
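
As a rough illustration of the fixed point idea (a sketch only: the 16.16 
format and every name below are assumptions for the example, not Marlin 
code), floor and ceil reduce to a shift once the coordinate has been 
converted outside the inner loop:

```java
final class FixedPointSketch {
    // Assumed 16.16 format: 16 integer bits, 16 fractional bits.
    static final int FRAC_BITS = 16;
    static final int ONE = 1 << FRAC_BITS;
    static final int FRAC_MASK = ONE - 1;

    // One float-to-fixed conversion, done before the inner loop.
    // Only valid while v * 65536 fits in an int (roughly |v| < 32768).
    static int toFixed(float v) {
        return (int) Math.floor(v * (double) ONE);
    }

    // Arithmetic right shift rounds toward negative infinity,
    // so this is floor() for negative values as well.
    static int floorInt(int fx) {
        return fx >> FRAC_BITS;
    }

    // Adding (ONE - 1) before the shift bumps any value with a
    // non-zero fraction up to the next integer.
    static int ceilInt(int fx) {
        return (fx + FRAC_MASK) >> FRAC_BITS;
    }
}
```

The limited range is the obvious cost of this particular format; a wider 
type or a different split of the bits would trade precision for range.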

With regard to using them in the normalizing iterator - are the target 
customers leaving normalization enabled for their shape rendering?  For 
cases like map rendering and other typical server rendering issues I 
would think that they would want it off for more accurate paths, and 
also to get rid of some unnecessary pre-processing that was only 
originally meant to be a band-aid for developers who were expecting 
drawRect(x,y,w-1,h-1) to touch the row of pixels around the inside of 
that rectangle.  If we get rid of normalization there are likely few 
other uses of floor/ceil in our rendering flow...

			...jim

On 7/3/15 1:51 PM, Laurent Bourgès wrote:
> Jim,
>
> Here is an updated webrev:
> http://cr.openjdk.java.net/~lbourges/marlin/marlin-s3.1/
>
> Changes:
> - enabled CHECK_NAN and CHECK_OVERFLOW to be correct for now
> - renamed faster alternatives as int ceil_int(float) and float
> floor_int(float) that are faster in the integer domain
> - restored ceil_f / floor_f (float) methods that are strictly correct as
> (float) StrictMath.ceil/floor(double)
> - made FloatMath class and its methods public to be available for tests
> and maybe more general use in graphics / java2d ...
>
> It is still faster than the previous FloatMath and Marlin is a bit faster too:
> see results at the end!
>
>
> Here are a few comments on Joe's proposal:
>
>     >> I could propose my implementations of float ceil/floor (float) that give
>     >> exactly the same results as (float)StrictMath.ceil/floor (double).
>     >> According to my benchmarks, it is 25% faster.
>
>     >
>     >
>     > I don't think we need to limit ourselves to either StrictMath or Math. We simply need something predictable that has properties which work for our needs.
>     >
>     I was just proposing the 2 methods float ceil/floor (float) (derived
>     from StrictMath) to be included in the core libs if it is useful for
>     general use (25% faster).
>
> Joe, are you interested in the ceil_f / floor_f variants (25% faster than
> StrictMath)?
>
>>> So, you can *almost* get away with
>>>
>>> int ceil_returning_int(float f) {
>>>       if (f > 0.0)
>>>           return - ((int)(-f));
>>>       else
>>>           return (int) f;
>>> }
>>>
>>> int floor_returning_int(float f) {
>>>       if (f < 0.0)
>>>           return - ((int)(-f));
>>>       else
>>>           return (int) f;
>>> }
>>>
>>> I tried Joe's proposal but it does not work:
>>> Round to zero is not equivalent to ceil or floor!
>>
>>
>> In what way do Joe's techniques fail?  Integer casts should be a truncate operation (is that what you refer to as "round to zero"?) and should be the same as floor() for non-negative numbers and -((int)(-v)) should be the same as floor for negative numbers...
>
>>>
>
> I tried and it does not work
>
> ceil (1.2)=2
> But (int)(-1.2)=-1 (round to zero).
> So the result is 1 and not 2 !
>
> That's why my variant adds/subtracts 1!
> But it makes infinity / NaN handling more painful and a bit costly.
>
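
To make the add/subtract-1 correction concrete, here is a minimal sketch. 
It is an assumption about the general technique, not necessarily the 
FloatMath code in the webrev, and the class and method names are made up 
for the example. It deliberately ignores NaN, infinities and values 
outside the int range, which is where the extra cost mentioned above 
comes from:

```java
final class CastRoundingSketch {
    // ceil for finite inputs within int range: truncate toward zero,
    // then step up by one if truncation landed below the input.
    static int ceilInt(float f) {
        int i = (int) f;
        return (f > i) ? i + 1 : i;
    }

    // floor for finite inputs within int range: truncate toward zero,
    // then step down by one if truncation landed above the input.
    static int floorInt(float f) {
        int i = (int) f;
        return (f < i) ? i - 1 : i;
    }
}
```

With this form ceilInt(1.2f) yields 2, whereas -((int)(-1.2f)) yields 1.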
>
> Jim, next I will run these tests:
>
> 1/ use proper and consistent ceil(coord - 0.5) as you did in openpisces (FX)
>
> 2/ use fixed point approach (longer work) to only use integer maths in
> Marlin rendering loop (crossings)
>
> => faster (no float to int conversions?) but also more scalable on
> hyper-threading CPUs?
>
> => edge array will then only contain int[] and Unsafe usage is no longer
> necessary
>
>
> Cheers,
>
> Laurent
>
>
> PS: Here are some benchmark results made on values only in the integer
> domain:
>
>>> JVM START: 1.8.0_60-ea [Java HotSpot(TM) 64-Bit Server VM 25.60-b18]
> floats = [-2.13422758E9, -1.37992608E8, -134758.4, -131.5, -17.2, -1.9,
> -0.9, -1.0E-4, -1.0E-8, -1.0E-23, -100.0, -3.0, -1.0, -0.0, 0.0, 0.0,
> 1.0, 3.0, 100.0, 131.5, 17.2, 1.9, 0.9, 1.0E-4, 1.0E-8, 1.0E-23,
> 2.13422758E9, 1.37992608E8, 134758.4]
>
> strictMathCeil_f = [-2.13422758E9, -1.37992608E8, -134758.0, -131.0,
> -17.0, -1.0, -0.0, -0.0, -0.0, -0.0, -100.0, -3.0, -1.0, -0.0, 0.0, 0.0,
> 1.0, 3.0, 100.0, 132.0, 18.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.13422758E9,
> 1.37992608E8, 134759.0]
> floatMathCeil    = [-2134227584, -137992608, -134758, -131, -17, -1, 0,
> 0, 0, 0, -100, -3, -1, 0, 0, 0, 1, 3, 100, 132, 18, 2, 1, 1, 1, 1,
> 2134227584, 137992608, 134759]
> FloatMathCeil_f  = [-2.13422758E9, -1.37992608E8, -134758.0, -131.0,
> -17.0, -1.0, -0.0, -0.0, -0.0, -0.0, -100.0, -3.0, -1.0, 0.0, 0.0, 0.0,
> 1.0, 3.0, 100.0, 132.0, 18.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.13422758E9,
> 1.37992608E8, 134759.0]
>
> strictMathFloor_f   = [-2.13422758E9, -1.37992608E8, -134759.0, -132.0,
> -18.0, -2.0, -1.0, -1.0, -1.0, -1.0, -100.0, -3.0, -1.0, -0.0, 0.0, 0.0,
> 1.0, 3.0, 100.0, 131.0, 17.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.13422758E9,
> 1.37992608E8, 134758.0]
> floatMathFloor    = [-2.13422758E9, -1.37992608E8, -134759.0, -132.0,
> -18.0, -2.0, -1.0, -1.0, -1.0, -1.0, -100.0, -3.0, -1.0, 0.0, 0.0, 0.0,
> 1.0, 3.0, 100.0, 131.0, 17.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.13422758E9,
> 1.37992608E8, 134758.0]
> floatMathFloor_f  = [-2.13422758E9, -1.37992608E8, -134759.0, -132.0,
> -18.0, -2.0, -1.0, -1.0, -1.0, -1.0, -100.0, -3.0, -1.0, 0.0, 0.0, 0.0,
> 1.0, 3.0, 100.0, 131.0, 17.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.13422758E9,
> 1.37992608E8, 134758.0]
>
> - Benchmarks ---
> # Calib: run duration:  5 000 ms
> 4 threads, Tavg = 3,03 ns/op (σ = 0,02 ns/op), Total ops = 6616663415 [3,07 (1633532581), 3,02 (1656839291), 3,01 (1662195896), 3,01 (1664095647)]
> #
>
> #-------------------------------------------------------------
> # StrictMathCeil_f: run duration: 5 000 ms
> float = (float) StrictMath.ceil(f)
>
> 1 threads, Tavg = 112,46 ns/op (σ = 0,00 ns/op), Total ops = 44462614 [112,46 (44462614)]
> 2 threads, Tavg = 112,53 ns/op (σ = 0,20 ns/op), Total ops = 88864503 [112,74 (44351706), 112,33 (44512797)]
> 3 threads, Tavg = 112,75 ns/op (σ = 0,31 ns/op), Total ops = 133042189 [112,67 (44379882), 112,42 (44478562), 113,17 (44183745)]
> 4 threads, Tavg = 113,61 ns/op (σ = 1,18 ns/op), Total ops = 176004512 [115,63 (43242922), 113,27 (44144214), 112,59 (44409190), 113,01 (44208186)]
> #
>
> #-------------------------------------------------------------
> # FloatMathCeil_f: run duration: 5 000 ms
> float = FloatMath.ceil_f(f)
>
> 1 threads, Tavg = 85,42 ns/op (σ = 0,00 ns/op), Total ops = 58534818 [85,42 (58534818)]
> 2 threads, Tavg = 85,56 ns/op (σ = 0,18 ns/op), Total ops = 116880361 [85,74 (58318655), 85,38 (58561706)]
> 3 threads, Tavg = 85,49 ns/op (σ = 0,11 ns/op), Total ops = 175469910 [85,64 (58386401), 85,42 (58535723), 85,40 (58547786)]
> 4 threads, Tavg = 86,10 ns/op (σ = 0,86 ns/op), Total ops = 232739544 [87,59 (57200792), 85,61 (58519538), 85,47 (58617544), 85,79 (58401670)]
> #
>
> #-------------------------------------------------------------
> # FloatMathCeil: run duration: 5 000 ms
> int = FloatMath.ceil(f)
>
> 1 threads, Tavg = 56,72 ns/op (σ = 0,00 ns/op), Total ops = 88153017 [56,72 (88153017)]
> 2 threads, Tavg = 56,90 ns/op (σ = 0,16 ns/op), Total ops = 175737994 [57,06 (87626873), 56,75 (88111121)]
> 3 threads, Tavg = 56,82 ns/op (σ = 0,15 ns/op), Total ops = 264003134 [57,02 (87684429), 56,76 (88087214), 56,67 (88231491)]
> 4 threads, Tavg = 57,16 ns/op (σ = 0,57 ns/op), Total ops = 350060098 [58,12 (86072473), 56,74 (88161260), 56,68 (88251450), 57,12 (87574915)]
> #
>
> #-------------------------------------------------------------
> # StrictMathFloor_f: run duration: 5 000 ms
> float = (float) StrictMath.floor(f)
>
> 1 threads, Tavg = 108,69 ns/op (σ = 0,00 ns/op), Total ops = 46005419 [108,69 (46005419)]
> 2 threads, Tavg = 108,87 ns/op (σ = 0,25 ns/op), Total ops = 91856264 [109,11 (45824174), 108,62 (46032090)]
> 3 threads, Tavg = 108,66 ns/op (σ = 0,01 ns/op), Total ops = 138046291 [108,65 (46019660), 108,68 (46008068), 108,65 (46018563)]
> 4 threads, Tavg = 109,99 ns/op (σ = 1,00 ns/op), Total ops = 182162538 [111,63 (44870853), 109,77 (45631259), 108,90 (45994047), 109,69 (45666379)]
> #
>
> #-------------------------------------------------------------
> # FloatMathFloor_f: run duration: 5 000 ms
> float = FloatMath.floor_f(f)
>
> 1 threads, Tavg = 79,60 ns/op (σ = 0,00 ns/op), Total ops = 62816917 [79,60 (62816917)]
> 2 threads, Tavg = 79,44 ns/op (σ = 0,15 ns/op), Total ops = 125890579 [79,58 (62827873), 79,29 (63062706)]
> 3 threads, Tavg = 79,38 ns/op (σ = 0,15 ns/op), Total ops = 188968096 [79,59 (62823628), 79,23 (63107367), 79,32 (63037101)]
> 4 threads, Tavg = 79,88 ns/op (σ = 0,83 ns/op), Total ops = 250828233 [81,31 (61604026), 79,60 (62930953), 79,32 (63149634), 79,33 (63143620)]
> #
>
> #-------------------------------------------------------------
> # FloatMathFloor: run duration: 5 000 ms
> float = FloatMath.floor(f)
>
> 1 threads, Tavg = 70,20 ns/op (σ = 0,00 ns/op), Total ops = 71226367 [70,20 (71226367)]
> 2 threads, Tavg = 70,35 ns/op (σ = 0,16 ns/op), Total ops = 142141053 [70,51 (70910131), 70,20 (71230922)]
> 3 threads, Tavg = 70,26 ns/op (σ = 0,08 ns/op), Total ops = 213504247 [70,20 (71225449), 70,38 (71046834), 70,19 (71231964)]
> 4 threads, Tavg = 70,67 ns/op (σ = 0,60 ns/op), Total ops = 283376128 [70,24 (71279973), 70,58 (70931050), 70,20 (71320272), 71,68 (69844833)]
> #