[OpenJDK Rasterizer] Fwd: Re: Fwd: RFR: Marlin renderer #3

Fri Jul 3 20:51:47 UTC 2015

Jim,

Here is an updated webrev:
http://cr.openjdk.java.net/~lbourges/marlin/marlin-s3.1/

Changes:
- enabled CHECK_NAN and CHECK_OVERFLOW to be correct for now
- renamed the fast alternatives to int ceil_int(float) and float
floor_int(float), which are faster within the integer domain
- restored the ceil_f / floor_f (float) methods, which are strictly correct,
matching (float) StrictMath.ceil/floor(double)
- made the FloatMath class and its methods public so they are available for
tests and possibly for more general use in graphics / java2d ...

It is still faster than the previous FloatMath, and Marlin is a bit faster too:
see the results at the end!

Here are a few comments on Joe's proposal:

> >> I could propose my implementations of float ceil/floor (float) that are
> >> exactly giving the same results than (float)StrictMath.ceil/floor
> (double).
> >> According to my benchmarks, it is 25% faster.
>
>
> >
> > I don't think we need to limit ourselves to either StrictMath or Math.
> We simply need something predictable that has properties which work for our
> needs.
> >
> I was just proposing the 2 methods float ceil/floor (float) (derived from
> StrictMath) to be included the core libs if it is useful for general use
> (25% faster).
>
Joe, are you interested in the ceil_f / floor_f variants (25% faster than
StrictMath)?

>> So, you can *almost* get away with
>>
>> int ceil_returning_int(float f) {
>>       if (f > 0.0)
>>           return - ((int)(-f));
>>       else
>>           return (int) f;
>> }
>>
>> int floor_returning_int(float f) {
>>       if (f < 0.0)
>>           return - ((int)(-f));
>>       else
>>           return (int) f;
>> }
>>
>> I tried joe's proposal but it does not work:
>> Round to zero is not equivalent to ceil or floor !
>
>
> In what way do Joe's techniques fail?  Integer casts should be a truncate
operation (is that what you refer to as "round to zero"?) and should be the
same as floor() for non-negative numbers and -((int)(-v)) should be the
same as floor for negative numbers...
>>

I tried it and it does not work:

ceil(1.2) should be 2,
but (int)(-1.2) = -1 (round toward zero), so -((int)(-1.2)) = 1.
The result is 1 and not 2!

That is why my variant adds / subtracts 1.
But that makes infinity / NaN handling more painful and a bit costly.
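
To make the failure concrete, here is a minimal sketch (class and method names
are hypothetical, this is not the FloatMath code from the webrev). It compares
the truncation-only ceil with a variant that adjusts by 1 after the cast. It
assumes finite inputs within the int range, so the NaN / overflow checks
mentioned above are deliberately left out.

```java
// Illustrative sketch only: names are hypothetical, not the webrev's FloatMath.
final class CeilFloorSketch {

    // Truncation-only attempt: negating twice still rounds toward zero,
    // so positive non-integer inputs come out one too low.
    static int ceilByTruncation(float f) {
        return (f > 0.0f) ? -((int) (-f)) : (int) f;
    }

    // Truncate, then add 1 when truncation landed below the input.
    // Assumes f is finite and fits in an int (no NaN / overflow handling).
    static int ceilInt(float f) {
        int i = (int) f;
        return (f > i) ? i + 1 : i;
    }

    // Truncate, then subtract 1 when truncation landed above the input.
    // Same assumptions as ceilInt.
    static int floorInt(float f) {
        int i = (int) f;
        return (f < i) ? i - 1 : i;
    }

    public static void main(String[] args) {
        float[] samples = {1.2f, -1.2f, 3.0f, -3.0f, 0.9f, -0.9f};
        for (float f : samples) {
            System.out.println(f
                + ": truncation=" + ceilByTruncation(f)
                + " ceilInt=" + ceilInt(f)
                + " floorInt=" + floorInt(f)
                + " StrictMath.ceil=" + StrictMath.ceil(f)
                + " StrictMath.floor=" + StrictMath.floor(f));
        }
    }
}
```

The extra comparison after the cast is the "add / subtract 1" step; a complete
version would still need the infinity / NaN / overflow branches.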

Jim, next I will test:

1/ using a proper and consistent ceil(coord - 0.5) as you did in openpisces (FX)

2/ using a fixed-point approach (longer work) so that the Marlin rendering
loop (crossings) only uses integer math

=> faster (no float-to-int conversions?) but also more scalable on
Hyper-Threading CPUs?

=> the edge array will then only contain int[] and Unsafe usage is no longer
necessary
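
As a rough illustration of the fixed-point idea, here is a sketch under my own
assumptions: a 16.16 format, hypothetical names, and coordinates small enough
not to overflow an int. Nothing here is taken from Marlin. Once a coordinate is
converted to fixed point, floor, ceil and ceil(coord - 0.5) reduce to integer
adds and arithmetic shifts:

```java
// Illustrative sketch only: the 16.16 format and all names are assumptions.
final class FixedPointSketch {
    static final int SHIFT = 16;            // number of fractional bits
    static final int ONE = 1 << SHIFT;      // 1.0 in fixed point
    static final int HALF = ONE >> 1;       // 0.5 in fixed point
    static final int FRAC_MASK = ONE - 1;   // all fractional bits set

    // One float-to-int conversion per coordinate, done outside the hot loop.
    // Assumes |v| < 32768 so the scaled value fits in an int.
    static int toFixed(float v) {
        return Math.round(v * ONE);
    }

    // The arithmetic shift rounds toward negative infinity, i.e. floor.
    static int floorFixed(int fx) {
        return fx >> SHIFT;
    }

    // Adding the fractional mask before the shift rounds up, i.e. ceil.
    static int ceilFixed(int fx) {
        return (fx + FRAC_MASK) >> SHIFT;
    }

    // ceil(coord - 0.5): index of the first pixel center at or after coord.
    static int firstPixelCenter(int fx) {
        return ceilFixed(fx - HALF);
    }

    public static void main(String[] args) {
        float[] samples = {1.2f, -1.2f, 1.5f, 3.0f};
        for (float f : samples) {
            int fx = toFixed(f);
            System.out.println(f + ": floor=" + floorFixed(fx)
                + " ceil=" + ceilFixed(fx)
                + " ceil(coord - 0.5)=" + firstPixelCenter(fx));
        }
    }
}
```

The trade-off in this sketch is precision: coordinates are quantized to 1/65536
of a pixel, and the usable range shrinks to what fits in the integer part.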

Cheers,
Laurent

PS: Here are some benchmark results made on values only in the integer
domain:

>> JVM START: 1.8.0_60-ea [Java HotSpot(TM) 64-Bit Server VM 25.60-b18]
floats       = [-2.13422758E9, -1.37992608E8, -134758.4, -131.5, -17.2,
-1.9, -0.9, -1.0E-4, -1.0E-8, -1.0E-23, -100.0, -3.0, -1.0, -0.0, 0.0, 0.0,
1.0, 3.0, 100.0, 131.5, 17.2, 1.9, 0.9, 1.0E-4, 1.0E-8, 1.0E-23,
2.13422758E9, 1.37992608E8, 134758.4]

strictMathCeil_f = [-2.13422758E9, -1.37992608E8, -134758.0, -131.0, -17.0,
-1.0, -0.0, -0.0, -0.0, -0.0, -100.0, -3.0, -1.0, -0.0, 0.0, 0.0, 1.0, 3.0,
100.0, 132.0, 18.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.13422758E9, 1.37992608E8,
134759.0]
floatMathCeil    = [-2134227584, -137992608, -134758, -131, -17, -1, 0, 0,
0, 0, -100, -3, -1, 0, 0, 0, 1, 3, 100, 132, 18, 2, 1, 1, 1, 1, 2134227584,
137992608, 134759]
FloatMathCeil_f  = [-2.13422758E9, -1.37992608E8, -134758.0, -131.0, -17.0,
-1.0, -0.0, -0.0, -0.0, -0.0, -100.0, -3.0, -1.0, 0.0, 0.0, 0.0, 1.0, 3.0,
100.0, 132.0, 18.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.13422758E9, 1.37992608E8,
134759.0]

strictMathFloor_f   = [-2.13422758E9, -1.37992608E8, -134759.0, -132.0,
-18.0, -2.0, -1.0, -1.0, -1.0, -1.0, -100.0, -3.0, -1.0, -0.0, 0.0, 0.0,
1.0, 3.0, 100.0, 131.0, 17.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.13422758E9,
1.37992608E8, 134758.0]
floatMathFloor    = [-2.13422758E9, -1.37992608E8, -134759.0, -132.0,
-18.0, -2.0, -1.0, -1.0, -1.0, -1.0, -100.0, -3.0, -1.0, 0.0, 0.0, 0.0,
1.0, 3.0, 100.0, 131.0, 17.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.13422758E9,
1.37992608E8, 134758.0]
floatMathFloor_f  = [-2.13422758E9, -1.37992608E8, -134759.0, -132.0,
-18.0, -2.0, -1.0, -1.0, -1.0, -1.0, -100.0, -3.0, -1.0, 0.0, 0.0, 0.0,
1.0, 3.0, 100.0, 131.0, 17.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.13422758E9,
1.37992608E8, 134758.0]

--- Benchmarks ---
# Calib: run duration:  5 000 ms
            4 threads, Tavg =      3,03 ns/op (σ =   0,02 ns/op), Total ops
=   6616663415 [     3,07 (1633532581),      3,02 (1656839291),      3,01
(1662195896),      3,01 (1664095647)]
#

#-------------------------------------------------------------
*# StrictMathCeil_f*: run duration:  5 000 ms

*float = (float) StrictMath.ceil(f)*
            1 threads, Tavg =    112,46 ns/op (σ =   0,00 ns/op), Total ops
=     44462614 [   112,46 (44462614)]
            2 threads, Tavg =    112,53 ns/op (σ =   0,20 ns/op), Total ops
=     88864503 [   112,74 (44351706),    112,33 (44512797)]
            3 threads, Tavg =    112,75 ns/op (σ =   0,31 ns/op), Total ops
=    133042189 [   112,67 (44379882),    112,42 (44478562),    113,17
(44183745)]
*            4 threads, Tavg =    113,61 ns/op (σ =   1,18 ns/op),* Total
ops =    176004512 [   115,63 (43242922),    113,27 (44144214),    112,59
(44409190),    113,01 (44208186)]
#

#-------------------------------------------------------------
*# FloatMathCeil_f:* run duration:  5 000 ms

*float = FloatMath.ceil_f(f)*
            1 threads, Tavg =     85,42 ns/op (σ =   0,00 ns/op), Total ops
=     58534818 [    85,42 (58534818)]
            2 threads, Tavg =     85,56 ns/op (σ =   0,18 ns/op), Total ops
=    116880361 [    85,74 (58318655),     85,38 (58561706)]
            3 threads, Tavg =     85,49 ns/op (σ =   0,11 ns/op), Total ops
=    175469910 [    85,64 (58386401),     85,42 (58535723),     85,40
(58547786)]
*            4 threads, Tavg =     86,10 ns/op (σ =   0,86 ns/op), *Total
ops =    232739544 [    87,59 (57200792),     85,61 (58519538),     85,47
(58617544),     85,79 (58401670)]
#

#-------------------------------------------------------------
*# FloatMathCeil:* run duration:  5 000 ms
*int = FloatMath.ceil(f)*

            1 threads, Tavg =     56,72 ns/op (σ =   0,00 ns/op), Total ops
=     88153017 [    56,72 (88153017)]
            2 threads, Tavg =     56,90 ns/op (σ =   0,16 ns/op), Total ops
=    175737994 [    57,06 (87626873),     56,75 (88111121)]
            3 threads, Tavg =     56,82 ns/op (σ =   0,15 ns/op), Total ops
=    264003134 [    57,02 (87684429),     56,76 (88087214),     56,67
(88231491)]
*            4 threads, Tavg =     57,16 ns/op (σ =   0,57 ns/op),* Total
ops =    350060098 [    58,12 (86072473),     56,74 (88161260),     56,68
(88251450),     57,12 (87574915)]
#

#-------------------------------------------------------------
*# StrictMathFloor_f:* run duration:  5 000 ms
*float = (float) StrictMath.floor(f)*

            1 threads, Tavg =    108,69 ns/op (σ =   0,00 ns/op), Total ops
=     46005419 [   108,69 (46005419)]
            2 threads, Tavg =    108,87 ns/op (σ =   0,25 ns/op), Total ops
=     91856264 [   109,11 (45824174),    108,62 (46032090)]
            3 threads, Tavg =    108,66 ns/op (σ =   0,01 ns/op), Total ops
=    138046291 [   108,65 (46019660),    108,68 (46008068),    108,65
(46018563)]
*            4 threads, Tavg =    109,99 ns/op (σ =   1,00 ns/op),* Total
ops =    182162538 [   111,63 (44870853),    109,77 (45631259),    108,90
(45994047),    109,69 (45666379)]
#

#-------------------------------------------------------------
*# FloatMathFloor_f: *run duration:  5 000 ms
*float = FloatMath.floor_f(f)*

            1 threads, Tavg =     79,60 ns/op (σ =   0,00 ns/op), Total ops
=     62816917 [    79,60 (62816917)]
            2 threads, Tavg =     79,44 ns/op (σ =   0,15 ns/op), Total ops
=    125890579 [    79,58 (62827873),     79,29 (63062706)]
            3 threads, Tavg =     79,38 ns/op (σ =   0,15 ns/op), Total ops
=    188968096 [    79,59 (62823628),     79,23 (63107367),     79,32
(63037101)]
*            4 threads, Tavg =     79,88 ns/op (σ =   0,83 ns/op),* Total
ops =    250828233 [    81,31 (61604026),     79,60 (62930953),     79,32
(63149634),     79,33 (63143620)]
#

#-------------------------------------------------------------
*# FloatMathFloor:* run duration:  5 000 ms
*float = FloatMath.floor(f)*

            1 threads, Tavg =     70,20 ns/op (σ =   0,00 ns/op), Total ops
=     71226367 [    70,20 (71226367)]
            2 threads, Tavg =     70,35 ns/op (σ =   0,16 ns/op), Total ops
=    142141053 [    70,51 (70910131),     70,20 (71230922)]
            3 threads, Tavg =     70,26 ns/op (σ =   0,08 ns/op), Total ops
=    213504247 [    70,20 (71225449),     70,38 (71046834),     70,19
(71231964)]
*            4 threads, Tavg =     70,67 ns/op (σ =   0,60 ns/op), *Total
ops =    283376128 [    70,24 (71279973),     70,58 (70931050),     70,20
(71320272),     71,68 (69844833)]
#