[OpenJDK Rasterizer] Marlin renderer contribution for review

Jim Graham james.graham at oracle.com
Wed Mar 18 21:50:03 UTC 2015


Hi Laurent,

Keep in mind that the "things to consider later" parts of the review 
shouldn't hold up the process, but if you see something you want to fix 
now, feel free to address it sooner rather than later.

On 3/18/15 9:17 AM, Laurent Bourgès wrote:
>     -------- Here are some "things to consider later" ---------
>
>     I just realized the main case where we want to deal with cleaning
>     out dirty arrays and it would be the crossings accumulation arrays.
>     It seems to me that those are special cases and so we should
>     concentrate the cleaning code on those explicitly and not have a
>     global "lets clean all arrays everywhere" setting.
>
> Agreed.
> Let's clarify my approach in Marlin: it mainly uses reused arrays (int,
> byte, float) => no allocation (no GC overhead).
> To achieve the best performance, it only clears the dirty parts, i.e. the
> used ones (see the many "used mark" fields).
>
> Dirty arrays are named "dirty" to indicate that these arrays are used in
> an "unsafe" manner, i.e. never zero-filled (always junk). It means the code
> must be careful to always set values before reading them back.

Perhaps a naming convention for such variables, then, so that we can spot 
code that relies on cleared data but operates on an array that isn't 
necessarily cleared?
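One possible shape for such a convention (field names and the `_dirty` suffix here are purely illustrative, not Marlin's actual code): a suffix marks arrays that are never zero-filled, so a reviewer can spot any code that assumes cleared contents, and a "used mark" lets cleanup touch only the written portion.

```java
import java.util.Arrays;

// Hypothetical sketch of a dirty-array naming convention.
final class CrossingBuffers {
    final int[] crossings = new int[256];       // cleaned: zero-filled after use
    final int[] edgePtrs_dirty = new int[256];  // junk: must write before every read
    int used;                                   // high-water mark for cleaning

    // Clear only the portion actually touched, in the spirit of
    // Marlin's "used mark" fields; the _dirty array is left as junk.
    void reset() {
        Arrays.fill(crossings, 0, used, 0);
        used = 0;
    }
}
```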

> I think these caches should use WeakReferences to avoid wasting too much
> memory (more volatile), but that remains to be done.

That sounds like a good refinement.  We could also just make the 
ThreadLocal store a ref as well.  Is there such a thing as a 
WeakThreadLocal?
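There is no WeakThreadLocal in the JDK, but the effect can be approximated by storing a WeakReference inside the ThreadLocal. A minimal sketch (class and method names here are hypothetical, not the actual Marlin API): the ThreadLocal itself is strongly held, but it only holds a WeakReference, so the per-thread context can be reclaimed under memory pressure and lazily recreated.

```java
import java.lang.ref.WeakReference;

// Hypothetical per-thread renderer context with a large reused array.
final class RendererContext {
    final float[] scratch = new float[4096]; // reused dirty array
}

final class ContextCache {
    // The ThreadLocal stores only a WeakReference, so the GC may
    // reclaim the context when memory gets tight.
    private static final ThreadLocal<WeakReference<RendererContext>> REF =
        new ThreadLocal<>();

    static RendererContext getContext() {
        WeakReference<RendererContext> ref = REF.get();
        RendererContext ctx = (ref != null) ? ref.get() : null;
        if (ctx == null) {              // first use, or already collected
            ctx = new RendererContext();
            REF.set(new WeakReference<>(ctx));
        }
        return ctx;
    }
}
```

The trade-off is an extra indirection on every lookup plus occasional re-allocation after a GC, in exchange for not pinning large arrays for the lifetime of every rendering thread.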

>     Isn't declaring flatLeafCoefficients as final enough to not need to
>     cache its reference in a local variable?  Also, I would think you
>     would gain more by simply deleting the array and storing the
>     coefficients in fields instead?  An array index lookup should be no
>     simpler than a local field access.
>
> In Dasher? Maybe it is a minor optimisation, but I prefer using local
> variables instead of field accesses for performance reasons.

I don't think you understood what I was getting at.  I realize the use 
of a local variable avoids field fetches.  I was wondering whether it 
matters in this particular case, since the field is final, which means 
the compiler should be able to assume that the value doesn't change and 
load it into a local variable on its own (in the compiled code).

In addition to that concern I point out that indexing into the fLC array 
to get the 4 values is identical to a field fetch anyway.  So, we used 
to have potentially double-fetches to access the values (fetch the 
array, fetch the value from the array).  You made it final which could 
potentially reduce the first fetch of the array field to just one fetch 
for all of the values in the compiled code.  You also copied the value 
into a local variable which forced the issue, but we still have that 
extra field reference to load the array reference into a local variable.

My second point there was that if we just stored 4 values instead of an 
array of 4 values, then we'd reduce this all down to just a field 
reference for each of the 4 values, and that is roughly equivalent to an 
array fetch anyway.  So, we went from:

get array, get item 0
get array, get item 1
get array, get item 2
get array, get item 3

to:

get array, get item 0
            get item 1
            get item 2
            get item 3

but why not eliminate the array altogether and end up with:

get item 0
get item 1
get item 2
get item 3

(and making that change would probably recommend running more benchmarks 
which means we should hold off on that until we get a stable stake in 
the ground in the gr-dev repo...)

Also note that an array of 4 floats costs 4x32 bits for the floats, plus 
array object overhead, plus 32 or 64 bits to store the reference to the 
array.  For cases where we pass such an array around a lot it might be 
OK to have that overhead to reduce method parameter passing, or for 
cases where we want to iterate over the values one by one it may also 
make sense.  But in this case we are storing 4 related, but separate, 
coefficients which means we are just using it as a bundle of 4 values.
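A minimal sketch of that flattening (the class and field names are hypothetical, not Dasher's actual ones): four scalar fields replace the `float[4]`, so each value costs one field fetch with no array header, no reference slot, and no bounds checks.

```java
// Hypothetical sketch: flattening a float[4] of curve coefficients
// into four scalar fields.
final class LeafCoeffs {
    // before: final float[] flatLeafCoefficients = new float[4];
    float c0, c1, c2, c3;

    void set(float a, float b, float c, float d) {
        c0 = a; c1 = b; c2 = c; c3 = d;
    }

    // Evaluate c0*t^3 + c1*t^2 + c2*t + c3 (Horner form),
    // touching only fields, never an array.
    float eval(float t) {
        return ((c0 * t + c1) * t + c2) * t + c3;
    }
}
```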

>     FastMath - In general, if we use "(int) ceil()" a lot, we might want
>     to consider offsetting the math so that all values are negative and
>     then use a cast to (int).  In particular, if crossings are going to
>     use "ceil(v - 0.5)" which is the proper way to do half-open
>     intervals with centers included on top and left sides, then simply
>     shifting everything once so that the entire range of interesting X,Y
>     are negative would mean that all crossings calculations could be
>     done by casting to integer, especially if we also offset them all by
>     0.5 as well.
>
> Only normalization does the ceil(v - 0.5); the crossings are directly
> cast to int:
>
> +                        // update crossing ( x-coordinate + last bit = orientation):
> +                        cross = (((int) f_curx) << 1) | _unsafe.getInt(addr + _OFF_YMAX_OR) & 0x1; /* last bit contains orientation (0 or 1) */

Note that the cast is theoretically in error.

> I have noted several times that this rounding issue with subpixel centers
> should be improved!
> But it is less important than improving quality for nearly vertical or
> horizontal lines (large subpixel steps): I have ideas related to span
> interpolation or using adaptive scanline steps... (in future)

Understood.  BTW, when we adapted Pisces for FX we had to add a 
ceil(curx - 0.5) to the crossing calculation here because we instantiate 
the Pisces renderer optionally with 1x1 sampling for non-AA rendering 
and the renderings were all off by a pixel if we didn't make that 
change.  It hasn't been noticed when this renderer is used only in AA 
mode in the JDK (or when we use it optionally for AA in FX) because it 
would only be off by 1/64th of a coverage value.
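For reference, a minimal sketch of that half-open rounding rule (the helper name is hypothetical): a crossing at x owns the pixels whose centers i + 0.5 lie at or to the right of x, so the first covered column is ceil(x - 0.5), with centers on the left/top edge included.

```java
// Hypothetical demo of the pixel-center rounding rule for crossings.
final class CrossingRound {
    // First pixel column whose center (i + 0.5) is covered by a
    // crossing at x, under the half-open top/left-inclusive rule.
    static int firstCoveredPixel(double x) {
        return (int) Math.ceil(x - 0.5);
    }
}
```

A plain `(int)` cast would give 2 for both x = 2.5 and x = 2.51, whereas the rule above gives 2 and 3 respectively; at 1x1 sampling that is exactly the one-pixel shift described, while at 8x8 AA it is only 1/64th of a coverage value.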

> I use FastMath mainly to handle boundaries, and coordinates can be either
> positive or negative ... so I have no choice but to deal with
> negative values!

That's probably true for normalization then.  For the rendering part, we 
could carefully constrain our values so that they are always positive (or 
always negative), since we have a target clip to render relative to.

>     FastMath - if we can prove that no values <0 are interesting (as in,
>     the only interesting property about them is that they are less than
>     0), then we can just use a cast to integer.  This is particularly true
>     if we clamp a value to >= 0 after we convert it to an integer - in
>     that case we might as well have used an int cast rather than a
>     floor() computation.
>
> Ok let's discuss that later.

As is true for all of these issues "after the line"... ;)
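To illustrate the clamping point quoted above (helper names are hypothetical): when the integer result is clamped to >= 0 afterwards, a plain `(int)` cast, which truncates toward zero, agrees with `floor()`, because the two only differ on negative non-integers, and those clamp to 0 either way.

```java
// Hypothetical illustration: floor-then-clamp vs cast-then-clamp.
final class FloorClamp {
    static int floorThenClamp(float v) {
        return Math.max((int) Math.floor(v), 0);
    }

    // Cheaper: no floor() call, just the JLS narrowing conversion,
    // which rounds toward zero.
    static int castThenClamp(float v) {
        return Math.max((int) v, 0);
    }
}
```

For example, -3.7 floors to -4 and casts to -3, but both clamp to 0; for non-negative inputs the two conversions are identical.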

			...jim


More information about the graphics-rasterizer-dev mailing list