[OpenJDK Rasterizer] RFR: Marlin renderer #2

Sat Jun 6 01:11:39 UTC 2015

On 6/5/15 2:18 PM, Laurent Bourgès wrote:
>> ArrayCache.get*Bucket() - you decided not to do the shift method?
>
> I gave up: Do you think it is worth as there is only 4 buckets (ie few
> if) ? Do you have an implementation to propose ?

Once I realized it was exponential growth I realized why you gave up. 
No easy suggestions there.

>> ArrayCache.getNewSize() will return 0 if fed curSize=1.  That may not happen in practice, but it's a dangerous failure mode to leave around.
>
> It is a bit tricky but 0 is not dangerous here as 0 < MIN_ARRAY_SIZE (=
> 4096): it means that the first bucket will be used to get the new array
> => int[4096]
>
> final int[] res = getIntArray(getNewSize(usedSize)); => return
> getIntArrayCache(0).getArray(); // first bucket used

Yeah, the input of 0 would be difficult to see in practice, but the fact 
that one needs to examine who calls this method and how bothers me in 
general.  I prefer defensive programming for things like this that are 
executed once per array growth.  But, that's just a style point.

>> CollinearSimplifier - why the switch from an enum to numeric constants?  Also, why not use a switch statement instead of a cascade of if tests?
>
> Fixed: I now use a integer-based switch.

Looks good.  I'm still curious why you didn't use an enum - is the code 
more efficient just dealing with basic numbers rather than enum instances?

>> CollinearSimplifier.closePath() - if you remember the last move coordinates then you can first emit lineTo(movex, movey) to give it a chance to make that closing segment colinear, then do the emit/delegate.close.
>
>> closePath - technically you could leave this as PREV_POINT with pxy1 being the movexy.
>
> Postponed as it seems a side-corner case.

Makes sense.

>> FloatMath.java, line 144 - does "((~intpart) >>> 31)"  (unsigned shift) work as well?
>
> Fixed.

It doesn't look like my suggestion.  Note that I was doing the 
complement before the unsigned shift.  That leaves you with 1 for 
positive&0 and 0 for negative.  You don't need the "& 1" after it if you 
do the complement before the shift.

>> FloatMath - for the comments, "0" is considered neither negative nor positive, so you are technically not identifying what happens with 0 since you only describe what happens for positive and negative numbers.  "Non-negative" means "0 and all positive numbers".
>
> Fixed comments.

I don't see it fixed.  You added the word "numbers", but you still don't 
include 0 in any of your cases since 0 is neither positive nor negative. 
  I'm referring to lines 142,143 & 197,198.  Replace the word "positive" 
with "non-negative" in those comments and you will cover the 0 case.  To 
be clear:

negative number = any number that is less than 0 (and not 0)
positive number = any number that is greater than 0 (and not 0)
non-negative number = any number that is greater than or equal to 0

The sign bit is 0 for all positive numbers and also for 0 - i.e. all 
non-negative nubmers

>> Helpers - there are a number of /2.0f that you changed to /2f, but how about *0.5f instead since multiplication is often faster than division.
>
> I tried to replace such division cases by multiplication but I did not
> get any performance gain: certainly hotspot is doing a great job in
> optimizing maths => I replaced all "tricky" multiplications by power of
> 2 divisions to be consistent in Marlin, more mathematically correct and
> explicit.
>
> Maybe / 2f or / 4f can be done by shifiting the exponent part in the
> assembler code (see scalb) ?

OK, this may be an old performance issue that is no longer reproducible 
on modern chips and compilers.

Out of curiosity, which platforms did you test this on?

>> MergeSort - has no changes in the Sdiffs?
>
> There was an indentation issue at line 120 => but the webrev diff
> ignores it !

It's showing up this time.  Probably that -b option you provided or 
something.  I see it now.

>> Renderer, line 401 et al - assuming that the initial size of the array is larger than SIZEOF_EDGE_BYTES otherwise edgePtr<<1 may not be large enough to add SIZEOF_EDGE_BYTES.  Probably OK in practice for now, but could be a bug waiting to happen later if someone gets fancy with out that _edges array is initialized.
>
> I agree it looks tricky as it supposes _edges.length is always >
> SIZEOF_EDGE_BYTES. I could add a comment ?

Or an assert?  Either or.

>> Renderer.java, line 1260ish - the new way you set spMaxY has it potentially be 1 larger than it used to be...?
>
> No, I checked again: it is written differently but equivalent. Probably
> I misnamed the local variable _boundsMaxY = boundsMaxY - 1; I renamed it
> to _boundsMaxYm1 to be more clear.

Ah, I see it now.  On the other hand, now you end up with the clunky 
"Yminus1 + 1" later on.  Since this code is executed once per rendering 
cycle, I don't see any performance concerns here to require loading the 
fields into local variables.  Just using the raw fields for the 2 simple 
calculations may be clearer on the other hand.

>> RendererContext - too many methods with the words "dirty" "byte/int/float", "array", and "cache" in them in different orders. They all blend together name-wise and I can't determine what each is supposed to do in order to verify that this all makes sense.  We need better nomenclature and/or packaging for these facilities, but I don't have any suggestions off the top of my head other than, for now, please review if the names of all of them follow some pattern, fix any that don't follow the pattern and then document the pattern in the code so that hopefully it will explain itself so I can review it in an informed manner.
>
> I renamed few methods to have a consistent naming convention: see intro.

OK, let me make another pass through then now that we have something 
consistent.  I wanted to get these comments out to you on what I've seen 
so far sooner rather than later, though.

>> Stroker.curveTo() - (just observing) we go to all the trouble of storing known vars into the middle array, but then we access those known vars from the array in the subsequent code.  But, we know what values we stored there so wouldn't it be faster just to use the original values, rather than fetch them back out of an array?  The array in this case serves only to obscure what data we are computing with.  Although, arguably, we eventually pass the mid/middle array to other functions so we do still need those values stored in the array (unless those functions can be converted from array storage to direct values as well?).  This is just an observation, but not a problem.
>
> I agree that this part of the code is obscure to me: I was wondering
> what is the objective ... But the the mid/middle array is used by
> findSubdivPoints(mid), breakPtsAtTs(mid) and computeOffsetCubic(mid) ...
> so I did not change much the working code !
>
> If you think we could have some performance gain, I may try rewriting
> that part ... but I think eliminating collinear segments first is more
> efficient.

This is probably something to look at later after these changes go in. 
For a few minutes of changing a couple of method signatures it might 
result in some cleaner code, but not something to rework just now.

>> Stroker line 1079,1155 - why move the declaration of curCurveOff outside the loop?
>
> Reverted (even if I prefer declarations outside loops).

I guess I prefer to limit the scope of variables whenever possible.

> PS: If you prefer, I can replace all divisions by multiplications but it
> should be consistent accross the code even if it has no impact in  my
> benchmarks !

No, that was an optimization I've seen in the past, but if you have 
evidence to the contrary we should be fine.

> Thanks again for your review and your time,

And thanks for all your hard work on this code!

			...jim