[OpenJDK Rasterizer] RFR: Marlin renderer #2

Wed Jun 3 10:35:56 UTC 2015

Let me see if I understand up front what the caching needs are.

You have various places that use (often growable) int[], float[] or 
byte[] and you want to share arrays among them of various sizes.  Some 
of the arrays are used by code that can tolerate dirty data in the array 
it gets, others require the array to be clean before they can use it. 
Currently you avoid sharing arrays between code that can use dirty 
arrays and code that needs clean arrays?  Did I miss anything?

ArrayCache @104: you don't check if oversize or resizeDirtyInt are non-zero?

ArrayCache.get*Bucket() - you decided not to do the shift method?

ArrayCache.get*Bucket() - a for loop 1=>NUMBER requires the runtime to 
do bounds checking on the array access.  If you use 1=>array.length then 
it can see/prove that you cannot generate OOB indices and it can elide 
the checks.

ArrayCache.getNewSize() will return 0 if fed curSize=1.  That may not 
happen in practice, but it's a dangerous failure mode to leave around.

ArrayCache.BUCKET_GROW_N - I'm finding that these constants obscure what 
they are doing possibly because of their names.  I had to look and see 
how they are used to figure out what they "meant" and then when I was 
looking at how they are used I was thinking - you'd have to use a 1 here 
or else it won't work which defeated the purpose of having a constant 
since there is only one value that logically works in some of the 
places.  I'd find the code easier to read with hard-coded numbers in 
most of the places these constants are used.  There is no bug here, but 
it made the code harder to understand.

ByteArrayCache.check() - (I realize that you didn't modify this code, 
but it was in the context diffs and I noticed it) - the only way for 
empty to be set to false is in the test in the first loop.  Why not just 
embed the "if (!empty)" code into that conditional test in the loop and 
eliminate the empty variable?

CollinearSimplifier - why the switch from an enum to numeric constants? 
  Also, why not use a switch statement instead of a cascade of if tests?

CollinearSimplifier.closePath() - if you remember the last move 
coordinates then you can first emit lineTo(movex, movey) to give it a 
chance to make that closing segment colinear, then do the 
emit/delegate.close.

quadTo/curveTo - you can leave the state as PREV_POINT with the final 
point of the curve as px1,py1.  Otherwise any line following a curve is 
never simplified.

closePath - technically you could leave this as PREV_POINT with pxy1 
being the movexy.

getSlope() - comparing y2 == y1 and x2 > x1 both require subtracting one 
of them from the other.  It might make a slight improvement to start 
with "float ydiff;  if ((ydiff = y2 - y1) == 0)" since a compare to 0 
following a subtract should just be a test of the condition codes.  On 
the other hand, this might make the code harder to read and quite 
possibly the compiler would notice that you need the difference later 
and do this optimization for you.

Curve line 251 - indentation so that the "final float[]"s match up?

FloatArrayCache - see ByteArrayCache above

FloatMath.java, line 144 - does "((~intpart) >>> 31)"  (unsigned shift) 
work as well?

FloatMath - for the comments, "0" is considered neither negative nor 
positive, so you are technically not identifying what happens with 0 
since you only describe what happens for positive and negative numbers. 
  "Non-negative" means "0 and all positive numbers".

Helpers - there are a few "/2", "/3", and "/27" that didn't get 
converted to double constants?

Helpers - there are a number of /2.0f that you changed to /2f, but how 
about *0.5f instead since multiplication is often faster than division.

IntArrayCache - see ByteArrayCache above

MarlinRenderingengine, line 545,546 - here you changed a "*.5" into a 
"/2", but the multiply is usually faster than the divide.

MergeSort - has no changes in the Sdiffs?

Renderer, line 219 (and a few others later in the file) - *.25 may be 
faster than /4

Renderer, line 401 et al - assuming that the initial size of the array 
is larger than SIZEOF_EDGE_BYTES otherwise edgePtr<<1 may not be large 
enough to add SIZEOF_EDGE_BYTES.  Probably OK in practice for now, but 
could be a bug waiting to happen later if someone gets fancy with out 
that _edges array is initialized.

Renderer.dispose (and I've seen this in other places) - you clean the 
arrays after you've freed and replaced them with their initial values - 
it seems kind of odd.  Also, in most of these places the comment says 
"keep ... dirty", but you are about to clean them so it is the opposite 
of "keeping them dirty", so the comment seems to be backwards (I forget 
what other file I first noticed this, but it is a comment that has been 
cut/pasted in other places too).

Renderer.java, line 1260ish - the new way you set spMaxY has it 
potentially be 1 larger than it used to be...?

RendererContext - too many methods with the words "dirty" 
"byte/int/float", "array", and "cache" in them in different orders. 
They all blend together name-wise and I can't determine what each is 
supposed to do in order to verify that this all makes sense.  We need 
better nomenclature and/or packaging for these facilities, but I don't 
have any suggestions off the top of my head other than, for now, please 
review if the names of all of them follow some pattern, fix any that 
don't follow the pattern and then document the pattern in the code so 
that hopefully it will explain itself so I can review it in an informed 
manner.

Stroker - more *.5 converted to /2 (and higher powers of 2 as well)

Stroker - the commented-out somethingTo still uses hardcoded 
emit(...true/false) methods.

Stroker.curveTo() - (just observing) we go to all the trouble of storing 
known vars into the middle array, but then we access those known vars 
from the array in the subsequent code.  But, we know what values we 
stored there so wouldn't it be faster just to use the original values, 
rather than fetch them back out of an array?  The array in this case 
serves only to obscure what data we are computing with.  Although, 
arguably, we eventually pass the mid/middle array to other functions so 
we do still need those values stored in the array (unless those 
functions can be converted from array storage to direct values as 
well?).  This is just an observation, but not a problem.

Stroker line 1079,1155 - why move the declaration of curCurveOff outside 
the loop?

				...jim

On 4/29/2015 1:27 PM, Laurent Bourgès wrote:
> Jim,
>
> Here is a new webrev for the second step on the marlin renderer:
> http://cr.openjdk.java.net/~lbourges/marlin/marlin-s2.0/
>
> Changes:
> - ArrayCache: cleanup in the growth algorithm + fixed TODO
> - Float/Int ArrayCache: added putDirtyArray() methods
> - RendererContext: added dirtyInt/Float array cache and related methods
> - RendererStats: added statistics on cached array sizes
> - CollinearSimplifier: optimized condition evaluation order
> - FloatMath: removed once condition using bit masking to add +/- 1
>
> - Curve: fixed numeric constants + BreakPtrIterator deals with primitive
> integer (no more Interator<Integer>)
> - Dasher: fixed numeric constants + firstSegmentsBuffer uses the dirty
> float cache
> - Helpers: fixed numeric constants + removed widenArray methods (use
> directly RendererContext instead)
> - MarlinCache: added stats for rowAAChunk + fixed doc
> - MarlinRenderingEngine: fixed numeric constants + newDashes uses the
> dirty float cache + RendererContext uses now Weak reference by default
> (instead of Soft)
> - Renderer:
>      - keep used range for edgeBuckets / edgeBucketCounts in
> endRendering() used then in dispose() to avoid FloatMath.ceil() calls
>      - crossings / aux_crossings & edgePtrs / aux_edgePtrs use dirty int
> array caches
> - Stroker: fixed numeric constants + use explicit emitLineToRev() /
> emitQuadToRev() / emitCurveToRev() as short cuts + use local variables
> for readability and minor performance gain
> - Stroker.PolyStack: curveTypes / curves use the dirty byte / float
> array caches + optimized popAll() loop
>
> Cheers,
> Laurent