[OpenJDK Rasterizer] RFR: Marlin renderer #2
Jim Graham
james.graham at oracle.com
Wed Jun 17 20:43:07 UTC 2015
Hi Laurent,
There were one or two very minor fixes I suggested in a couple of files
that you indicated you were going to do, but I didn't see a new webrev.
Is there one? I think the existing array code is probably OK, and I'll
look at it again now that I understand the tradeoffs of its current
design, but I wanted to know whether there is a newer webrev for me to
review that includes those other minor fixes.
On 6/12/15 5:45 AM, Laurent Bourgès wrote:
> A general thought - should we have foo_initial arrays if we are
> getting everything from a cache?
>
> I adopted this approach to maximize performance for the 99% cases: I
> sized the initial arrays properly and they take ~512 KB per
> RendererContext; the (weak) ArrayCache is only used for extreme cases (a
> bit slower). In March 2013, I started by using the cache for all array
> accesses and it was 30% slower!
Was that 30% measured back when we were allocating or clearing all of the arrays?
Now that we only clear what we use, the initial sizes of the arrays
shouldn't matter so much. It would be worth a little extra size
overhead to not have so many special cases and special handling for the
"_initial" arrays if the performance is now essentially identical.
> I will look again at these cases: I agree "leaking refs" should be set
> to null to be sure. However, Marlin code always executes the
> init() / dispose() pattern, so leaking pointers are not critical - but
> OK, let's set all of them to null.
>
> For initial arrays, it seems redundant to set them to null in dispose()
> and back to the initial arrays in init(): useless and counter-productive!
Code simply gets easier to review and maintain if there is a clear reset
to some initial conditions. I would like to keep the code as simple and
straightforward as possible while we evaluate algorithmic improvements,
without introducing potential bugs through a lot of special-case code;
we can worry about the last 1% later.
Also, we may find moving the code to native to be more of a performance
improvement than trying to fight with Hotspot's load/store policies and
so any code that tries to make "slightly more optimal byte codes" would
end up not being needed and would add some noise to any such native
conversion.
But, I do agree that technically you are correct about some of these
suggestions leading to some provably redundant code. The issue is more
that constantly having to reprove the redundancies gets in the way of
some other progress we could be making.
And, on the other hand, right now you are the engineer driving this from
the workload point of view (and I apologize if I am inadvertently in a
role that seems to be described as "the engineer getting in the way"),
so I'm willing to rely on your commitment to be diligent about these
optimizations. I'm just stating a case for whether they should be the
focus yet.
> Sorry, it is not a flaw but an advantage: the initial arrays that can
> grow in widenXXXArray() have an odd size, as indicated by the comment:
> // +1 to avoid recycling in Helpers.widenArray()
> private final int[] edgePtrs_initial = new int[INITIAL_SMALL_ARRAY
> + 1]; // 4K
I saw that after the fact, but my main point remains that this is a
fragile inter-module optimization. I was unaware that it could save us
30% to have special-cased "initial" arrays; that fact makes the
relationship that much more important, but I wonder whether it is still
true. If we ditch the initial arrays because the new "dirty array"
management removes that performance penalty, then this all goes away.
And even if we keep the initial-array concept, it would be nice to find
a more reliable way to manage it than a side effect of its length.
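One possible alternative, as a sketch only (class and field names are assumptions, not the current Marlin code): track the non-cacheable initial array by identity instead of encoding "do not recycle" in an odd length.

```java
// Sketch: decide recyclability by identity against the known initial
// array, rather than via the side effect of an odd array length.
final class EdgeArrayHolder {
    private final int[] initial = new int[4096];
    private int[] current = initial;

    // Grow to newSize (assumed >= current length), keeping the contents.
    int[] widen(int newSize) {
        int[] bigger = new int[newSize];
        System.arraycopy(current, 0, bigger, 0, current.length);
        current = bigger;
        return bigger;
    }

    // The cache would consult this instead of checking (a.length & 1).
    boolean isInitial(int[] a) {
        return a == initial;
    }
}
```

The identity check makes the "never cache the initial storage" rule explicit at the point where it is enforced, instead of relying on a convention shared between two modules.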
> I wanted to have a simple but robust API that can be easily used in Marlin.
I'd like to think that I'm working/promoting towards simple and robust
APIs as well.
> I agree the documentation can be improved a lot, and I could introduce a
> new helper method in ArrayCache to create the initial arrays in a safer
> and more explicit way:
>
> int[] createNonCachedIntArray(int length) {
>     // return only odd lengths to detect wrong cache usage
>     if ((length & 1) == 0) {
>         length += 1;
>     }
>     return new int[length];
> }
In a future effort, if we find that we must keep some small initial
arrays for performance, we could make this a more explicit and tunable
parameter on the caches. Let's table that until we consider if they buy
us any performance after these dirty array changes.
> Anyway I agree the cache handling can be improved a lot: try using
> generics for array types with a lambda factory to allocate them.
I just wish that generics could handle primitive arrays, but they can't.
I did a similar generification thing for the raster format conversion
code in FX, but I had to play games with the generic types being
ByteBuffer and IntBuffer and having the variants of those accept the
proper type of primitive arrays. It wasn't pretty, but it did work out
in the end.
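The FX-style workaround Jim describes might look roughly like this (a sketch with assumed names, not the actual FX code): generics cannot range over `byte[]`/`int[]` directly, but they can range over the java.nio Buffer wrappers, each of which accepts the matching primitive array.

```java
import java.nio.Buffer;
import java.nio.ByteBuffer;
import java.nio.IntBuffer;

// Sketch: one generic base type covers both byte- and int-backed data by
// parameterizing over the nio Buffer wrappers of the primitive arrays.
abstract class PixelGetter<T extends Buffer> {
    abstract int getArgb(T buf, int index);
}

final class IntArgbGetter extends PixelGetter<IntBuffer> {
    @Override
    int getArgb(IntBuffer buf, int index) {
        return buf.get(index);     // already packed ARGB
    }
}

final class ByteGrayGetter extends PixelGetter<ByteBuffer> {
    @Override
    int getArgb(ByteBuffer buf, int index) {
        int g = buf.get(index) & 0xFF;   // expand gray to opaque ARGB
        return 0xFF000000 | (g << 16) | (g << 8) | g;
    }
}
```

Callers wrap their primitive arrays with `IntBuffer.wrap(int[])` or `ByteBuffer.wrap(byte[])`, which is exactly the "playing games" part: the wrapper objects exist only to give generics something to grab onto.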
> The caching code is an example of "code with no obvious flaws
> because it is hard to figure out if it has flaws". We should strive
> for "code with obviously no flaws because it is easy to verify that
> it is doing everything correctly".
>
>
> I disagree with the term "flaw", but I agree it is hard to figure out
> what is being done.
I took that from a talk on producing more secure code. I think it
originally came from an overall system design that could easily be
proven secure. The first release of Java required all sensitive Java
code to be constantly asking "am I in a context where this operation is
allowed?", and that detection was fragile. It was replaced by code where
all sensitive operations were rejected unless the surrounding code
proved its protected context, which was much more secure. "Disallowed
unless requested" left less room for mistakes than "allowed unless it
looks questionable".
But that same philosophy serves a lot of code well. If you start
having a bunch of "agreements" between modules, they can quickly get
out of sync through regular maintenance and sometimes spiral out of
control, and you end up spending more time verifying that you still
satisfy all of those conventions than you would by letting the code
itself make sure things are kosher because only valid operations are
expressible.
> OK, I can work later on a new framework:
> - ArrayCache<K> abstract class where <K> is byte[], int[], float[]
> - CleanArrayCache<K> extends ArrayCache<K> = clean arrays
> - DirtyArrayCache<K> extends ArrayCache<K> = dirty arrays
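A minimal sketch of that proposed shape (class names taken from the quote above; the allocator-lambda detail and `ArrayDeque` pooling are my assumptions, not a committed design):

```java
import java.util.ArrayDeque;
import java.util.function.IntFunction;

// Sketch: a cache parameterized over the array type, with an allocator
// lambda, split into "clean" (zeroed on release) and "dirty" variants.
abstract class ArrayCacheSketch<K> {
    private final ArrayDeque<K> pool = new ArrayDeque<>();
    private final IntFunction<K> allocator;
    private final int length;

    ArrayCacheSketch(IntFunction<K> allocator, int length) {
        this.allocator = allocator;
        this.length = length;
    }

    K acquire() {
        K array = pool.poll();
        return (array != null) ? array : allocator.apply(length);
    }

    void release(K array) {
        clean(array);
        pool.push(array);
    }

    // Clean caches zero the array here; dirty caches do nothing.
    abstract void clean(K array);
}

final class DirtyIntArrayCache extends ArrayCacheSketch<int[]> {
    DirtyIntArrayCache(int length) {
        super(int[]::new, length);   // array-constructor method reference
    }

    @Override
    void clean(int[] array) {
        // dirty cache: contents are deliberately left as-is
    }
}
```

One instantiation per primitive array type (`byte[]`, `int[]`, `float[]`) would still be needed, since a single generic class cannot construct an arbitrary primitive array without the per-type allocator lambda.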
I think I can trust the existing architecture for at least a putback of
the last webrev that you sent out while we brainstorm a better caching
mechanism - let me make one last pass through it and I'll send an OK
later this afternoon.
> What do you mean by "making further real improvements in the algorithms"?
Things like investigating DDA in the inner loops, alternate ways to
manage crossings that require less sorting, enhancing the stroker to do
non-uniform scaling (which would eliminate the double transform in some
cases), potentially passing the crossings to the output loop rather than
having to convert them into an alpha array, etc. A lot of that is
easier to review if there isn't a lot of extra "this line added 1%
performance" code adding visual noise.
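For the DDA idea in very rough form (an illustration of the general technique, unrelated to Marlin's actual edge structures): step an edge's x crossing per scanline with a fixed increment instead of re-evaluating the line equation each time.

```java
// Sketch: a DDA walk of an edge across scanlines. Instead of computing
// x = x0 + (y - y0) * slope for every scanline, keep a running x and
// add the per-scanline delta.
final class EdgeDDA {
    double x;                // current crossing
    final double dxdy;       // x increment per scanline

    EdgeDDA(double x0, double y0, double x1, double y1, double yStart) {
        this.dxdy = (x1 - x0) / (y1 - y0);
        this.x = x0 + (yStart - y0) * dxdy;
    }

    // Advance one scanline and return the new crossing.
    double step() {
        x += dxdy;
        return x;
    }
}
```

The inner loop then becomes one addition per edge per scanline, at the cost of accumulated floating-point error that a real implementation would have to bound (e.g. with fixed-point increments).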
> Who could help? I am the only one working concretely on this code;
> help wanted.
I'd love to get more people involved, but history tells us that this is
a fairly lonely field. :(
I may have some time for tinkering now that FX is in tightened bug-fix
mode, but mainly I will probably be much quicker at getting reviews
done, which in practice may help as much as another pair of hands...
...jim