[OpenJDK Rasterizer] RFR: Marlin renderer #2
Jim Graham
james.graham at oracle.com
Wed Jun 17 20:43:07 UTC 2015
Hi Laurent,
There were one or two very minor fixes I suggested in a couple of files
that you indicated you were going to do, but I didn't see a new webrev.
Is there one? I think the existing array code is probably OK, and I'll
look at it again now that I understand the tradeoffs of its current
design, but I wanted to know whether there is a newer webrev for me to
review that includes those other minor fixes.
On 6/12/15 5:45 AM, Laurent Bourgès wrote:
> A general thought - should we have foo_initial arrays if we are
> getting everything from a cache?
>
> I adopted this approach to maximize performance for the 99% cases: I
> sized the initial arrays properly and they take ~512 KB per
> RendererContext; the (weak) ArrayCache is only used for extreme cases (a
> bit slower). In March 2013, I started by using the cache for all array
> accesses and it was 30% slower!
Was that 30% measured back when we were allocating or clearing all of the arrays?
Now that we only clear what we use, the initial sizes of the arrays
shouldn't matter so much. It would be worth a little extra size
overhead to not have so many special cases and special handling for the
"_initial" arrays if the performance is now essentially identical.
> I will look again at these cases: I agree "leaking refs" should be set
> to null to be sure. However, Marlin code always executes the
> init() / dispose() pattern, so leaking pointers are not critical - but
> OK, let's set all of them to null.
>
> For initial arrays, it seems redundant to set them to null in dispose()
> and back to the initial arrays in init(): useless and counter-productive!
Code simply gets easier to review and maintain if there is a clear reset
to some initial conditions. I would like to keep the code as simple and
straightforward as possible while we evaluate algorithmic improvements,
without introducing potential bugs through a lot of special-case code;
we can worry about the last 1% later.
Also, we may find moving the code to native to be more of a performance
improvement than trying to fight with Hotspot's load/store policies and
so any code that tries to make "slightly more optimal byte codes" would
end up not being needed and would add some noise to any such native
conversion.
But, I do agree that technically you are correct about some of these
suggestions leading to some provably redundant code. The issue is more
that constantly having to reprove the redundancies gets in the way of
some other progress we could be making.
And, on the other hand, right now you are the engineer driving this from
the workload point of view (and I apologize if I am inadvertently in a
role that seems to be described as "the engineer getting in the way"),
so I'm willing to rely on your commitment to be diligent about these
optimizations. I'm just stating a case for whether they should be the
focus yet.
> Sorry, it is not a flaw but an advantage: the initial arrays that can
> grow in widenXXXArray() have an odd size, as indicated by the comment:
> // +1 to avoid recycling in Helpers.widenArray()
> private final int[] edgePtrs_initial = new int[INITIAL_SMALL_ARRAY
> + 1]; // 4K
I saw that after the fact, but my main point remains that this is a
fragile inter-module optimization. I was unaware that it could save us
30% to have special-cased "initial" arrays; that fact makes the
relationship that much more important, but I wonder whether it is still
true. If we ditch the initial arrays because the new "dirty array"
management removes that performance penalty, then this all goes away.
And even if we keep the initial-array concept, it would be nice to find
a more reliable way to manage it than a side effect of its length.
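One possible alternative, as a sketch only (class and field names are assumptions, not the current Marlin code): track the non-cacheable initial array by identity instead of encoding "do not recycle" in an odd length.

```java
// Sketch: decide recyclability by identity against the known initial
// array, rather than via the side effect of an odd array length.
final class EdgeArrayHolder {
    private final int[] initial = new int[4096];
    private int[] current = initial;

    // Grow to newSize (assumed >= current length), keeping the contents.
    int[] widen(int newSize) {
        int[] bigger = new int[newSize];
        System.arraycopy(current, 0, bigger, 0, current.length);
        current = bigger;
        return bigger;
    }

    // The cache would consult this instead of checking (a.length & 1).
    boolean isInitial(int[] a) {
        return a == initial;
    }
}
```

The identity check makes the "never cache the initial storage" rule explicit at the point where it is enforced, instead of relying on a convention shared between two modules.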
> I wanted to have a simple but robust API that can be easily used in Marlin.
I'd like to think that I'm working/promoting towards simple and robust
APIs as well.
> I agree the documentation can be improved a lot, and I could introduce a
> new helper method in ArrayCache to create the initial arrays in a safer
> and more explicit way:
>
> int[] createNonCachedIntArray(int length) {
>     // return only odd lengths to detect wrong cache usage
>     if ((length & 1) == 0) {
>         length += 1;
>     }
>     return new int[length];
> }
In a future effort, if we find that we must keep some small initial
arrays for performance, we could make this a more explicit and tunable
parameter on the caches. Let's table that until we consider if they buy
us any performance after these dirty array changes.
> Anyway I agree the cache handling can be improved a lot: try using
> generics for array types with a lambda factory to allocate them.
I just wish that generics could handle primitive arrays, but they can't.
I did a similar generification thing for the raster format conversion
code in FX, but I had to play games with the generic types being
ByteBuffer and IntBuffer and having the variants of those accept the
proper type of primitive arrays. It wasn't pretty, but it did work out
in the end.
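The FX-style workaround Jim describes might look roughly like this (a sketch with assumed names, not the actual FX code): generics cannot range over `byte[]`/`int[]` directly, but they can range over the java.nio Buffer wrappers, each of which accepts the matching primitive array.

```java
import java.nio.Buffer;
import java.nio.ByteBuffer;
import java.nio.IntBuffer;

// Sketch: one generic base type covers both byte- and int-backed data by
// parameterizing over the nio Buffer wrappers of the primitive arrays.
abstract class PixelGetter<T extends Buffer> {
    abstract int getArgb(T buf, int index);
}

final class IntArgbGetter extends PixelGetter<IntBuffer> {
    @Override
    int getArgb(IntBuffer buf, int index) {
        return buf.get(index);     // already packed ARGB
    }
}

final class ByteGrayGetter extends PixelGetter<ByteBuffer> {
    @Override
    int getArgb(ByteBuffer buf, int index) {
        int g = buf.get(index) & 0xFF;   // expand gray to opaque ARGB
        return 0xFF000000 | (g << 16) | (g << 8) | g;
    }
}
```

Callers wrap their primitive arrays with `IntBuffer.wrap(int[])` or `ByteBuffer.wrap(byte[])`, which is exactly the "playing games" part: the wrapper objects exist only to give generics something to grab onto.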
> The caching code is an example of "code with no obvious flaws
> because it is hard to figure out if it has flaws". We should strive
> for "code with obviously no flaws because it is easy to verify that
> it is doing everything correctly".
>
>
> I disagree with the term "flaw", but I agree it is hard to figure out
> what is being done.
I took that from a talk on producing more secure code. I think it
originally came from an overall system design that could easily be
proven secure. The first release of Java required all sensitive Java
code to be constantly asking "am I in a context where this operation is
allowed?", and that detection was fragile. It was replaced by code where
all sensitive operations were rejected unless the surrounding code
proved its protected context, which was much more secure. "Disallowed
unless requested" left less room for mistakes than "allowed unless it
looks questionable".
But that same philosophy serves a lot of code well. If you start
having a bunch of "agreements" between modules, they can quickly get
out of sync through regular maintenance and sometimes spiral out of
control, and you end up spending more time verifying that you still
satisfy all of those conventions than you would by letting the code
itself make sure things are kosher because only valid operations are
expressible.
> OK, I can work later on a new framework:
> - ArrayCache<K> abstract class where <K> is byte[], int[], float[]
> - CleanArrayCache<K> extends ArrayCache<K> = clean arrays
> - DirtyArrayCache<K> extends ArrayCache<K> = dirty arrays
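A minimal sketch of that proposed shape (class names taken from the quote above; the allocator-lambda detail and `ArrayDeque` pooling are my assumptions, not a committed design):

```java
import java.util.ArrayDeque;
import java.util.function.IntFunction;

// Sketch: a cache parameterized over the array type, with an allocator
// lambda, split into "clean" (zeroed on release) and "dirty" variants.
abstract class ArrayCacheSketch<K> {
    private final ArrayDeque<K> pool = new ArrayDeque<>();
    private final IntFunction<K> allocator;
    private final int length;

    ArrayCacheSketch(IntFunction<K> allocator, int length) {
        this.allocator = allocator;
        this.length = length;
    }

    K acquire() {
        K array = pool.poll();
        return (array != null) ? array : allocator.apply(length);
    }

    void release(K array) {
        clean(array);
        pool.push(array);
    }

    // Clean caches zero the array here; dirty caches do nothing.
    abstract void clean(K array);
}

final class DirtyIntArrayCache extends ArrayCacheSketch<int[]> {
    DirtyIntArrayCache(int length) {
        super(int[]::new, length);   // array-constructor method reference
    }

    @Override
    void clean(int[] array) {
        // dirty cache: contents are deliberately left as-is
    }
}
```

One instantiation per primitive array type (`byte[]`, `int[]`, `float[]`) would still be needed, since a single generic class cannot construct an arbitrary primitive array without the per-type allocator lambda.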
I think I can trust the existing architecture for at least a putback of
the last webrev that you sent out while we brainstorm a better caching
mechanism - let me make one last pass through it and I'll send an OK
later this afternoon.
> What do you mean by "making further real improvements in the algorithms"?
Things like investigating DDA in the inner loops, alternate ways to
manage crossings that require less sorting, enhancing the stroker to do
non-uniform scaling (which would eliminate the double transform in some
cases), potentially passing the crossings to the output loop rather than
having to convert them into an alpha array, etc. A lot of that is
easier to review if there isn't a lot of extra "this line added 1%
performance" code adding visual noise.
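For the DDA idea in very rough form (an illustration of the general technique, unrelated to Marlin's actual edge structures): step an edge's x crossing per scanline with a fixed increment instead of re-evaluating the line equation each time.

```java
// Sketch: a DDA walk of an edge across scanlines. Instead of computing
// x = x0 + (y - y0) * slope for every scanline, keep a running x and
// add the per-scanline delta.
final class EdgeDDA {
    double x;                // current crossing
    final double dxdy;       // x increment per scanline

    EdgeDDA(double x0, double y0, double x1, double y1, double yStart) {
        this.dxdy = (x1 - x0) / (y1 - y0);
        this.x = x0 + (yStart - y0) * dxdy;
    }

    // Advance one scanline and return the new crossing.
    double step() {
        x += dxdy;
        return x;
    }
}
```

The inner loop then becomes one addition per edge per scanline, at the cost of accumulated floating-point error that a real implementation would have to bound (e.g. with fixed-point increments).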
> Who could help? I am the only one working concretely on this code;
> help wanted.
I'd love to get more people involved, but history tells us that this is
a fairly lonely field. :(
I may have some time for tinkering now that FX is in tightened bug-fix
mode, but mainly I will probably be much quicker at getting reviews
done, which in practice may help as much as another pair of hands...
...jim