[OpenJDK Rasterizer] RFR: Marlin renderer #2

Laurent Bourgès bourges.laurent at gmail.com
Thu Jun 18 18:07:52 UTC 2015


Jim,

Thanks a lot for your efforts understanding my approach; I admit it looks
tricky, as I spent a lot of time over the last 2 years optimizing every
detail.
Since I have reached ductus performance levels with pure Java code, I agree
other improvements are more interesting than chasing a few percent.

I will now try to be more open-minded about your valuable comments, to
improve code maintainability without sacrificing too much performance.

I will reply to your previous message tonight.

If you are now OK with it, can I push the patch including the 2 syntax
fixes? If so, I need to create a new bug id (to be able to push the
changeset)...

Should I wait until you merge the gr repo with the latest jdk changes?

I will then focus on refactoring the array cache (and properly resetting
state) + optimizing the FloatMath (ready).
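For reference, the kind of fast floor a FloatMath helper typically provides can be sketched like this; this is an illustrative shape only, not the actual Marlin source (`FastFloor` and `floorInt` are hypothetical names):

```java
// Illustrative fast floor (assumed shape, not the real FloatMath code):
// (int) x truncates toward zero, so a negative non-integer such as -1.5f
// would become -1 without the correction below.
public final class FastFloor {
    public static int floorInt(final float x) {
        final int truncated = (int) x;
        // if truncation moved the value up (negative non-integers), step down
        return (truncated > x) ? truncated - 1 : truncated;
    }
}
```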

Since you wonder if the initial arrays are still necessary, I will run some
benchmarks tonight with initial arrays = [0].

I think the 30% overhead was measured a long time ago and should be lower
now (dirty cache). However, the cache code still represents some overhead =
get/grow/put + more resizing (copying) than with the initial arrays...
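To make that get/grow/put cycle concrete, here is a minimal sketch of a dirty array cache; the class name and shape are hypothetical, not the actual Marlin ArrayCache:

```java
import java.util.ArrayDeque;

// Minimal sketch of the get/grow/put cycle: arrays are recycled per fixed
// size bucket and handed back "dirty" (uncleared) for speed.
final class DirtyIntArrayCache {
    private final ArrayDeque<int[]> pool = new ArrayDeque<>();
    private final int bucketSize;

    DirtyIntArrayCache(final int bucketSize) {
        this.bucketSize = bucketSize;
    }

    // get: reuse a pooled array if available, else allocate a fresh one.
    int[] get() {
        final int[] recycled = pool.poll();
        return (recycled != null) ? recycled : new int[bucketSize];
    }

    // grow: allocate a wider array, copy the used portion, recycle the old.
    int[] grow(final int[] array, final int usedLength) {
        final int[] wider = new int[array.length << 1];
        System.arraycopy(array, 0, wider, 0, usedLength);
        put(array);
        return wider;
    }

    // put: only recycle arrays of the bucket size; let others be GC'ed.
    void put(final int[] array) {
        if (array.length == bucketSize) {
            pool.offer(array);
        }
    }
}
```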

More... later.

Best regards,
Laurent

On 18 June 2015 at 01:30, "Jim Graham" <james.graham at oracle.com> wrote:
>
> OK, the new array stuff looks good to go.  There are still those 2 minor
> issues from before, but that's it:
>
> CollinearSimplifier.getNativeConsumer() => return 0
> FloatMath => parens around (~intpart) at 145.
>
> I don't need to see a webrev for those changes...
>
>                         ...jim
>
>
> On 6/17/15 1:43 PM, Jim Graham wrote:
>>
>> Hi Laurent,
>>
>> There were 1 or 2 very minor fixes I suggested in a couple of files that
>> you indicated you were going to do, but I didn't see a new webrev.  Is
>> there one?  I think the existing array stuff is probably OK and I'll
>> look at it again now that I understand the tradeoffs of the current
>> design of it, but I wanted to know if there was a newer webrev for me to
>> review based on those other minor fixes?
>>
>> On 6/12/15 5:45 AM, Laurent Bourgès wrote:
>>>
>>>     A general thought - should we have foo_initial arrays if we are
>>>     getting everything from a cache?
>>>
>>> I adopted this approach to maximize performance for the 99% case: I
>>> sized the initial arrays properly and they take ~512 KB per
>>> RendererContext; the ArrayCache (weak) is only used for extreme cases (a
>>> bit slower). In March 2013, I started by using the cache for all array
>>> accesses and it was 30% slower!
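The pattern described here (a properly sized per-context initial array, with the cache consulted only for rare larger requests) might be sketched roughly as follows; names are illustrative, not the Marlin API, and the weak-cache lookup is elided as a plain allocation:

```java
import java.util.Arrays;

// Hypothetical sketch of the "initial array + fallback" pattern: the
// initial array is sized for the ~99% case and lives as long as its
// RendererContext; only larger requests would go to the (weak) cache.
final class EdgeBuffer {
    private static final int INITIAL_CAPACITY = 1024;

    private final int[] initial = new int[INITIAL_CAPACITY];
    private int[] current = initial;

    int[] ensureCapacity(final int needed) {
        if (needed > current.length) {
            // extreme case: in Marlin this would consult the weak cache
            current = Arrays.copyOf(current,
                                    Math.max(needed, current.length * 2));
        }
        return current;
    }

    void dispose() {
        current = initial; // reset to the common-case array
    }
}
```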
>>
>>
>> Was the 30% back when we were allocating or clearing all of the arrays?
>>   Now that we only clear what we use, the initial sizes of the arrays
>> shouldn't matter so much.  It would be worth a little extra size
>> overhead to not have so many special cases and special handling for the
>> "_initial" arrays if the performance is now essentially identical.
>>
>>> I will look again at these cases: I agree "leaking refs" should be set
>>> to null to be sure. However, Marlin code always executes the init() /
>>> dispose() pattern, so leaking pointers are not critical, but OK, let's
>>> set all of them to null.
>>>
>>> For the initial arrays, it seems redundant to set them to null in
>>> dispose() and back to the initial arrays in init() = useless and
>>> counter-productive!
>>
>>
>> It gets easier to review and maintain code if there is a clear reset to
>> some initial conditions is all.  I would like to try to keep the code as
>> simple and straightforward as possible while evaluating algorithmic
>> improvements without introducing potential bugs with a lot of special
>> case code and then we can worry about the 1%'s later.
>>
>> Also, we may find moving the code to native to be more of a performance
>> improvement than trying to fight with Hotspot's load/store policies and
>> so any code that tries to make "slightly more optimal byte codes" would
>> end up not being needed and would add some noise to any such native
>> conversion.
>>
>> But, I do agree that technically you are correct about some of these
>> suggestions leading to some provably redundant code.  The issue is more
>> that constantly having to reprove the redundancies gets in the way of
>> some other progress we could be making.
>>
>> And, on the other hand, right now you are the engineer driving this from
>> the workload point of view (and I apologize if I am inadvertently in a
>> role that seems to be described as "the engineer getting in the way"),
>> so I'm willing to rely on your commitment to be diligent about these
>> optimizations.  I'm just stating a case for whether they should be the
>> focus yet.
>>
>>> Sorry it is not a flaw but an advantage: the initial arrays that can
>>> grow in widenXXXArray() have an odd size as indicated by the comment:
>>>      // +1 to avoid recycling in Helpers.widenArray()
>>>      private final int[] edgePtrs_initial  = new int[INITIAL_SMALL_ARRAY
>>> + 1]; // 4K
>>
>>
>> I saw that after the fact, but my main point remains that this is a
>> fragile inter-module optimization.  I was unaware that it could save us
>> 30% to have special-cased "initial" arrays, and that fact makes this
>> relationship that much more important but I wonder if that is still
>> true.  If we ditch the initial arrays because the new "dirty array"
>> management removes that performance penalty then this all goes away.
>> And, even if we keep the initial array concept it would be nice to find
>> a more reliable way to manage that than a side effect of its length.
>>
>>> I wanted to have a simple but robust API that can be easily used in
>>> Marlin.
>>
>>
>> I'd like to think that I'm working/promoting towards simple and robust
>> APIs as well.
>>
>>> I agree the documentation can be improved a lot, and I could introduce a
>>> new helper method in ArrayCache to create the initial arrays in a safer
>>> and more explicit way:
>>>
>>> int[] createNonCachedIntArray(int length) {
>>>     // return only odd lengths so wrong cache usage can be detected
>>>     if ((length & 1) == 0) {
>>>         length += 1;
>>>     }
>>>     return new int[length];
>>> }
>>
>>
>> In a future effort, if we find that we must keep some small initial
>> arrays for performance, we could make this a more explicit and tunable
>> parameter on the caches.  Let's table that until we consider if they buy
>> us any performance after these dirty array changes.
>>
>>> Anyway I agree the cache handling can be improved a lot: try using
>>> generics for array types with a lambda factory to allocate them.
>>
>>
>> I just wish that generics could handle primitive arrays, but they can't.
>>   I did a similar generification thing for the raster format conversion
>> code in FX, but I had to play games with the generic types being
>> ByteBuffer and IntBuffer and having the variants of those accept the
>> proper type of primitive arrays.  It wasn't pretty, but it did work out
>> in the end.
>>
>>>     The caching code is an example of "code with no obvious flaws
>>>     because it is hard to figure out if it has flaws".  We should strive
>>>     for "code with obviously no flaws because it is easy to verify that
>>>     it is doing everything correctly".
>>>
>>>
>>> I disagree with the term "flaw", but I agree it is hard to figure out
>>> what is being done.
>>
>>
>> I took that from a talk on producing more secure code.  I think it
>> originally came from an overall system design that could easily be
>> proven secure.  The first release of Java required all sensitive Java
>> code to be constantly asking "am I in the context that this operation is
>> allowed?" and the detection was fragile.  It was replaced by code where
>> all sensitive operations were rejected unless the surrounding code
>> proved its protected context which was much more secure.  "Disallowed
>> unless requested" was harder to make a mistake than "allowed unless it
>> looks questionable".
>>
>> But, that same philosophy serves a lot of code well.  If you start
>> having a bunch of "agreements" between modules then they can quickly get
>> out of synch through regular maintenance and sometimes spiral out of
>> control and you spend more time verifying that you're still satisfying
>> all of these conventions rather than letting the code make sure things
>> are kosher because only valid operations are expressible.
>>
>>> OK, I can work later on a new framework:
>>> - ArrayCache<K> abstract class where <K> is byte[], int[], float[]
>>> - CleanArrayCache<K> extends ArrayCache<K> = clean arrays
>>> - DirtyArrayCache<K> extends ArrayCache<K> = dirty arrays
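One possible shape for that framework, using a lambda factory so generic code can allocate primitive arrays (a hedged sketch of the proposal, not an implementation):

```java
import java.util.ArrayDeque;
import java.util.function.Consumer;
import java.util.function.IntFunction;

// Sketch of the proposed hierarchy; <K> is an array type such as int[].
// Generics cannot write "new K[n]" for a primitive array type, so the
// allocation is delegated to a lambda factory like int[]::new.
abstract class ArrayCache<K> {
    private final ArrayDeque<K> pool = new ArrayDeque<>();
    private final IntFunction<K> factory; // lambda factory, e.g. int[]::new
    private final int length;

    ArrayCache(final IntFunction<K> factory, final int length) {
        this.factory = factory;
        this.length = length;
    }

    final K get() {
        final K recycled = pool.poll();
        return (recycled != null) ? prepare(recycled) : factory.apply(length);
    }

    final void put(final K array) {
        pool.offer(array);
    }

    // Clean caches zero the recycled array; dirty caches return it as-is.
    abstract K prepare(K recycled);
}

final class DirtyArrayCache<K> extends ArrayCache<K> {
    DirtyArrayCache(final IntFunction<K> factory, final int length) {
        super(factory, length);
    }
    @Override
    K prepare(final K recycled) {
        return recycled; // no clearing: caller must overwrite before reading
    }
}

final class CleanArrayCache<K> extends ArrayCache<K> {
    private final Consumer<K> cleaner; // e.g. a -> Arrays.fill(a, 0)

    CleanArrayCache(final IntFunction<K> factory, final int length,
                    final Consumer<K> cleaner) {
        super(factory, length);
        this.cleaner = cleaner;
    }
    @Override
    K prepare(final K recycled) {
        cleaner.accept(recycled);
        return recycled;
    }
}
```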
>>
>>
>> I think I can trust the existing architecture for at least a putback of
>> the last webrev that you sent out while we brainstorm a better caching
>> mechanism - let me make one last pass through it and I'll send an OK
>> later this afternoon.
>>
>>> What do you mean by "making further real improvements in the
>>> algorithms"?
>>
>>
>> Things like investigating DDA in the inner loops, alternate ways to
>> manage crossings that require less sorting, enhancing the stroker to do
>> non-uniform scaling (which would eliminate the double transform in some
>> cases), potentially passing the crossings to the output loop rather than
>> having to convert them into an alpha array, etc.  A lot of that is
>> easier to review if there aren't a lot of extra "this line added 1%
>> performance" code adding visual noise.
>>
>>> Who could help? I am the only one working concretely on this code; help
>>> wanted.
>>
>>
>> I'd love to get more people involved, but history tells us that this is
>> a fairly lonely field.  :(
>>
>> I may have some time now that FX is in tightened bug-fix mode to do some
>> tinkering, but mainly I will probably be much quicker on getting reviews
>> done which may help as much as having another pair of hands in
>> practice...
>>
>>              ...jim