[OpenJDK Rasterizer] RFR: Marlin renderer #2
Laurent Bourgès
bourges.laurent at gmail.com
Thu Jun 18 18:07:52 UTC 2015
Jim,
Thanks a lot for your efforts in understanding my approach; I admit it
looks tricky, as I spent a lot of time optimizing every detail over the
last 2 years.
As I have reached Ductus performance levels with pure Java code, I agree
other improvements are more interesting than chasing a few percent.
I will now try to be more open-minded about your valuable comments, to
improve code maintainability without sacrificing too much performance.
I will reply to your previous message tonight.
If you are now OK with it, can I push the patch including the 2 syntax
fixes? If so, I need to create a new bug id (to be able to push the
changeset)...
Should I wait until you merge the gr repo with the latest JDK changes?
I will then focus on refactoring the array cache (and properly resetting
state) + optimizing FloatMath (ready).
Since you wonder whether the initial arrays are still necessary, I will
run some benchmarks tonight with initial arrays = [0].
I think the 30% overhead dates from a long time ago and should be lower
now (dirty cache). However, the cache code still represents some overhead
= get/grow/put + more resizing (copying) than with initial arrays...
More... later.
Best regards,
Laurent
On 18 June 2015 at 01:30, "Jim Graham" <james.graham at oracle.com> wrote:
>
> OK, the new array stuff looks good to go. There are still those 2 minor
issues from before, but that's it:
>
> CollinearSimplifier.getNativeConsumer() => return 0
> FloatMath => parens around (~intpart) at 145.
>
> I don't need to see a webrev for those changes...
>
> ...jim
>
>
> On 6/17/15 1:43 PM, Jim Graham wrote:
>>
>> Hi Laurent,
>>
>> There were 1 or 2 very minor fixes I suggested in a couple of files that
>> you indicated you were going to do, but I didn't see a new webrev. Is
>> there one? I think the existing array stuff is probably OK and I'll
>> look at it again now that I understand the tradeoffs of the current
>> design of it, but I wanted to know if there was a newer webrev for me to
>> review based on those other minor fixes?
>>
>> On 6/12/15 5:45 AM, Laurent Bourgès wrote:
>>>
>>> A general thought - should we have foo_initial arrays if we are
>>> getting everything from a cache?
>>>
>>> I adopted this approach to maximize performance for the 99% case: I
>>> sized the initial arrays properly and they take ~512 KB per
>>> RendererContext; the ArrayCache (Weak) is only used for extreme cases (a
>>> bit slower). In March 2013, I started by using the cache for all array
>>> accesses and it was 30% slower!
>>
>>
>> Was the 30% back when we were allocating or clearing all of the arrays?
>> Now that we only clear what we use, the initial sizes of the arrays
>> shouldn't matter so much. It would be worth a little extra size
>> overhead to not have so many special cases and special handling for the
>> "_initial" arrays if the performance is now essentially identical.
>>
>>> I will look again at these cases: I agree "leaking refs" should be set
>>> to null to be sure. However, Marlin code always executes the pattern
>>> init() / dispose(), so leaking pointers are not critical, but OK, let's
>>> set all of them to null.
>>>
>>> For the initial arrays, it seems redundant to set them to null in
>>> dispose() and back to the initial arrays in init() = useless and
>>> counter-productive!
>>
>>
>> It gets easier to review and maintain code if there is a clear reset to
>> some initial conditions is all. I would like to try to keep the code as
>> simple and straightforward as possible while evaluating algorithmic
>> improvements without introducing potential bugs with a lot of special
>> case code and then we can worry about the 1%'s later.
>>
>> Also, we may find moving the code to native to be more of a performance
>> improvement than trying to fight with Hotspot's load/store policies and
>> so any code that tries to make "slightly more optimal byte codes" would
>> end up not being needed and would add some noise to any such native
>> conversion.
>>
>> But, I do agree that technically you are correct about some of these
>> suggestions leading to some provably redundant code. The issue is more
>> that constantly having to reprove the redundancies gets in the way of
>> some other progress we could be making.
>>
>> And, on the other hand, right now you are the engineer driving this from
>> the workload point of view (and I apologize if I am inadvertently in a
>> role that seems to be described as "the engineer getting in the way"),
>> so I'm willing to rely on your commitment to be diligent about these
>> optimizations. I'm just stating a case for whether they should be the
>> focus yet.
>>
>>> Sorry it is not a flaw but an advantage: the initial arrays that can
>>> grow in widenXXXArray() have an odd size as indicated by the comment:
>>>     // +1 to avoid recycling in Helpers.widenArray()
>>>     private final int[] edgePtrs_initial =
>>>         new int[INITIAL_SMALL_ARRAY + 1]; // 4K
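For illustration, the odd-length convention could be expressed as a small
self-contained sketch (the class and method names here are invented, not
Marlin's actual code): the cache only recycles even-length arrays, so an
odd length marks an array as never-cacheable.

```java
// Hypothetical sketch of the odd-length convention described above:
// the cache only holds even-length arrays, so an array allocated with
// an odd length can never be recycled by mistake.
final class ArraySketch {

    // mirror of the idea behind createNonCachedIntArray: force an odd length
    static int[] newNonCachedIntArray(int length) {
        return new int[(length & 1) == 0 ? length + 1 : length];
    }

    // a cache would accept only even-length arrays on put():
    static boolean isCacheable(int[] array) {
        return (array.length & 1) == 0;
    }
}
```

The check costs one bitwise AND, but as Jim notes, encoding cacheability in
the length is a fragile inter-module convention.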
>>
>>
>> I saw that after the fact, but my main point remains that this is a
>> fragile inter-module optimization. I was unaware that it could save us
>> 30% to have special-cased "initial" arrays, and that fact makes this
>> relationship that much more important but I wonder if that is still
>> true. If we ditch the initial arrays because the new "dirty array"
>> management removes that performance penalty then this all goes away.
>> And, even if we keep the initial array concept it would be nice to find
>> a more reliable way to manage that than a side effect of its length.
>>
>>> I wanted to have a simple but robust API that can be easily used in
>>> Marlin.
>>
>>
>> I'd like to think that I'm working/promoting towards simple and robust
>> APIs as well.
>>
>>> I agree the documentation can be improved a lot, and I could introduce a
>>> new helper method in ArrayCache to create the initial arrays in a safer
>>> and more explicit way:
>>>
>>>     int[] createNonCachedIntArray(int length) {
>>>         // return only odd lengths to detect wrong cache usage
>>>         if ((length & 1) == 0) {
>>>             length += 1;
>>>         }
>>>         return new int[length];
>>>     }
>>
>>
>> In a future effort, if we find that we must keep some small initial
>> arrays for performance, we could make this a more explicit and tunable
>> parameter on the caches. Let's table that until we consider if they buy
>> us any performance after these dirty array changes.
>>
>>> Anyway I agree the cache handling can be improved a lot: try using
>>> generics for array types with a lambda factory to allocate them.
>>
>>
>> I just wish that generics could handle primitive arrays, but they can't.
>> I did a similar generification thing for the raster format conversion
>> code in FX, but I had to play games with the generic types being
>> ByteBuffer and IntBuffer and having the variants of those accept the
>> proper type of primitive arrays. It wasn't pretty, but it did work out
>> in the end.
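A rough sketch of the buffer-based generification Jim describes (all names
here are invented for illustration; the actual FX conversion code differs):
the generic parameter is an NIO buffer type, and each concrete subclass
accepts the matching primitive layout.

```java
import java.nio.Buffer;
import java.nio.ByteBuffer;
import java.nio.IntBuffer;

// Sketch: a generic accessor parameterized on an NIO buffer type, so one
// algorithm can be written once and run over byte[] or int[] pixel data.
abstract class PixelGetter<T extends Buffer> {
    abstract int getArgb(T buf, int index);
}

final class IntArgbGetter extends PixelGetter<IntBuffer> {
    @Override int getArgb(IntBuffer buf, int index) {
        return buf.get(index); // pixels already packed as ARGB ints
    }
}

final class ByteGrayGetter extends PixelGetter<ByteBuffer> {
    @Override int getArgb(ByteBuffer buf, int index) {
        int g = buf.get(index) & 0xFF; // expand one gray byte to opaque ARGB
        return 0xFF000000 | (g << 16) | (g << 8) | g;
    }
}
```

The games with buffer types work around the fact that Java generics cannot
range over primitive array types directly, as Jim says.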
>>
>>> The caching code is an example of "code with no obvious flaws
>>> because it is hard to figure out if it has flaws". We should strive
>>> for "code with obviously no flaws because it is easy to verify that
>>> it is doing everything correctly".
>>>
>>>
>>> I disagree with the term "flaw" but I agree it is hard to figure out
>>> what is being done.
>>
>>
>> I took that from a talk on producing more secure code. I think it
>> originally came from an overall system design that could easily be
>> proven secure. The first release of Java required all sensitive Java
>> code to be constantly asking "am I in the context that this operation is
>> allowed?" and the detection was fragile. It was replaced by code where
>> all sensitive operations were rejected unless the surrounding code
>> proved its protected context which was much more secure. "Disallowed
>> unless requested" was harder to make a mistake than "allowed unless it
>> looks questionable".
>>
>> But, that same philosophy serves a lot of code well. If you start
>> having a bunch of "agreements" between modules then they can quickly get
>> out of synch through regular maintenance and sometimes spiral out of
>> control and you spend more time verifying that you're still satisfying
>> all of these conventions rather than letting the code make sure things
>> are kosher because only valid operations are expressible.
>>
>>> OK, I can work later on a new framework:
>>> - ArrayCache<K> abstract class where <K> is byte[], int[], float[]
>>> - CleanArrayCache<K> extends ArrayCache<K> = clean arrays
>>> - DirtyArrayCache<K> extends ArrayCache<K> = dirty arrays
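A sketch of the proposed framework (hypothetical names; since Java generics
cannot allocate `new K[n]`, a lambda factory supplies the arrays, combining
Laurent's hierarchy with the lambda-factory idea above):

```java
import java.util.ArrayDeque;
import java.util.function.IntFunction;

// Sketch: K is an array type (int[], float[], byte[], ...) and the
// IntFunction<K> factory allocates fresh arrays of a requested length,
// since a generic class cannot write `new K[n]` itself.
class ArrayCacheSketch<K> {
    private final ArrayDeque<K> pool = new ArrayDeque<>();
    private final IntFunction<K> factory;
    private final int length;

    ArrayCacheSketch(IntFunction<K> factory, int length) {
        this.factory = factory;
        this.length = length;
    }

    K get() {
        K array = pool.poll();               // reuse if available
        return (array != null) ? array : factory.apply(length);
    }

    void put(K array) {
        pool.push(array);                    // a "clean" variant would zero-fill here
    }
}
```

A DirtyArrayCache variant would recycle as-is, while a CleanArrayCache
variant would clear arrays before reuse, matching the proposed split.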
>>
>>
>> I think I can trust the existing architecture for at least a putback of
>> the last webrev that you sent out while we brainstorm a better caching
>> mechanism - let me make one last pass through it and I'll send an OK
>> later this afternoon.
>>
>>> What do you mean by "making further real improvements in the
>>> algorithms"?
>>
>>
>> Things like investigating DDA in the inner loops, alternate ways to
>> manage crossings that require less sorting, enhancing the stroker to do
>> non-uniform scaling (which would eliminate the double transform in some
>> cases), potentially passing the crossings to the output loop rather than
>> having to convert them into an alpha array, etc. A lot of that is
>> easier to review if there aren't a lot of extra "this line added 1%
>> performance" code adding visual noise.
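For background, the DDA idea mentioned above replaces a per-scanline
multiply by the slope with a single incremental add; a minimal sketch, not
Marlin's actual edge-walking code (names invented, y1 > y0 assumed):

```java
// Sketch: digital differential analyzer (DDA) edge walk.
// Instead of computing x = x0 + (y - y0) * dx/dy at every scanline,
// keep a running crossing and add the slope once per step.
final class DdaSketch {

    static float[] edgeCrossings(float x0, float y0, float x1, float y1) {
        int firstY = (int) Math.ceil(y0);            // first scanline covered
        int lastY  = (int) Math.ceil(y1) - 1;        // last scanline covered
        int count  = Math.max(0, lastY - firstY + 1);
        float slope = (x1 - x0) / (y1 - y0);         // dx per unit y
        float x = x0 + (firstY - y0) * slope;        // crossing at first scanline
        float[] crossings = new float[count];
        for (int i = 0; i < count; i++) {
            crossings[i] = x;
            x += slope;                              // incremental DDA step
        }
        return crossings;
    }
}
```

The inner loop then has no multiply or divide, at the cost of accumulated
floating-point error over very tall edges.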
>>
>>> Who could help? I am the only one working concretely on this code; help
>>> wanted.
>>
>>
>> I'd love to get more people involved, but history tells us that this is
>> a fairly lonely field. :(
>>
>> I may have some time now that FX is in tightened bug-fix mode to do some
>> tinkering, but mainly I will probably be much quicker on getting reviews
>> done which may help as much as having another pair of hands in
>> practice...
>>
>> ...jim