[OpenJDK Rasterizer] RFR: Marlin renderer #2
Laurent Bourgès
bourges.laurent at gmail.com
Thu Jun 18 21:46:17 UTC 2015
Hi Jim,
Here is my detailled answer:
2015-06-17 22:43 GMT+02:00 Jim Graham <james.graham at oracle.com>:
> Hi Laurent,
>
> There were 1 or 2 very minor fixes I suggested in a couple of files that
> you indicated you were going to do, but I didn't see a new webrev. Is
> there one? I think the existing array stuff is probably OK and I'll look
> at it again now that I understand the tradeoffs of the current design of
> it, but I wanted to know if there was a newer webrev for me to review based
> on those other minor fixes?
>
Sorry, I did not make a new webrev: I took time to review again the array
cache / reset state code and tried some simplifications (postponed to #3).
On 6/12/15 5:45 AM, Laurent Bourgès wrote:
>
>> A general thought - should we have foo_initial arrays if we are
>> getting everything from a cache?
>>
>> I adopted this approach to maximize performance for the 99% cases: I
>> sized properly the initial arrays and they takes ~ 512Kb per
>> RendererContext: the ArrayCache (Weak) is only used for extreme cases (a
>> bit slower). In march 2013, I started by using the cache for all array
>> accesses and it was 30% slower !
>>
>
> Was the 30% back when we were allocating or clearing all of the arrays?
> Now that we only clear what we use, the initial sizes of the arrays
> shouldn't matter so much. It would be worth a little extra size overhead
> to not have so many special cases and special handling for the "_initial"
> arrays if the performance is now essentially identical.
>
I agree that the lastest webrev added drity caches for float / int so less
arrays must be zero-filled before putting them back in the cache.
It improved the gap ~ 3% in average:
==> marlin_REF.log <==
Threads 4 1 2 4
Pct95 237.848 233.887 238.430 241.226
==> marlin_NO_INITIALS.log <==
Threads 4 1 2 4
Pct95 244.261 241.116 244.028 247.639
However, the performance gap is more important for maps with many shapes: ~
6% (4T)
REF:
Test Threads Ops Med
Pct95 Avg StdDev Min Max TotalOps [ms/op]
dc_shp_alllayers_2013-00-30-07-00-47.ser 1 25 762.882
764.781 763.136 0.715 762.219 765.110 25
dc_shp_alllayers_2013-00-30-07-00-47.ser 2 50 768.975
774.134 769.167 4.105 764.517 774.750 50
dc_shp_alllayers_2013-00-30-07-00-47.ser 4 100 770.511
775.448 770.448 4.668 765.125 787.473 100
NO_INITIALS:
Test Threads Ops Med
Pct95 Avg StdDev Min Max TotalOps [ms/op]
dc_shp_alllayers_2013-00-30-07-00-47.ser 1 25 808.793
810.227 808.959 0.752 807.831 810.871 25
dc_shp_alllayers_2013-00-30-07-00-47.ser 2 50 813.435
815.646 813.357 1.361 811.076 817.711 50
dc_shp_alllayers_2013-00-30-07-00-47.ser 4 100 815.775
823.593 817.352 6.752 813.031 872.658 100
A notable difference between initial_arrays and cache consists in GC:
initial_arrays are well_sized and stored in the RendererContext directly
(soft reference in my benchmarks) but the cache is stored using a weak
reference => more GC overhead.
I think it possible to encapsulate the initial array into a CacheReference
that will handle both cache (appropriate Dirty / Clean ArrayCache) & the
initial array (tuned size) to simplify the renderer's code.
>
> I will look again at these cases: I agree "leaking refs" should be set
>> to null to be sure. However, Marlin code is always executing the pattern
>> init() / dispose() so leaking pointers are not critical but OK let's set
>> all of them to null.
>>
>> For initial arrays, it seems redundant to set to null in dispose() and
>> set to initial in init() = useless and counter-productive !
>>
>
> It gets easier to review and maintain code if there is a clear reset to
> some initial conditions is all. I would like to try to keep the code as
> simple and straightforward as possible while evaluating algorithmic
> improvements without introducing potential bugs with a lot of special case
> code and then we can worry about the 1%'s later.
>
Agreed: I will make efforts in #3 step: improve / simplify the init /
dispose sequence = reset state in an uniform manner.
Notably: the Stroker / Dasher dispose() must be called as early as possible
from Renderer directly (endRendering and / or dispose) in case of exception.
> Also, we may find moving the code to native to be more of a performance
> improvement than trying to fight with Hotspot's load/store policies and so
> any code that tries to make "slightly more optimal byte codes" would end up
> not being needed and would add some noise to any such native conversion.
>
Up to now I prefer coding java as hotspot does all that optimization for
free and it avoids me coding in C++ : error prone and a lot more complex to
make it clear.
Moreover, I installed and tried JITWatch + hotspot's assembler plugin to
let me look at the asm code (+ hints): if I have some free time, I could
have a look and see ...
> But, I do agree that technically you are correct about some of these
> suggestions leading to some provably redundant code. The issue is more
> that constantly having to reprove the redundancies gets in the way of some
> other progress we could be making.
>
Ok, I plan to rewrite the dispose code in #3 step.
And, on the other hand, right now you are the engineer driving this from
> the workload point of view (and I apologize if I am inadvertently in a role
> that seems to be described as "the engineer getting in the way"), so I'm
> willing to rely on your commitment to be diligent about these
> optimizations. I'm just stating a case for whether they should be the
> focus yet.
>
Thanks; if you need, I can help you running MapBench on your side.
For me, Marlin is fast enough now (like ductus) to focus on quality: I
mentioned proper gamma correction (but text rendering issues) & resampling
filters (moire...) or adaptive super sampling.
> Sorry it is not a flaw but an advantage: the initial arrays that can
>> grow in widenXXXArray() have an odd size as indicated by the comment:
>> // +1 to avoid recycling in Helpers.widenArray()
>> private final int[] edgePtrs_initial = new int[INITIAL_SMALL_ARRAY
>> + 1]; // 4K
>>
>
> I saw that after the fact, but my main point remains that this is a
> fragile inter-module optimization. I was unaware that it could save us 30%
> to have special-cased "initial" arrays, and that fact makes this
> relationship that much more important but I wonder if that is still true.
You're right: now it is ~ 5%. However, it also generates more garbage (weak
reference) and maybe a bigger memory footprint: so for web servers, it may
be a lot worse !
>
> If we ditch the initial arrays because the new "dirty array" management
> removes that performance penalty then this all goes away. And, even if we
> keep the initial array concept it would be nice to find a more reliable way
> to manage that than a side effect of its length.
>
I propose to keep it but use a better Object oriented design =
CacheReference (Cache & initial array references) very close to what you
proposed so it will handle consistently all the initial array & caching
complexity per array type.
> I wanted to have a simple but robust API that can be easily used in
>> Marlin.
>>
>
> I'd like to think that I'm working/promoting towards simple and robust
> APIs as well.
>
Of course, you are.
>
> I agree the documentation can be improved a lot and I could introduce a
>> new helper method in ArrayCache to create the initial arrays in a safer
>> and explicit way :
>>
>> int[] createNonCachedIntArray(int length) {
>> // return only odd length to detect wrong cache usage
>> if (length & 1 == 0) {
>> length +=1;
>> }
>> return new int[length];
>> }
>>
>
> In a future effort, if we find that we must keep some small initial arrays
> for performance, we could make this a more explicit and tunable parameter
> on the caches. Let's table that until we consider if they buy us any
> performance after these dirty array changes.
>
Ok, let's see...
> Anyway I agree the cache handling can be improved a lot: try using
>> generics for array types with a lambda factory to allocate them.
>>
>
> I just wish that generics could handle primitive arrays, but they can't.
> I did a similar generification thing for the raster format conversion code
> in FX, but I had to play games with the generic types being ByteBuffer and
> IntBuffer and having the variants of those accept the proper type of
> primitive arrays. It wasn't pretty, but it did work out in the end.
>
I know generics do not work but it seems possible to use abstract methods:
ArrayCache<K>
- <K> create (length)
- copy (<K> array, ...)
- fill(<K> array) ...
implemented by
IntArrayCache extends ArrayCache<int[]>
- int[] create (length) { return new int[length]; }
- copy (int[] array) { System.arraycopy(array ... }
However, it may give ugly code ... wrapping all array utility methods by
abstract methods (length, fill, copy, ...)
If I do not work or is too slow, I could use a code generator to generate
all needed variants from a template. Do you have such code generator in
openjdk ?
>
> The caching code is an example of "code with no obvious flaws
>> because it is hard to figure out if it has flaws". We should strive
>> for "code with obviously no flaws because it is easy to verify that
>> it is doing everything correctly".
>>
>>
>> I disagree the term "flaw" but I agree it is hard to figure out what is
>> done.
>>
>
> I took that from a talk on producing more secure code. I think it
> originally came from an overall system design that could easily be proven
> secure. The first release of Java required all sensitive Java code to be
> constantly asking "am I in the context that this operation is allowed?" and
> the detection was fragile. It was replaced by code where all sensitive
> operations were rejected unless the surrounding code proved its protected
> context which was much more secure. "Disallowed unless requested" was
> harder to make a mistake than "allowed unless it looks questionable".
>
> But, that same philosophy serves a lot of code well. If you start having
> a bunch of "agreements" between modules then they can quickly get out of
> synch through regular maintenance and sometimes spiral out of control and
> you spend more time verifying that you're still satisfying all of these
> conventions rather than letting the code make sure things are kosher
> because only valid operations are expressable.
>
Thanks for the explanations, I totally agree and sometimes I write too
complicated code (or fancy).
>
> OK, I can work later on a new framework:
>> - ArrayCache<K> abstract class where <K> is byte[], int[], float[]
>> - CleanArrayCache<K> extends ArrayCache<K> = clean arrays
>> - DirtyArrayCache<K> extends ArrayCache<K> = dirty arrays
>>
>
> I think I can trust the existing architecture for at least a putback of
> the last webrev that you sent out while we braingstorm a better caching
> mechanism - let me make one last pass through it and I'll send an OK later
> this afternoon.
>
Perfect.
>
> What do you mean by "meking further real improvements in the algorithms" ?
>>
>
> Things like investigating DDA in the inner loops, alternate ways to manage
> crossings that require less sorting, enhancing the stroker to do
> non-uniform scaling (which would eliminate the double transform in some
> cases), potentially passing the crossings to the output loop rather than
> having to convert them into an alpha array, etc. A lot of that is easier
> to review if there aren't a lot of extra "this line added 1% performance"
> code adding visual noise.
>
> Who could help ? I am the only one working concretely on this code, help
>> wanted.
>>
>
> I'd love to get more people involved, but history tells us that this is a
> fairly lonely field. :(
>
Too bad: it costed me a lot of evenings in past months so I wonder how long
I can afford such intense effort (family, holidays) !
I may have some time now that FX is in tightened bug-fix mode to do some
> tinkering, but mainly I will probably be much quicker on getting reviews
> done which may help as much as having another pair of hands in practice...
>
Great news: you will definitely help me a lot by making reviews, giving
advices and maybe some code.
Laurent
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/graphics-rasterizer-dev/attachments/20150618/18be0063/attachment-0001.html>
More information about the graphics-rasterizer-dev
mailing list