[OpenJDK Rasterizer] Fwd: Re: Fwd: RFR: Marlin renderer #3

Tue Aug 25 20:39:49 UTC 2015

If you'll give me a few minutes, I'll synch the repo before this goes in...

			...jim

On 8/24/15 3:08 PM, Jim Graham wrote:
> Hi Laurent,
>
> I'm feeling much better this week (and can even start typing with my bad
> hand in short spurts now), so I'll hopefully be more on the ball now.
>
> Short version - looks good to go, consider the following input at your
> own discretion and go ahead and push.
>
> Reviewing s3.4:
>
> On 8/17/15 2:34 PM, Laurent Bourgès wrote:
>> Changes:
>> - Renderer: remove unused D_ERR_STEP_MAX and 0x1 replaced by 1 for
>> readability
>
> Renderer, line 319 - it makes sense to me that the scale changes would
> be based off of the same metric.  Basically the constraint is keeping
> the speed (dxy) between INC and DEC bounds.  I'll admit, though, that I
> haven't been through the actual derivations of the math here, but I
> thought I'd give my 2 cents on your comment there.
>
> Renderer, line 479 - I often put extra parens in for cases like this
> because it isn't common knowledge whether cast or add is higher in
> precedence.  I generally assume no better knowledge than the basic
> PEMDAS rules that we learn in grade school. But that's just my style (I
> also don't have the OOO table memorized, but maybe that's just my
> senility ;).
>
> 0x1 vs 1 - I usually use 0x1 when I'm using numbers for their bit
> patterns, so | and & typically use 0x constants in my style philosophy.
>   In the case of "1" it's a degenerate case that doesn't seem to matter,
> but the added 0x tells me to think in terms of bits, not cardinal values.
>
> Renderer, note for future investigation - we store the direction flag in
> Y_MAX, but then we need to do manipulation to process Y_MAX and we also
> need more manipulation to transfer that bit to crossing values.  What if
> instead we stored it in the lowest bit of curx?  We'd then have 31.31
> fixed point, we'd have to process slope so that its whole part was
> doubled (to skip affecting the LSbit of curx) and we'd have to do the
> carry using "((error >> 31) << 1)", but we'd get the transfer of the
> direction bit to the crossing for free and all in all, the total
> operations may be reduced (at the cost of 1 bit of X coordinate range).
>
>> - MarlinRenderingEngine: optimized loop in pathTo()
>
> Looks good
>
>> - Dasher: TODO remains related to dash length issue
>
> OK.
>
>> Hope it is good now, I plan to work soon on the Array Cache & dispose
>> code.
>
> Looks great!  Consider the style issues I mentioned above if you wish
> and otherwise it is good to go.  I don't need to see any further webrevs
> on those issues.
>
> One more suggestion for optimization below can be considered in
> subsequent work:
>
>>     For CHECK_NAN, how fast is Math.isNaN?  Is it worth first comparing
>>     the intpart to the value that NaN converts to before calling isNaN?
>>     The majority case, by a wide margin, is that the value is not NaN.
>>
>>
>> That's the aim: I test first the most probable case (a <= intpart) for
>> ceil() and then check possible overflow and finally check for NaN values
>> to avoid adding / substracting 1 !
>> I made benchmarks and this implementation optimized for the integer
>> domain works very well: 2x faster.
>
> I guess I wasn't clear.  I understand why the 3 parts of the test are in
> the order that they are in, but I was referring only to the "CHECK_NAN
> && Float.isNaN(a)" clause.  Is that worth adding one more test for
> "CHECK_NAN && intpart == NAN_TO_INT_VAL && Float.isNaN(a)" in that one
> clause?
>
> For the ceil_int case, a very common case is that a is non-negative and
> non-integer.  In that case, "a <= intpart" will fail and you will do the
> CHECK_NAN test before you can return "intpart+1".
>
> For the floor_int case, you would only get to the CHECK_NAN case on
> negative-non-integers which we don't expect a lot of, but in general,
> the added test might make it slightly faster.
>
>>     I was under the understanding that if a constant is declared final
>>     and static, then it is compiled directly into the code and there is
>>     no constant lookup.  Check the bytecodes and you should see a
>>     hard-coded literal 0x7fffffff everywhere it is used...
>>
>>
>> I agree but it seems slightly faster: probably loading the literal is a
>> bit slower than loading from the stack ?
>> Maybe some hotspot experts could give their opinion.
>> As this is the renderer hot loop, I prefer making it as fast as possible.
>
> In the case of 0x7fffffff it may be because of the size of the constant.
>   A smaller literal like "1" or "2" might be faster as an in-lined
> source literal.
>
>>     I think that at least for now they should either be
>>     sun.java2d.renderer.marlin.* or
>>     just sun.java2d.marlin.*
>>
>>
>> Ok, I can rename them in the next webrev as it requires me to upgrade
>> all my testing & benchmarking scripts...
>
> Sounds fine to me.  This is more "something we need to finalize before
> integrating into the master workspace" than an issue for any given webrev.
>
>>     But the more important question may be around how much benefit these
>>     bring compared to the additional testing overhead of so many
>>     combinations ?
>>
>>
>> I advocate it is not easy to test all combinations but it seems not a
>> reasonable testing objective.
>>
>> As I said, it is mainly tuning or quality settings but only few are
>> important (subpixels X / Y, tile size, initial pixel size).
>
> Another thing to consider is that customers would have to do lots of
> testing of performance of tweaking these values and so they would
> discover the problems and report them to us before they become reliant
> on them.  However, down the road they may accept version+1 of Marlin
> into their existing environment without fully testing and we may
> introduce a bug that might take them by surprise.
>
> (Again, something to discuss and decide in the long term - not for this
> particular patch)...
>
>              ...jim