[OpenJDK Rasterizer] Fwd: Re: Fwd: RFR: Marlin renderer #3
Jim Graham
james.graham at oracle.com
Mon Aug 24 22:08:04 UTC 2015
Hi Laurent,
I'm feeling much better this week (and can even start typing with my bad
hand in short spurts now), so I'll hopefully be more on the ball now.
Short version - looks good to go, consider the following input at your
own discretion and go ahead and push.
Reviewing s3.4:
On 8/17/15 2:34 PM, Laurent Bourgès wrote:
> Changes:
> - Renderer: remove unused D_ERR_STEP_MAX and 0x1 replaced by 1 for
> readability
Renderer, line 319 - it makes sense to me that the scale changes would
be based off of the same metric. Basically the constraint is keeping
the speed (dxy) between INC and DEC bounds. I'll admit, though, that I
haven't been through the actual derivations of the math here, but I
thought I'd give my 2 cents on your comment there.
Renderer, line 479 - I often put extra parens in for cases like this
because it isn't common knowledge whether cast or add is higher in
precedence. I generally assume no better knowledge than the basic
PEMDAS rules that we learn in grade school. But that's just my style (I
also don't have the OOO table memorized, but maybe that's just my
senility ;).
0x1 vs 1 - I usually use 0x1 when I'm using numbers for their bit
patterns, so | and & typically use 0x constants in my style philosophy.
In the case of "1" it's a degenerate case that doesn't seem to matter,
but the added 0x tells me to think in terms of bits, not cardinal values.
Renderer, note for future investigation - we store the direction flag in
Y_MAX, but then we need to do manipulation to process Y_MAX and we also
need more manipulation to transfer that bit to crossing values. What if
instead we stored it in the lowest bit of curx? We'd then have 31.31
fixed point, we'd have to process slope so that its whole part was
doubled (to skip affecting the LSbit of curx) and we'd have to do the
carry using "((error >> 31) << 1)", but we'd get the transfer of the
direction bit to the crossing for free and all in all, the total
operations may be reduced (at the cost of 1 bit of X coordinate range).
> - MarlinRenderingEngine: optimized loop in pathTo()
Looks good
> - Dasher: TODO remains related to dash length issue
OK.
> Hope it is good now, I plan to work soon on the Array Cache & dispose code.
Looks great! Consider the style issues I mentioned above if you wish
and otherwise it is good to go. I don't need to see any further webrevs
on those issues.
One more suggestion for optimization below can be considered in
subsequent work:
> For CHECK_NAN, how fast is Math.isNaN? Is it worth first comparing
> the intpart to the value that NaN converts to before calling isNaN?
> The majority case, by a wide margin, is that the value is not NaN.
>
>
> That's the aim: I test first the most probable case (a <= intpart) for
> ceil() and then check possible overflow and finally check for NaN values
> to avoid adding / substracting 1 !
> I made benchmarks and this implementation optimized for the integer
> domain works very well: 2x faster.
I guess I wasn't clear. I understand why the 3 parts of the test are in
the order that they are in, but I was referring only to the "CHECK_NAN
&& Float.isNaN(a)" clause. Is that worth adding one more test for
"CHECK_NAN && intpart == NAN_TO_INT_VAL && Float.isNaN(a)" in that one
clause?
For the ceil_int case, a very common case is that a is non-negative and
non-integer. In that case, "a <= intpart" will fail and you will do the
CHECK_NAN test before you can return "intpart+1".
For the floor_int case, you would only get to the CHECK_NAN case on
negative-non-integers which we don't expect a lot of, but in general,
the added test might make it slightly faster.
> I was under the understanding that if a constant is declared final
> and static, then it is compiled directly into the code and there is
> no constant lookup. Check the bytecodes and you should see a
> hard-coded literal 0x7fffffff everywhere it is used...
>
>
> I agree but it seems slightly faster: probably loading the literal is a
> bit slower than loading from the stack ?
> Maybe some hotspot experts could give their opinion.
> As this is the renderer hot loop, I prefer making it as fast as possible.
In the case of 0x7fffffff it may be because of the size of the constant.
A smaller literal like "1" or "2" might be faster as an in-lined
source literal.
> I think that at least for now they should either be
> sun.java2d.renderer.marlin.* or
> just sun.java2d.marlin.*
>
>
> Ok, I can rename them in the next webrev as it requires me to upgrade
> all my testing & benchmarking scripts...
Sounds fine to me. This is more "something we need to finalize before
integrating into the master workspace" than an issue for any given webrev.
> But the more important question may be around how much benefit these
> bring compared to the additional testing overhead of so many
> combinations ?
>
>
> I advocate it is not easy to test all combinations but it seems not a
> reasonable testing objective.
>
> As I said, it is mainly tuning or quality settings but only few are
> important (subpixels X / Y, tile size, initial pixel size).
Another thing to consider is that customers would have to do lots of
testing of performance of tweaking these values and so they would
discover the problems and report them to us before they become reliant
on them. However, down the road they may accept version+1 of Marlin
into their existing environment without fully testing and we may
introduce a bug that might take them by surprise.
(Again, something to discuss and decide in the long term - not for this
particular patch)...
...jim
More information about the graphics-rasterizer-dev
mailing list