[OpenJDK Rasterizer] RFR: Marlin renderer #2

Laurent Bourgès bourges.laurent at gmail.com
Thu Jun 11 10:27:45 UTC 2015


> The reason I ask is that it would be against all odds that you'd be able to
> detect loading those 2 variables into temp local variables given how
> complex a given rendering operation is.

I agree; I also doubt that I could detect such a subtle code change in my
benchmarks.

However, using a local variable may help the HotSpot compiler produce
better-optimized machine code.

> I could imagine that loading a field into a local variable in the inner
> loop of the crossings loops might be noticeable, but not for the case of 2
> calculations in a once per op setup/teardown method.

I agree, but this method is a "hotspot": it is called once per shape, i.e.
135 000 times per map...

> I'm not saying that we should never do shadow fields in local variables,
> just that we should only do that for cases that affect performance as it
> can make the code less readable if it's not going to make a difference and
> also you run the slight risk in some cases of updating the local and not
> saving it back to the field (usually it's done right when such a shadow
> local is introduced, but in some later bug fix some other engineer decides
> that they can add a shortcut through the code without realizing they are
> short-cutting past a "shadow to field" sync point and it introduces a bug -
> thus I reserve a technique like that only for important cases - in
> particular loops executed a number of times...)
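Jim's concern about a later shortcut bypassing a "shadow to field" sync point can be shown with a tiny, purely hypothetical class (ShadowSync and its methods are made up for illustration and are not Marlin code): the shortcut variant returns early and silently drops the value accumulated in the local.

```java
// Hypothetical illustration (not Marlin code) of the "shadow to field" sync-point risk.
public final class ShadowSync {
    private int count; // field shadowed by a local in the methods below

    // Correct: the local is written back to the field on every exit path.
    public void addAllSynced(int[] values) {
        int c = count;              // shadow the field into a local
        for (int v : values) {
            if (v < 0) {
                count = c;          // sync point before the early exit
                return;
            }
            c += v;
        }
        count = c;                  // sync point at the normal exit
    }

    // Buggy: a later "shortcut" returns early and skips the write-back,
    // so the partial sum accumulated in the local is lost.
    public void addAllShortcut(int[] values) {
        int c = count;
        for (int v : values) {
            if (v < 0) {
                return;             // BUG: bypasses the sync point below
            }
            c += v;
        }
        count = c;
    }

    public int count() {
        return count;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, -1, 4};
        ShadowSync ok = new ShadowSync();
        ok.addAllSynced(data);
        ShadowSync bad = new ShadowSync();
        bad.addAllShortcut(data);
        System.out.println(ok.count() + " vs " + bad.count()); // prints "6 vs 0"
    }
}
```

With the input {1, 2, 3, -1, 4}, the synced version keeps the partial sum 6 while the shortcut version leaves the field at 0. A read-only (final) shadow local has nothing to write back, so it cannot fall into this trap.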

I agree with your approach, but there is no risk here as boundsMaxY is constant:

Variant: LOCAL VAR:
        // read both bounds fields once into final (read-only) locals
        final int _boundsMinY = boundsMinY;
        final int _boundsMaxY = boundsMaxY;

        final int spminY = Math.max(FloatMath.ceil(edgeMinY), _boundsMinY);
        final int spmaxY;
        int maxY = FloatMath.ceil(edgeMaxY);
        if (maxY <= _boundsMaxY - 1) {
            spmaxY = maxY;
        } else {
            // clamp to the bounds
            spmaxY = _boundsMaxY - 1;
            maxY   = _boundsMaxY;
        }
        buckets_minY = spminY - _boundsMinY;
        buckets_maxY = maxY   - _boundsMinY;

Variant: NO LOCAL VAR for boundsMaxY only:
        // only boundsMinY is shadowed; boundsMaxY is read from the field each time
        final int _boundsMinY = boundsMinY;

        final int spminY = Math.max(FloatMath.ceil(edgeMinY), _boundsMinY);
        final int spmaxY;
        int maxY = FloatMath.ceil(edgeMaxY);
        if (maxY <= boundsMaxY - 1) {
            spmaxY = maxY;
        } else {
            // clamp to the bounds
            spmaxY = boundsMaxY - 1;
            maxY   = boundsMaxY;
        }
        buckets_minY = spminY - _boundsMinY;
        buckets_maxY = maxY   - _boundsMinY;
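To check that shadowing the read-only boundsMaxY cannot change behavior, the two variants above can be wrapped into a small runnable class. This is a sketch under assumptions: BucketBounds, setupLocal and setupField are hypothetical names, FloatMath.ceil is replaced by (int) Math.ceil, and the bounds fields are declared final to mirror the statement that boundsMaxY is constant during this computation.

```java
import java.util.Arrays;

// Hypothetical stand-in for the setup code above, comparing both variants.
// Assumption: FloatMath.ceil(float) behaves like (int) Math.ceil(float).
public final class BucketBounds {
    private final int boundsMinY;   // final: never written during setup
    private final int boundsMaxY;
    int buckets_minY;
    int buckets_maxY;

    BucketBounds(int boundsMinY, int boundsMaxY) {
        this.boundsMinY = boundsMinY;
        this.boundsMaxY = boundsMaxY;
    }

    private static int ceil(float v) {
        return (int) Math.ceil(v);
    }

    /** LOCAL VAR variant: both bounds are read once; returns spmaxY. */
    int setupLocal(float edgeMinY, float edgeMaxY) {
        final int _boundsMinY = boundsMinY;
        final int _boundsMaxY = boundsMaxY;
        final int spminY = Math.max(ceil(edgeMinY), _boundsMinY);
        final int spmaxY;
        int maxY = ceil(edgeMaxY);
        if (maxY <= _boundsMaxY - 1) {
            spmaxY = maxY;
        } else {
            spmaxY = _boundsMaxY - 1;
            maxY   = _boundsMaxY;
        }
        buckets_minY = spminY - _boundsMinY;
        buckets_maxY = maxY   - _boundsMinY;
        return spmaxY;
    }

    /** NO LOCAL VAR variant: boundsMaxY is re-read from the field; returns spmaxY. */
    int setupField(float edgeMinY, float edgeMaxY) {
        final int _boundsMinY = boundsMinY;
        final int spminY = Math.max(ceil(edgeMinY), _boundsMinY);
        final int spmaxY;
        int maxY = ceil(edgeMaxY);
        if (maxY <= boundsMaxY - 1) {
            spmaxY = maxY;
        } else {
            spmaxY = boundsMaxY - 1;
            maxY   = boundsMaxY;
        }
        buckets_minY = spminY - _boundsMinY;
        buckets_maxY = maxY   - _boundsMinY;
        return spmaxY;
    }

    public static void main(String[] args) {
        BucketBounds b = new BucketBounds(0, 100);
        float[][] edges = {{10.2f, 50.7f}, {-5f, 150.3f}, {0f, 99f}, {0f, 100f}};
        for (float[] e : edges) {
            int s1 = b.setupLocal(e[0], e[1]);
            int min1 = b.buckets_minY;
            int max1 = b.buckets_maxY;
            int s2 = b.setupField(e[0], e[1]);
            boolean same = s1 == s2 && min1 == b.buckets_minY && max1 == b.buckets_maxY;
            System.out.println(Arrays.toString(e) + " -> same result: " + same);
        }
    }
}
```

Each line should report a matching result, so any difference between the variants lies in the generated code (field loads vs local loads), not in the computed values.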

However, I quickly did 2 runs of each variant:

Variant: LOCAL VAR
==> marlin_ojdk_1.log <==
Tests    27    9    9    9
Threads    4    1    2    4
Pct95    132.139    130.701    132.350    133.365

==> marlin_ojdk_2.log <==
Tests    27    9    9    9
Threads    4    1    2    4
Pct95    133.061    131.925    132.816    134.442

Test                                                        Threads  Ops  Med      Pct95    Avg      StdDev  Min      Max      TotalOps  [ms/op]
marlin_ojdk_1.log:dc_shp_alllayers_2013-00-30-07-00-47.ser  1        25   765.215  766.428  765.174  0.740   763.819  766.604  25
marlin_ojdk_1.log:dc_shp_alllayers_2013-00-30-07-00-47.ser  2        50   772.095  777.983  772.704  3.695   768.296  779.644  50
marlin_ojdk_1.log:dc_shp_alllayers_2013-00-30-07-00-47.ser  4        100  773.461  782.407  773.566  4.857   765.198  790.660  100
marlin_ojdk_2.log:dc_shp_alllayers_2013-00-30-07-00-47.ser  1        25   774.916  777.312  775.003  1.268   773.072  777.596  25
marlin_ojdk_2.log:dc_shp_alllayers_2013-00-30-07-00-47.ser  2        50   780.500  782.350  780.278  1.506   776.916  783.797  50
marlin_ojdk_2.log:dc_shp_alllayers_2013-00-30-07-00-47.ser  4        100  780.476  792.769  781.983  4.912   774.828  794.644  100


Variant: NO LOCAL VAR
==> marlin_ojdk_1.log <==
Tests    27    9    9    9
Threads    4    1    2    4
Pct95    132.698    131.973    132.622    133.499

==> marlin_ojdk_2.log <==
Tests    27    9    9    9
Threads    4    1    2    4
Pct95    134.292    134.741    133.668    134.468

Test                                                        Threads  Ops  Med      Pct95    Avg      StdDev  Min      Max      TotalOps  [ms/op]
marlin_ojdk_1.log:dc_shp_alllayers_2013-00-30-07-00-47.ser  1        25   773.685  775.946  773.759  1.546   770.888  777.375  25
marlin_ojdk_1.log:dc_shp_alllayers_2013-00-30-07-00-47.ser  2        50   776.336  778.360  776.388  1.246   773.655  778.625  50
marlin_ojdk_1.log:dc_shp_alllayers_2013-00-30-07-00-47.ser  4        100  776.398  781.926  776.843  2.525   772.224  786.830  100
marlin_ojdk_2.log:dc_shp_alllayers_2013-00-30-07-00-47.ser  1        25   793.264  800.969  793.383  5.388   783.434  801.322  25
marlin_ojdk_2.log:dc_shp_alllayers_2013-00-30-07-00-47.ser  2        50   780.957  789.660  781.136  4.984   773.533  793.514  50
marlin_ojdk_2.log:dc_shp_alllayers_2013-00-30-07-00-47.ser  4        100  780.952  792.372  781.176  5.655   767.275  796.447  100

As you can see, there is some variability between tests and runs: the
synthetic (Pct95 summary) results of the LOCAL VAR variant are only about
1 point better.

If you look at the Pct95 (95th percentile), the LOCAL VAR variant seems
slightly faster: ~10 ms / 780 ms ≈ 1%.
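The ~1% estimate can be cross-checked by averaging all six Pct95 rows per variant. A minimal sketch, assuming the second result set above is the NO LOCAL VAR variant (Pct95Gap is a hypothetical helper name):

```java
// Hypothetical helper: compares the mean Pct95 of the two variants.
// Assumption: the second result set is the NO LOCAL VAR variant.
public final class Pct95Gap {
    // Pct95 [ms/op] per row: log 1 (1, 2, 4 threads), then log 2 (1, 2, 4 threads)
    static final double[] LOCAL_VAR    = {766.428, 777.983, 782.407, 777.312, 782.350, 792.769};
    static final double[] NO_LOCAL_VAR = {775.946, 778.360, 781.926, 800.969, 789.660, 792.372};

    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    /** Relative gap (field - local) / field; 0.01 means the local variant is 1% faster. */
    static double relativeGap(double[] local, double[] field) {
        double f = mean(field);
        return (f - mean(local)) / f;
    }

    public static void main(String[] args) {
        double gapMs = mean(NO_LOCAL_VAR) - mean(LOCAL_VAR);
        System.out.printf("mean gap = %.3f ms/op (%.2f%%)%n",
                gapMs, 100.0 * relativeGap(LOCAL_VAR, NO_LOCAL_VAR));
    }
}
```

On these numbers the mean gap is about 6.7 ms/op, roughly 0.85%, i.e. the same order of magnitude as the ~1% read from individual rows.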

But this may just be scatter, i.e. not representative at all!

So it is hard to conclude anything from only 2 runs: I could do more runs
or longer runs to get better statistics.
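One way to put a number on the scatter is a paired t-statistic over the six matched Pct95 rows. A minimal sketch under stated assumptions: the second result set is the NO LOCAL VAR variant, rows are paired by (log index, thread count), and PairedT is a hypothetical helper name. With 6 pairs (5 degrees of freedom), |t| must exceed about 2.571 to be significant at the 5% level (two-sided).

```java
// Hypothetical helper: paired t-statistic on the Pct95 rows of both variants.
// Assumptions: second result set = NO LOCAL VAR; rows paired by (log, threads).
public final class PairedT {
    static final double[] LOCAL_VAR    = {766.428, 777.983, 782.407, 777.312, 782.350, 792.769};
    static final double[] NO_LOCAL_VAR = {775.946, 778.360, 781.926, 800.969, 789.660, 792.372};

    /** Paired t of the differences a[i] - b[i]; negative means a is faster (fewer ms/op). */
    static double pairedT(double[] a, double[] b) {
        if (a.length != b.length || a.length < 2) {
            throw new IllegalArgumentException("need at least 2 matched pairs");
        }
        int n = a.length;
        double mean = 0.0;
        for (int i = 0; i < n; i++) {
            mean += a[i] - b[i];
        }
        mean /= n;
        double ss = 0.0;                        // sum of squared deviations
        for (int i = 0; i < n; i++) {
            double dev = (a[i] - b[i]) - mean;
            ss += dev * dev;
        }
        double sd = Math.sqrt(ss / (n - 1));    // sample standard deviation
        return mean / (sd / Math.sqrt(n));      // mean difference / standard error
    }

    public static void main(String[] args) {
        double t = pairedT(LOCAL_VAR, NO_LOCAL_VAR);
        // two-sided 5% critical value of Student's t for 5 degrees of freedom (n = 6)
        final double critical = 2.571;
        System.out.printf("t = %.2f, significant at 5%%: %b%n", t, Math.abs(t) > critical);
    }
}
```

On the values above this gives t ≈ -1.74, below the 2.571 threshold, which is consistent with the gap possibly being just scatter; more or longer runs would narrow the uncertainty.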

I looked at the bytecode with javap, but JITWatch might be more helpful to
see the actual assembly code. Here is the javap diff ("-" = LOCAL VAR,
"+" = NO LOCAL VAR):

       45: getfield      #34                 // Field boundsMinY:I
       48: istore_3
       49: aload_0
-      50: getfield      #37                 // Field boundsMaxY:I
-      53: istore        4
-      55: aload_0
-      56: getfield      #40                 // Field edgeMinY:F
-      59: invokestatic  #35                 // Method
-      62: iload_3
-      63: invokestatic  #36                 // Method
-      66: istore        5
-      68: aload_0
-      69: getfield      #41                 // Field edgeMaxY:F
-      72: invokestatic  #35                 // Method
-      75: istore        7
-      77: iload         7
-      79: iload         4
-      81: iconst_1
-      82: isub
-      83: if_icmpgt     93
-      86: iload         7
-      88: istore        6
-      90: goto          103
-      93: iload         4
-      95: iconst_1
-      96: isub
-      97: istore        6
-      99: iload         4
-     101: istore        7

+      50: getfield      #40                 // Field edgeMinY:F
+      53: invokestatic  #35                 // Method
+      56: iload_3
+      57: invokestatic  #36                 // Method
+      60: istore        4
+      62: aload_0
+      63: getfield      #41                 // Field edgeMaxY:F
+      66: invokestatic  #35                 // Method
+      69: istore        6
+      71: iload         6
+      73: aload_0
+      74: getfield      #37                 // Field boundsMaxY:I
+      77: iconst_1
+      78: isub
+      79: if_icmpgt     89
+      82: iload         6
+      84: istore        5
+      86: goto          103
+      89: aload_0
+      90: getfield      #37                 // Field boundsMaxY:I
+      93: iconst_1
+      94: isub
+      95: istore        5
+      97: aload_0
+      98: getfield      #37                 // Field boundsMaxY:I
+     101: istore        6

The difference is: [3 getfield #37] in the NO LOCAL VAR variant vs
[1 getfield #37 + 1 istore 4 + 2 iload 4] in the LOCAL VAR variant.
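A listing like the one above is usually produced on the command line with `javap -c -p` on the compiled class. The same can be done in-process through `java.util.spi.ToolProvider`; note this API only exists since JDK 9, so this minimal sketch assumes a newer JDK than the one this thread targets, with the jdk.jdeps module (which provides javap) present. The class name JavapRunner is hypothetical.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.spi.ToolProvider;

// Sketch: run javap in-process via java.util.spi.ToolProvider (JDK 9+ only).
// Assumption: the running JDK includes the jdk.jdeps module, which provides javap.
public final class JavapRunner {

    /** Runs javap with the given arguments and returns its standard output. */
    static String javap(String... args) {
        ToolProvider tool = ToolProvider.findFirst("javap")
                .orElseThrow(() -> new IllegalStateException("javap tool not available"));
        StringWriter out = new StringWriter();
        StringWriter err = new StringWriter();
        PrintWriter outWriter = new PrintWriter(out);
        PrintWriter errWriter = new PrintWriter(err);
        int rc = tool.run(outWriter, errWriter, args);
        outWriter.flush();
        errWriter.flush();
        if (rc != 0) {
            throw new IllegalStateException("javap failed (" + rc + "): " + err);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // -c: disassemble method bodies, -p: include private members
        String target = args.length > 0 ? args[0] : "java.lang.Object";
        System.out.println(javap("-c", "-p", target));
    }
}
```

Pointing it at the compiled renderer class (with a suitable `-cp` argument) would print the bytecode of both variants for comparison; seeing the JIT-generated assembly still requires a tool such as JITWatch.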

Personally, I prefer the LOCAL VAR variant, which does not introduce any
risk in this case (the local is read-only).

2015-06-11 11:19 GMT+02:00 Jim Graham <james.graham at oracle.com>:
> Also, forgive me if I'm rehashing old info (my memory for details can get a
> little foggy at times), but what is the status of trying to get your
> benchmark in house for local testing and verification?  Have you already
> shared it with us and I missed it?  Or does it contain proprietary data?

We never really discussed this point: MapBench is already an open-source
project (including the serialized map datasets), available on my GitHub:

It provides both testing (MapDisplay) and benchmarking (MapBench) with many
different profiles (long runs, scale tests, complex transforms...) that I
use to perform both regression and performance tests.

If you are interested, I could make a new MapBench release (v0.4 is not yet
released; v0.3 was released in March 2014), improve the wiki pages and help
you try it...


More information about the graphics-rasterizer-dev mailing list