[OpenJDK Rasterizer] RFR: Marlin renderer #2
Laurent Bourgès
bourges.laurent at gmail.com
Thu Jun 11 10:27:45 UTC 2015
Jim,
> The reason I ask is that it would be against all odds that you'd be able to
> detect loading those 2 variables into temp local variables given how
> complex a given rendering operation is.
>
I agree: I also doubt that I could detect such a subtle code change in my
benchmarks.
However, using a local variable may help the HotSpot compiler produce
better optimized (assembler) code.
> I could imagine that loading a field into a local variable in the inner
> loop of the crossings loops might be noticeable, but not for the case of 2
> calculations in a once per op setup/teardown method.
>
I agree, but this method is a "hotspot": 1 call per shape, so 135 000 calls
per map ...
> I'm not saying that we should never do shadow fields in local variables,
> just that we should only do that for cases that affect performance as it
> can make the code less readable if it's not going to make a difference and
> also you run the slight risk in some cases of updating the local and not
> saving it back to the field (usually it's done right when such a shadow
> local is introduced, but in some later bug fix some other engineer decides
> that they can add a shortcut through the code without realizing they are
> short-cutting past a "shadow to field" sync point and it introduces a bug -
> thus I reserve a technique like that only for important cases - in
> particular loops executed a number of times...)
>
I agree with your approach, but there is no risk here as boundsMaxY is constant:
Variant: LOCAL VAR:
final int _boundsMinY = boundsMinY;
final int _boundsMaxY = boundsMaxY;
final int spminY = Math.max(FloatMath.ceil(edgeMinY), _boundsMinY);
final int spmaxY;
int maxY = FloatMath.ceil(edgeMaxY);
if (maxY <= _boundsMaxY - 1) {
spmaxY = maxY;
} else {
spmaxY = _boundsMaxY - 1;
maxY = _boundsMaxY;
}
buckets_minY = spminY - _boundsMinY;
buckets_maxY = maxY - _boundsMinY;
Variant: NO LOCAL VAR for boundsMaxY only:
final int _boundsMinY = boundsMinY;
final int spminY = Math.max(FloatMath.ceil(edgeMinY), _boundsMinY);
final int spmaxY;
int maxY = FloatMath.ceil(edgeMaxY);
if (maxY <= boundsMaxY - 1) {
spmaxY = maxY;
} else {
spmaxY = boundsMaxY - 1;
maxY = boundsMaxY;
}
buckets_minY = spminY - _boundsMinY;
buckets_maxY = maxY - _boundsMinY;
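To make the two variants above easy to try in isolation, here is a minimal self-contained sketch. It is not the Marlin code: the class name, the constructor and the `ceil` helper (a plain stand-in for `FloatMath.ceil`) are assumptions made for illustration only; the field names follow the snippets above. It only shows that both variants compute the same values when the bounds are never written.

```java
import java.util.Arrays;

// Hypothetical harness, not part of Marlin: field names mirror the snippets above.
final class ShadowFieldSketch {
    private final int boundsMinY;
    private final int boundsMaxY;
    private final float edgeMinY;
    private final float edgeMaxY;

    // Results written by both variants.
    private int bucketsMinY;
    private int bucketsMaxY;

    ShadowFieldSketch(int boundsMinY, int boundsMaxY, float edgeMinY, float edgeMaxY) {
        this.boundsMinY = boundsMinY;
        this.boundsMaxY = boundsMaxY;
        this.edgeMinY = edgeMinY;
        this.edgeMaxY = edgeMaxY;
    }

    // Assumption: simple stand-in for FloatMath.ceil(float), not Marlin's implementation.
    private static int ceil(float a) {
        return (int) Math.ceil(a);
    }

    /** Variant LOCAL VAR: both bounds are read once into final locals. */
    int[] withLocals() {
        final int _boundsMinY = boundsMinY;
        final int _boundsMaxY = boundsMaxY;
        final int spminY = Math.max(ceil(edgeMinY), _boundsMinY);
        final int spmaxY;
        int maxY = ceil(edgeMaxY);
        if (maxY <= _boundsMaxY - 1) {
            spmaxY = maxY;
        } else {
            spmaxY = _boundsMaxY - 1;
            maxY = _boundsMaxY;
        }
        bucketsMinY = spminY - _boundsMinY;
        bucketsMaxY = maxY - _boundsMinY;
        return new int[] { spminY, spmaxY, bucketsMinY, bucketsMaxY };
    }

    /** Variant NO LOCAL VAR for boundsMaxY: the field is read at each use. */
    int[] withFieldReads() {
        final int _boundsMinY = boundsMinY;
        final int spminY = Math.max(ceil(edgeMinY), _boundsMinY);
        final int spmaxY;
        int maxY = ceil(edgeMaxY);
        if (maxY <= boundsMaxY - 1) {
            spmaxY = maxY;
        } else {
            spmaxY = boundsMaxY - 1;
            maxY = boundsMaxY;
        }
        bucketsMinY = spminY - _boundsMinY;
        bucketsMaxY = maxY - _boundsMinY;
        return new int[] { spminY, spmaxY, bucketsMinY, bucketsMaxY };
    }

    public static void main(String[] args) {
        ShadowFieldSketch s = new ShadowFieldSketch(0, 100, 10.2f, 250.7f);
        System.out.println(Arrays.toString(s.withLocals()));
        System.out.println(Arrays.toString(s.withFieldReads()));
    }
}
```

As the locals are final and the bounds are only read, there is no "shadow to field" sync point to miss in this particular case.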
However, I quickly made 2 runs of each of the 2 variants:
Variant: LOCAL VAR
==> marlin_ojdk_1.log <==
Tests 27 9 9 9
Threads 4 1 2 4
Pct95 132.139 130.701 132.350 133.365
==> marlin_ojdk_2.log <==
Tests 27 9 9 9
Threads 4 1 2 4
Pct95 133.061 131.925 132.816 134.442
Test Threads Ops Med Pct95 Avg StdDev Min Max TotalOps [ms/op]
marlin_ojdk_1.log:dc_shp_alllayers_2013-00-30-07-00-47.ser 1 25 765.215 766.428 765.174 0.740 763.819 766.604 25
marlin_ojdk_1.log:dc_shp_alllayers_2013-00-30-07-00-47.ser 2 50 772.095 777.983 772.704 3.695 768.296 779.644 50
marlin_ojdk_1.log:dc_shp_alllayers_2013-00-30-07-00-47.ser 4 100 773.461 782.407 773.566 4.857 765.198 790.660 100
...
marlin_ojdk_2.log:dc_shp_alllayers_2013-00-30-07-00-47.ser 1 25 774.916 777.312 775.003 1.268 773.072 777.596 25
marlin_ojdk_2.log:dc_shp_alllayers_2013-00-30-07-00-47.ser 2 50 780.500 782.350 780.278 1.506 776.916 783.797 50
marlin_ojdk_2.log:dc_shp_alllayers_2013-00-30-07-00-47.ser 4 100 780.476 792.769 781.983 4.912 774.828 794.644 100
Variant: NO LOCAL VAR
==> marlin_ojdk_1.log <==
Tests 27 9 9 9
Threads 4 1 2 4
Pct95 132.698 131.973 132.622 133.499
==> marlin_ojdk_2.log <==
Tests 27 9 9 9
Threads 4 1 2 4
Pct95 134.292 134.741 133.668 134.468
Test Threads Ops Med Pct95 Avg StdDev Min Max TotalOps [ms/op]
marlin_ojdk_1.log:dc_shp_alllayers_2013-00-30-07-00-47.ser 1 25 773.685 775.946 773.759 1.546 770.888 777.375 25
marlin_ojdk_1.log:dc_shp_alllayers_2013-00-30-07-00-47.ser 2 50 776.336 778.360 776.388 1.246 773.655 778.625 50
marlin_ojdk_1.log:dc_shp_alllayers_2013-00-30-07-00-47.ser 4 100 776.398 781.926 776.843 2.525 772.224 786.830 100
...
marlin_ojdk_2.log:dc_shp_alllayers_2013-00-30-07-00-47.ser 1 25 793.264 800.969 793.383 5.388 783.434 801.322 25
marlin_ojdk_2.log:dc_shp_alllayers_2013-00-30-07-00-47.ser 2 50 780.957 789.660 781.136 4.984 773.533 793.514 50
marlin_ojdk_2.log:dc_shp_alllayers_2013-00-30-07-00-47.ser 4 100 780.952 792.372 781.176 5.655 767.275 796.447 100
As you can see, there is some variability between tests & runs: the summary
(synthetic) results are only about 1 pt better with LOCAL VAR.
If you look at the Pct95 (95th percentile), the LOCAL VAR variant seems
slightly faster: ~ 10 ms / 780 ms, i.e. about 1%.
But that may just be scatter, i.e. not representative at all!
So it is hard to conclude from only 2 runs: I could make more runs or
longer runs (for better statistics).
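As a rough cross-check of that estimate, the sketch below averages the six per-test Pct95 values listed above for each variant and compares the difference with the spread inside a single variant. The class and method names are hypothetical, and only the rows shown in this mail are used (the elided "..." rows are not), so this is an illustration rather than a proper statistical test.

```java
import java.util.Arrays;

// Hypothetical helper: compares the Pct95 values [ms/op] quoted in this mail.
final class Pct95Comparison {
    // Pct95 of the dc_shp_alllayers rows (runs 1 and 2; 1, 2 and 4 threads).
    static final double[] LOCAL_VAR = {
        766.428, 777.983, 782.407, 777.312, 782.350, 792.769
    };
    static final double[] NO_LOCAL_VAR = {
        775.946, 778.360, 781.926, 800.969, 789.660, 792.372
    };

    static double mean(double[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    // Difference between largest and smallest value: a crude measure of scatter.
    static double spread(double[] values) {
        double min = Arrays.stream(values).min().orElse(Double.NaN);
        double max = Arrays.stream(values).max().orElse(Double.NaN);
        return max - min;
    }

    // Relative difference of the means, in percent of the base mean.
    static double relativeDiffPercent(double[] base, double[] other) {
        double baseMean = mean(base);
        return 100.0 * (mean(other) - baseMean) / baseMean;
    }

    public static void main(String[] args) {
        System.out.printf("mean LOCAL VAR    = %.3f ms%n", mean(LOCAL_VAR));
        System.out.printf("mean NO LOCAL VAR = %.3f ms%n", mean(NO_LOCAL_VAR));
        System.out.printf("relative diff     = %.2f %%%n",
                relativeDiffPercent(LOCAL_VAR, NO_LOCAL_VAR));
        System.out.printf("spread LOCAL VAR  = %.3f ms%n", spread(LOCAL_VAR));
    }
}
```

On these six rows the mean difference comes out slightly below 1%, while the spread within a single variant is several times larger than that difference, which is consistent with the caution above about scatter.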
I looked at the javap output (bytecode), but JITWatch might be more helpful
to see the actual assembler code ("-" lines: LOCAL VAR, "+" lines: NO LOCAL VAR):
  45: getfield #34 // Field boundsMinY:I
  48: istore_3
  49: aload_0
- 50: getfield #37 // Field boundsMaxY:I
- 53: istore 4
- 55: aload_0
- 56: getfield #40 // Field edgeMinY:F
- 59: invokestatic #35 // Method sun/java2d/marlin/FloatMath.ceil:(F)I
- 62: iload_3
- 63: invokestatic #36 // Method java/lang/Math.max:(II)I
- 66: istore 5
- 68: aload_0
- 69: getfield #41 // Field edgeMaxY:F
- 72: invokestatic #35 // Method sun/java2d/marlin/FloatMath.ceil:(F)I
- 75: istore 7
- 77: iload 7
- 79: iload 4
- 81: iconst_1
- 82: isub
- 83: if_icmpgt 93
- 86: iload 7
- 88: istore 6
- 90: goto 103
- 93: iload 4
- 95: iconst_1
- 96: isub
- 97: istore 6
- 99: iload 4
- 101: istore 7
+ 50: getfield #40 // Field edgeMinY:F
+ 53: invokestatic #35 // Method sun/java2d/marlin/FloatMath.ceil:(F)I
+ 56: iload_3
+ 57: invokestatic #36 // Method java/lang/Math.max:(II)I
+ 60: istore 4
+ 62: aload_0
+ 63: getfield #41 // Field edgeMaxY:F
+ 66: invokestatic #35 // Method sun/java2d/marlin/FloatMath.ceil:(F)I
+ 69: istore 6
+ 71: iload 6
+ 73: aload_0
+ 74: getfield #37 // Field boundsMaxY:I
+ 77: iconst_1
+ 78: isub
+ 79: if_icmpgt 89
+ 82: iload 6
+ 84: istore 5
+ 86: goto 103
+ 89: aload_0
+ 90: getfield #37 // Field boundsMaxY:I
+ 93: iconst_1
+ 94: isub
+ 95: istore 5
+ 97: aload_0
+ 98: getfield #37 // Field boundsMaxY:I
+ 101: istore 6
The difference is:
[3 getfield #37] vs [1 getfield #37 + 1 istore 4 + 3 iload 4]
Personally, I prefer the LOCAL VAR variant, which does not introduce any risk
in this case (read-only var).
2015-06-11 11:19 GMT+02:00 Jim Graham <james.graham at oracle.com>:
> Also, forgive me if I'm rehashing old info (my memory for details can get a
> little foggy at times), but what is the status of trying to get your
> benchmark in house for local testing and verification? Have you already
> shared it with us and I missed it? Or does it contain proprietary data?
>
We never really discussed this point: MapBench is already an open source
project (including serialized map datasets) available on my github:
https://github.com/bourgesl/mapbench/
It provides both testing (MapDisplay) & benchmarking (MapBench) with many
different profiles (long run, scale tests, complex transform ...) that I
use to perform both regression and performance tests.
If you are interested, I can make a new MapBench release (v0.4 is not yet
released; v0.3 was released in March 2014), improve the wiki pages and help
you try it...
Cheers,
Laurent