[OpenJDK Rasterizer] RFR: Marlin renderer #2

Fri Jun 19 22:34:34 UTC 2015

Hi Laurent,

I still have no idea what you mean when you say "arrays = [0]".

Does that mean "new foo[0]"?  Or "new foo[getBucketSize of bucket #0]"?

The latter is what I was envisioning using...
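
To make the two readings concrete, here is a minimal Java sketch of what I mean. The bucket sizes and the getBucketSize helper are placeholders for illustration only, not Marlin's actual values or API:

```java
final class InitialArrayReadings {
    // Hypothetical bucket sizes, for illustration only (not Marlin's real values).
    static final int[] BUCKET_SIZES = {4096, 16384, 65536};

    // Hypothetical helper: size of the arrays stored in the given bucket.
    static int getBucketSize(int bucket) {
        return BUCKET_SIZES[bucket];
    }

    // Reading 1: "new foo[0]", a zero-length array that cannot hold anything,
    // so any real use has to obtain a larger array first.
    static int[] emptyInitial() {
        return new int[0];
    }

    // Reading 2: "new foo[getBucketSize of bucket #0]", an array already
    // sized like the smallest cache bucket.
    static int[] bucketZeroInitial() {
        return new int[getBucketSize(0)];
    }

    public static void main(String[] args) {
        System.out.println("reading 1 length = " + emptyInitial().length);
        System.out.println("reading 2 length = " + bucketZeroInitial().length);
    }
}
```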

			...jim

On 6/19/15 5:08 AM, Laurent Bourgès wrote:
> Jim,
>
> here are the benchmark results:
> - REF: Marlin reference = initial capacity tuned for arrays and
> OffHeapEdgeArray
> - NO_INITIALS: initial arrays = [0]
> - NO_INITIALS_OFFHEAP_16: initial arrays = [0] and OffHeapEdgeArray(16)
>
> I pushed all details (stats & benchmarks):
> http://cr.openjdk.java.net/~lbourges/marlin/bench_initial_arrays/
>
>
> 1/ Benchmark results:
>
> The OffHeapEdgeArray size is more critical: with OffHeapEdgeArray(16) it
> is about 5% slower than the previous test (initial arrays = [0]):
>
> Renderer                  Test count   30        10        10        10
>                           Threads      4         1         2         4
> REF                       Pct95        237.848   233.887   238.43    241.226
> NO_INITIALS               Pct95        244.261   241.116   244.028   247.639
> NO_INITIALS_OFF_HEAP_16   Pct95        257.091   253.211   256.13    261.93
>
>
> For the complex map, it is more pronounced: ~20% slower than the
> reference test:
>
> *REF:*
> dc_shp_alllayers_2013-00-30-07-00-47.ser  4  100  770.511  775.448  770.448  4.668  765.125  787.473  100  100.00%
>
> *NO_INITIALS_OFF_HEAP_16:*
> dc_shp_alllayers_2013-00-30-07-00-47.ser  4  100  902.238  934.679  910.759  14.478  898.332  956.92  100  120.53%
>
> *NO_INITIALS:*
> dc_shp_alllayers_2013-00-30-07-00-47.ser  4  100  815.775  823.593  817.352  6.752  813.031  872.658  100  106.21%
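
For anyone checking the percentages: the columns are unlabeled, but the last column appears to be the ratio of the second timing column to the same column in the REF row. A small sketch of that arithmetic, under that assumption (the class and method names are made up for this example):

```java
import java.util.Locale;

final class SlowdownCheck {
    // Formats test/reference as a percentage with two decimals.
    static String percent(double test, double reference) {
        return String.format(Locale.ROOT, "%.2f%%", 100.0 * test / reference);
    }

    public static void main(String[] args) {
        // Second timing column of each row, taken from the results above.
        System.out.println("NO_INITIALS_OFF_HEAP_16: " + percent(934.679, 775.448));
        System.out.println("NO_INITIALS:             " + percent(823.593, 775.448));
        // Pct95 for 4 threads / 30 tests: OFF_HEAP_16 relative to NO_INITIALS.
        System.out.println("Pct95 ratio:             " + percent(257.091, 244.261));
    }
}
```

The first two values reproduce the 120.53% and 106.21% shown in the rows, and the third is where the "about 5% slower" figure comes from.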
>
>
>
> 2/ Statistics: the number of cache accesses (and the array sizes per
> bucket) is very large.
>
> For example:
> - stats_NO_INITIALS.log:
> Loading DrawingCommands: ../maps/dc_shp_alllayers_2013-00-30-07-00-47.ser
> Loaded DrawingCommands: DrawingCommands{width=1400, height=800,
> commands=*135213*}
> ...
> INFO: ArrayCache: int resize: 0 - dirty int resize: 140612 - dirty float
> resize: 104025 - dirty byte resize: 103966 - oversize: 0
> ...
> INFO: Array caches for thread: ctx1
> INFO: IntArrayCache[4096]: get: 281224 created: 2 - returned: 281224 ::
> cache size: 2
> INFO: Dirty Array caches for thread: ctx1
> INFO: IntArrayCache[4096]: get: 562448 created: 4 - returned: 562448 ::
> cache size: 4
> INFO: FloatArrayCache[4096]: get: 104025 created: 2 - returned: 104025
> :: cache size: 2
> INFO: ByteArrayCache[65536]: get: 103966 created: 1 - returned: 103966
> :: cache size: 1
>
> - stats_NO_INITIALS_OFFHEAP_16.log:
> INFO: renderer.edges.resize[*483598*] sum: 86874016 avg: 179.64 [32 | 4096]
>
> The OffHeapEdgeArray is resized a lot for this map: 4096 is a good
> capacity for this test case.
>
> Several test cases need a lot more memory: 32K, 64K or 128K.
> stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[15915] sum: 16182208 avg: 1016.789 [32 | 131072]
> stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[7807] sum: 6053440 avg: 775.386 [32 | 65536]
> stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[2231] sum: 4420224 avg: 1981.274 [32 | 131072]
> stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[483598] sum: 86874016 avg: 179.64 [32 | 4096]
> stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[4696] sum: 1284224 avg: 273.471 [32 | 8192]
> stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[1655] sum: 520224 avg: 314.334 [32 | 8192]
> stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[794] sum: 1068960 avg: 1346.297 [32 | 16384]
> stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[852] sum: 938048 avg: 1100.995 [32 | 32768]
> stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[22] sum: 134217696 avg: 6100804.363 [32 | 67108864]
> stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[62062] sum: 9914976 avg: 159.759 [32 | 65536]
>
> The spiral test needs up to 67,108,864 bytes!
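
The spiral-test line (resize[22], sum 134217696, max 67108864) is consistent with a buffer that starts at 16 and doubles on every resize. Here is a small sketch to check that arithmetic. It assumes pure doubling and that "sum" adds up the capacity after each resize; both are inferred from the numbers, not taken from the Marlin source:

```java
final class DoublingCheck {
    // Doubles the capacity from 'initial' until it reaches 'target'.
    // Returns {number of resizes, sum of the capacities after each resize}.
    static long[] simulate(long initial, long target) {
        long capacity = initial;
        long count = 0;
        long sum = 0;
        while (capacity < target) {
            capacity *= 2;   // assumed growth policy: double on each resize
            count++;
            sum += capacity; // assumed meaning of the logged "sum"
        }
        return new long[] {count, sum};
    }

    public static void main(String[] args) {
        long[] r = simulate(16, 67108864L);
        // prints "resizes=22 sum=134217696"
        System.out.println("resizes=" + r[0] + " sum=" + r[1]);
    }
}
```

Both values match the logged line, so under these assumptions a single spiral path pays for 22 reallocations before it reaches its final size.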
>
>
> To conclude, I already tuned the initial capacities according to my
> benchmarks without consuming too much memory (about 512K). However, I agree
> these capacities can be adjusted again depending on the workload, or if
> you have any preference.
>
>
> 3/ Heap size:
>
> I re-ran the NO_INITIALS test with only a 512m heap:
>
> ==> marlin_NO_INITIALS_Xmx512m.log <==
> Threads    4    1    2    4
> Pct95    250.374    240.754    250.038    260.331
>
> ==> marlin_NO_INITIALS.log <==
> Threads    4    1    2    4
> Pct95    244.261    241.116    244.028    247.639
>
> So the smaller the heap, the bigger the impact of the weak cache!
> Adding more threads means more renderer contexts, each with its own
> caches, which creates more (weakly referenced) garbage.
>
> Typically the weak cache hurts applications with little memory, or web
> servers handling many concurrent map requests!
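
For readers following along, here is a minimal sketch of the kind of per-thread "weak cache" being discussed, assuming it means a context held through a WeakReference so that the GC may reclaim it (and its arrays) under memory pressure. The Context and WeakContextCache names are hypothetical and this is not Marlin's actual code:

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.atomic.AtomicInteger;

final class WeakContextCache {
    // Hypothetical stand-in for a renderer context that owns reusable arrays.
    static final class Context {
        final int[] scratch = new int[4096];
    }

    // Counts how many contexts had to be (re)created.
    static final AtomicInteger CREATED = new AtomicInteger();

    // One weakly referenced context per thread.
    private static final ThreadLocal<WeakReference<Context>> LOCAL = new ThreadLocal<>();

    static Context get() {
        WeakReference<Context> ref = LOCAL.get();
        Context ctx = (ref != null) ? ref.get() : null;
        if (ctx == null) {
            // Never created for this thread, or already reclaimed by the GC:
            // allocate again, which is the extra garbage discussed above.
            ctx = new Context();
            CREATED.incrementAndGet();
            LOCAL.set(new WeakReference<>(ctx));
        }
        return ctx;
    }

    public static void main(String[] args) {
        Context a = get();
        Context b = get(); // 'a' is still strongly reachable, so it is reused
        System.out.println("same context reused: " + (a == b));
        System.out.println("contexts created: " + CREATED.get());
    }
}
```

With a small heap the GC runs more often, so a weakly held context is more likely to be cleared between uses and rebuilt, which would fit the slower 512m numbers above.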
>
> To conclude, the less garbage Marlin produces, the better it performs.
>
> To be fair, I should also re-run the reference test with 512m, but
> let's stop here for now.
>
>
> I hope these new results will give you an overview of the memory / array
> cache issue that Marlin has to deal with.
>
> Laurent
>