[OpenJDK 2D-Dev] sun.java2D.Pisces renderer Performance and Memory enhancements

Wed Apr 24 08:59:58 UTC 2013

Jim,

First, here are the updated webrev and benchmark results:
- results: http://jmmc.fr/~bourgesl/share/java2d-pisces/patch_opt_night.log
- webrev: http://jmmc.fr/~bourgesl/share/java2d-pisces/webrev-2/

Note: the webrev is only partially cleaned up; work is still in progress!

Changes made:
- optimized cleanup of alpha / edges arrays
- TileState HARD reference stored in SunGraphics2D to avoid repeated
  ThreadLocal or ConcurrentQueue accesses
- TileState propagated through RenderingEngine to PiscesRenderingEngine
  (warning: this raises interface compatibility issues)
- minor tuning.
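
To illustrate the hard-reference idea from the list above, here is a minimal sketch. It is not the code from the webrev: RenderState and GraphicsSketch are hypothetical stand-ins for TileState and SunGraphics2D, and the LOOKUPS counter exists only to make the saved lookups visible. It also assumes that a given graphics object is used by a single thread.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the per-thread renderer state (not the patch's TileState). */
final class RenderState {
    final int[] alpha = new int[16 * 1024];
}

/** Hypothetical stand-in for a graphics object that keeps a hard reference to its state. */
final class GraphicsSketch {
    /** Counts ThreadLocal lookups, for illustration only. */
    static final AtomicInteger LOOKUPS = new AtomicInteger();

    private static final ThreadLocal<RenderState> PER_THREAD =
            ThreadLocal.withInitial(RenderState::new);

    /** Hard reference, filled on first use (assumes single-threaded use of this object). */
    private RenderState state;

    RenderState state() {
        RenderState s = state;
        if (s == null) {
            // Slow path: taken once per graphics object instead of once per shape.
            LOOKUPS.incrementAndGet();
            s = PER_THREAD.get();
            state = s;
        }
        return s;
    }
}
```

Every call to state() after the first one returns the cached field without touching the ThreadLocal.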

The array caches (IntArrayCache, Dirty... and FloatArrayCache) are now
totally useless during the MapBench tests, because the RendererContext
already stores large arrays (16K int or float arrays) plus rowAARLE (2 MB).
However, I am keeping the array caching for very high workloads ... to be
discussed later.

Comparison (open office format):
http://jmmc.fr/~bourgesl/share/java2d-pisces/compareRef_Patch.ods

Patch2 vs ductus (by thread count):
   1 thread:  102,11%
   2 threads: 144,49%
   4 threads: 263,13%
On average, patch2 is equal to or better than ductus: 44% faster with
2 threads and 2.6 times faster with 4 threads!

In the following table, you can see how the gain varies with the test
(workload): on a single thread, my patch performs better than ductus for
the complex test case only.

   test              threads  Tavg     Tmed     Med+Stddev
   boulder_17        1        82,54%   77,68%   76,99%
   boulder_17        2        119,57%  120,24%  128,56%
   boulder_17        4        149,95%  150,39%  161,98%
   shp_alllayers_47  1        107,26%  107,18%  107,02%
   shp_alllayers_47  2        144,24%  144,18%  147,00%
   shp_alllayers_47  4        288,05%  289,10%  286,04%

Secondly, here are my comments:

2013/4/24 Jim Graham <james.graham at oracle.com>

>
> Originally the version that was used in embedded used RLE because it
> stored the results in the shape itself.  On desktop I never found that to
> be a necessary optimization especially because it actually wastes memory
> for no gain during animations, but that was why they used RLE as a storage
> format.  Would it speed up the code to use a different storage format?
>

That could be a very good idea: compressing the alpha array to RLE and then
decompressing it to fill the byte[] tile array seems wasteful. However,
keeping the RLE encoding may help keep the arrays small when storing a
complete tile line, as I would like: width = 4096 (or more) x height = 32.
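
For readers unfamiliar with the round trip discussed here, the following is a minimal sketch of run-length encoding one alpha row and expanding it back into a byte[] tile. It is not the Pisces implementation: the AlphaRle class and the (value, runLength) pair layout are assumptions made for illustration.

```java
import java.util.Arrays;

/** Hypothetical helper; the (value, runLength) pair layout is an assumption. */
final class AlphaRle {

    /** Encodes one row of alpha values as consecutive (value, runLength) pairs. */
    static int[] encode(int[] row) {
        int[] out = new int[2 * row.length]; // worst case: every run has length 1
        int n = 0;
        int i = 0;
        while (i < row.length) {
            int value = row[i];
            int start = i;
            while (i < row.length && row[i] == value) {
                i++;
            }
            out[n++] = value;
            out[n++] = i - start;
        }
        return Arrays.copyOf(out, n);
    }

    /** Expands the pairs into tile[], starting at offset (alpha values fit in one byte). */
    static void decode(int[] rle, byte[] tile, int offset) {
        int pos = offset;
        for (int k = 0; k < rle.length; k += 2) {
            int runLength = rle[k + 1];
            Arrays.fill(tile, pos, pos + runLength, (byte) rle[k]);
            pos += runLength;
        }
    }
}
```

A row that is mostly empty or mostly fully covered compresses to a few pairs, which is the memory argument for RLE; the cost is the extra encode and decode pass for each row.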

As memory is cheap nowadays, I could try using one large 1D array to store
the alpha values for a complete tile line: only 512K!
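
The 512K figure can be checked with a small sketch. It assumes one 4-byte int per alpha value, which is what makes 4096 x 32 come out at 512K; the TileLineBuffer class and its row-major layout are hypothetical.

```java
/** Hypothetical buffer for one tile line; assumes one int per alpha value. */
final class TileLineBuffer {
    final int width;
    final int height;
    final int[] alpha;

    TileLineBuffer(int width, int height) {
        this.width = width;
        this.height = height;
        this.alpha = new int[width * height];
    }

    /** Row-major index of pixel (x, y) in the flat array. */
    int index(int x, int y) {
        return y * width + x;
    }

    /** Payload size in bytes, ignoring the array header. */
    long sizeInBytes() {
        return (long) alpha.length * Integer.BYTES;
    }
}
```

With width = 4096 and height = 32 this gives 131072 entries, that is 524288 bytes (512K). Storing one byte per alpha value instead would need 128K.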

> Also, in the version we use in JavaFX we removed the tiling altogether and
> return one alpha array for the entire rasterization.  We might consider
> doing that for this code as well if it allows us to get rid of Ductus - it
> was a Ductus design constraint that forced the tiling (it was again based
> on the expected size of the hardware AA engine)...
>

I think tiling is still worthwhile, as such small arrays stay in the CPU
cache! However, I could try tuning the tile width to be larger (256x32
instead of 32x32 tiles) ...
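
To put numbers on that trade-off, here is a small sketch comparing the two tile shapes. It assumes one byte per alpha value in the tile array and a 4096-pixel-wide tile line; the TileMath helper is hypothetical.

```java
/** Hypothetical helper for comparing tile shapes; assumes one byte per alpha value. */
final class TileMath {

    /** Number of tiles needed to cover one tile line of the given pixel width. */
    static int tilesPerLine(int lineWidth, int tileWidth) {
        return (lineWidth + tileWidth - 1) / tileWidth; // round up for a partial last tile
    }

    /** Size in bytes of one tile's alpha array. */
    static int tileBytes(int tileWidth, int tileHeight) {
        return tileWidth * tileHeight;
    }
}
```

For a 4096-pixel line, 32x32 tiles mean 128 tiles of 1 KB each, while 256x32 tiles mean 16 tiles of 8 KB each: eight times fewer tile hand-offs for an array that is still small.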

Finally, who could help me work on Pisces? Could we form a tiger team?
Or at least, could Denis and you find some time to help me?

Laurent