[OpenJDK 2D-Dev] AAShapePipe concurrency & memory waste

Tue Apr 9 17:44:43 UTC 2013

Hi Laurent,

The allocations will always show up in a heap profiler; I don't know of any way to keep them from showing up, even if they are stack allocated. But I don't think stack allocation is the issue here: small allocations come out of a fast generation that costs almost nothing to allocate from and nearly nothing to clean up.  They really are being allocated and GC'd, but the process is optimized.

The only way to tell is to benchmark and see which changes make a difference and which are in the noise (or, in some odd counter-intuitive cases, counter-productive)...
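
As a rough illustration of that kind of comparison, here is a minimal sketch that times a fresh int[4] per call against a ThreadLocal-cached one. The class and method names are hypothetical and are not taken from AAShapePipe or the webrev; a loop this simple is also easy for the JIT to optimize away, so the numbers only hint at where to look.

```java
final class AllocVsCache {
    // Hypothetical stand-in for the per-call int[4] bounding box.
    private static final ThreadLocal<int[]> CACHED_BOX = new ThreadLocal<int[]>() {
        @Override
        protected int[] initialValue() {
            return new int[4];
        }
    };

    // Writes four values derived from i so the array is really used.
    private static void fill(int[] box, int i) {
        box[0] = i;
        box[1] = i + 1;
        box[2] = i + 2;
        box[3] = i + 3;
    }

    // Variant 1: allocate a new small array on every iteration.
    static long sumWithFreshArrays(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            int[] box = new int[4];
            fill(box, i);
            sum += box[0] + box[1] + box[2] + box[3];
        }
        return sum;
    }

    // Variant 2: reuse one array per thread, paying for a ThreadLocal lookup.
    static long sumWithCachedArray(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            int[] box = CACHED_BOX.get();
            fill(box, i);
            sum += box[0] + box[1] + box[2] + box[3];
        }
        return sum;
    }

    public static void main(String[] args) {
        int iterations = 50000000;
        // Several rounds so that later rounds run JIT-compiled code.
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            long fresh = sumWithFreshArrays(iterations);
            long t1 = System.nanoTime();
            long cached = sumWithCachedArray(iterations);
            long t2 = System.nanoTime();
            System.out.println("round " + round
                    + ": fresh=" + (t1 - t0) / 1000000 + " ms"
                    + ", cached=" + (t2 - t1) / 1000000 + " ms"
                    + ", same result=" + (fresh == cached));
        }
    }
}
```

Running it with -verbose:gc, and once more with -XX:-DoEscapeAnalysis, gives an idea of how much of the allocation cost HotSpot actually removes. A harness such as JMH would give more trustworthy numbers than this hand-rolled loop.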

			...jim

On 4/9/2013 10:34 AM, Laurent Bourgès wrote:
> Dear Jim,
>
> I admit I only looked at the NetBeans memory profiler's output: no more megabytes allocated!
>
> The main question is: how can I find out how the GC / HotSpot deals with such small allocations? Is there any JVM flag that shows the real allocations, as jmap -histo does?
>
>
>     Quick questions - which benchmarks were run before/after?  I see a lot of benchmark running in your Pisces improvement thread, but none here.
>
>
> Agreed; I can try running J2DBench on this fix only. I generally run Andrea's MapBench, as it appears more complex and uses multiple threads.
>
>     Also, this should be tested on multiple platforms, preferably Linux, Windows and Mac to see how it is affected by differences in the platform runtimes and threading (hopefully minimal).
>
>
> That is more difficult for me: at work I can use a Mac (OS X 10.8), and I can run Windows XP within VirtualBox (but that is not very representative).
>
> Don't you have any test platforms at Oracle to run such tests / benchmarks?
>
>     Finally, Hotspot is supposed to deal very well with small thread-local allocations like the int[4] and Rectangle2D that you optimized.  Was it necessary to cache those at all?  I'm sure the statistics for the allocations show up in a memory profile, but that doesn't mean it is costing us anything - ideally such small allocations are as fast as free, and having to deal with caching them in a context will actually lose performance.  It may be that the tile caching saved enough that it masked unnecessary or detrimental changes for the smaller objects...
>
>
> I repeat my question: how can I know at runtime how HotSpot optimizes the AAShapePipe code (allocations ...)? Can HotSpot do stack allocation? Is it explained somewhere (allocation size threshold)?
>
> Maybe the -verbose:gc output would help?
>
> Finally, I spent a lot of time on the Pisces renderer, running MapBench to show performance gains.
>
> Thanks for your interesting feedback,
>
> Laurent
>
> On 4/5/2013 5:20 AM, Laurent Bourgès wrote:
>
>     Dear java2d members,
>
>     I found some problems in java2d.pipe.AAShapePipe related to both concurrency & memory usage:
>     - concurrency issue related to the static theTile field: only 1 tile is cached, so a new byte[] is created for the other threads at each call to renderTile()
>     - excessive memory usage (byte[] for tile, int[] and rectangle): at each call to renderPath / renderTiles, several small objects are created (never cached), which adds up to hundreds of megabytes that the GC must deal with
>
>     Here are profiling screenshots:
>     - 4 threads drawing on their own buffered image (MapBench test):
>     http://jmmc.fr/~bourgesl/share/AAShapePipe/AAShapePipe_byte_tile.png
>
>     - excessive int[] / Rectangle creation:
>     http://jmmc.fr/~bourgesl/share/AAShapePipe/AAShapePipe_int_bbox.png
>     http://jmmc.fr/~bourgesl/share/AAShapePipe/AAShapePipe_rectangle_bbox.png
>
>     Here is the proposed patch:
>     http://jmmc.fr/~bourgesl/share/AAShapePipe/webrev-1/
>
>     I applied a simple solution: use a ThreadLocal or a ConcurrentLinkedQueue (see the useThreadLocal flag) to cache one AAShapePipeContext per thread (2K max).
>     As its memory footprint is very small, I recommend using ThreadLocal.
>
>     Is it necessary to use a Soft/Weak reference to avoid excessive memory usage for such a cache?
>
>     Is there any class dedicated to such a cache (a ThreadLocal with cache eviction, or a ConcurrentLinkedQueue using WeakReference)?
>     I think it could be very useful at the JDK level to have such a feature (i.e. a generic "GC-friendly" cache).
>
>     Regards,
>     Laurent
>
>
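
The per-thread cache described in the original proposal can be sketched roughly as follows. This is only an illustration of the ThreadLocal approach, not the code from the webrev: the class name TileContextCache, the nested Context class and its fields are all hypothetical, loosely modeled on the AAShapePipeContext mentioned above.

```java
final class TileContextCache {

    /** Hypothetical per-thread scratch state (not the patch's actual class). */
    static final class Context {
        // Reused bounding box, replacing a new int[4] per render call.
        final int[] bbox = new int[4];
        // Reused tile buffer; grown on demand and never shrunk.
        // A reusable Rectangle2D could be held in the same way.
        private byte[] tile = new byte[0];

        /** Returns a buffer of at least the requested length. */
        byte[] tile(int length) {
            if (tile.length < length) {
                tile = new byte[length];
            }
            return tile;
        }
    }

    // One Context per thread, created lazily on first use.
    private static final ThreadLocal<Context> CONTEXT = new ThreadLocal<Context>() {
        @Override
        protected Context initialValue() {
            return new Context();
        }
    };

    /** Returns the calling thread's context; no locking is needed. */
    static Context get() {
        return CONTEXT.get();
    }
}
```

Each rendering thread would call TileContextCache.get() once per render call and reuse the arrays it returns. Whether this actually beats plain allocation for the small objects is the open question in this thread and would need to be measured.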