[OpenJDK 2D-Dev] sun.java2D.pisces big memory usage (waste ?)
Laurent Bourgès
bourges.laurent at gmail.com
Thu Apr 4 13:44:48 UTC 2013
I updated both the patched pisces code and the benchmarks:
http://jmmc.fr/~bourgesl/share/java2d-pisces/
Here are a few results comparing ThreadLocal vs ConcurrentLinkedQueue usage:
OpenJDK 8 PATCH ThreadLocal mode:
Testing file
/home/bourgesl/libs/openjdk/mapbench/test/dc_boulder_2013-13-30-06-13-17.ser
1 threads and 20 loops per thread, time: 2671 ms
2 threads and 20 loops per thread, time: 3239 ms
4 threads and 20 loops per thread, time: 6043 ms
OpenJDK 8 PATCH ConcurrentLinkedQueue mode:
Testing file
/home/bourgesl/libs/openjdk/mapbench/test/dc_boulder_2013-13-30-06-13-17.ser
1 threads and 20 loops per thread, time: 2779 ms
2 threads and 20 loops per thread, time: 3416 ms
4 threads and 20 loops per thread, time: 6153 ms
Oracle JDK8 Ductus:
Testing file
/home/bourgesl/libs/openjdk/mapbench/dc_boulder_2013-13-30-06-13-17.ser
1 threads and 20 loops per thread, time: 1894 ms
2 threads and 20 loops per thread, time: 3905 ms
4 threads and 20 loops per thread, time: 7485 ms
OpenJDK 8 PATCH ThreadLocal mode:
Testing file
/home/bourgesl/libs/openjdk/mapbench/test/dc_shp_alllayers_2013-00-30-07-00-47.ser
1 threads and 20 loops per thread, time: 24211 ms
2 threads and 20 loops per thread, time: 30955 ms
*4 threads and 20 loops per thread, time: 67715 ms*
OpenJDK 8 PATCH ConcurrentLinkedQueue mode:
Testing file
/home/bourgesl/libs/openjdk/mapbench/test/dc_shp_alllayers_2013-00-30-07-00-47.ser
1 threads and 20 loops per thread, time: 25984 ms
2 threads and 20 loops per thread, time: 33131 ms
*4 threads and 20 loops per thread, time: 75343 ms*
Oracle JDK8 Ductus:
Loading drawing commands from file:
/home/bourgesl/libs/openjdk/mapbench/dc_shp_alllayers_2013-00-30-07-00-47.ser
Loaded DrawingCommands: DrawingCommands{width=1400, height=800, commands=135213}
1 threads and 20 loops per thread, time: 20911 ms
2 threads and 20 loops per thread, time: 39297 ms
4 threads and 20 loops per thread, time: 103392 ms
ConcurrentLinkedQueue adds a small overhead compared to ThreadLocal, but not too much (roughly 2% to 11% slower in the runs above).
Is it possible to test efficiently whether the current thread is the EDT? Then I
could use a ThreadLocal at least for the EDT. This test must be very fast because
getThreadContext() is called once per rendering operation, so it sits on a
performance-critical path.
For example:
Testing file
/home/bourgesl/libs/openjdk/mapbench/test/dc_shp_alllayers_2013-00-30-07-00-47.ser
TL: 4 threads and 20 loops per thread, time: 67715 ms
CLQ: 4 threads and 20 loops per thread, time: 75343 ms
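A minimal sketch of the hybrid idea from the EDT question: the EDT keeps its own context in a ThreadLocal, and every other thread borrows one from a shared ConcurrentLinkedQueue. HybridRendererContext, HybridContextProvider and the acquire()/release() pair are hypothetical names, not the patch's getThreadContext(). The thread test is injected as a BooleanSupplier, with java.awt.EventQueue.isDispatchThread() as the EDT policy. Whether isDispatchThread() is cheap enough to call on every rendering operation is not established here; it is not a plain field read (it has to locate the current event queue), so it should be measured with MapBench like the TL and CLQ variants.

```java
import java.awt.EventQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for the patch's RendererContext (vars / caches).
final class HybridRendererContext {
    final int[] rowAARLE = new int[1024]; // example reusable array; size is an assumption
}

// Hypothetical provider: ThreadLocal for selected threads, shared queue for the others.
final class HybridContextProvider {
    private final BooleanSupplier useThreadLocal;
    private final ThreadLocal<HybridRendererContext> threadLocalContext =
            ThreadLocal.withInitial(HybridRendererContext::new);
    private final ConcurrentLinkedQueue<HybridRendererContext> sharedPool =
            new ConcurrentLinkedQueue<>();

    HybridContextProvider(BooleanSupplier useThreadLocal) {
        this.useThreadLocal = useThreadLocal;
    }

    // Policy from the question: only the AWT event dispatch thread keeps its own context.
    static HybridContextProvider edtOnly() {
        return new HybridContextProvider(EventQueue::isDispatchThread);
    }

    HybridRendererContext acquire() {
        if (useThreadLocal.getAsBoolean()) {
            return threadLocalContext.get(); // no queue traffic, no CAS
        }
        HybridRendererContext ctx = sharedPool.poll(); // non-blocking, CAS-based
        return (ctx != null) ? ctx : new HybridRendererContext();
    }

    void release(HybridRendererContext ctx) {
        // ThreadLocal-owned contexts stay attached to their thread; only pooled ones go back.
        if (!useThreadLocal.getAsBoolean()) {
            sharedPool.offer(ctx);
        }
    }
}
```

Injecting the predicate keeps the EDT check out of the pool logic, so another thread test (or a cheaper cached flag) can be swapped in and benchmarked without touching acquire()/release().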
Changes:
- use ThreadLocal or ConcurrentLinkedQueue<RendererContext> to get a
renderer context (vars / cache)
- use RendererContext members (dirty / clean arrays) first, for performance
reasons, instead of IntArrayCache / FloatArrayCache (which are dedicated to
large dynamic arrays)
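The "context members first" change can be sketched as a fast path: a preallocated dirty array owned by the context serves typical requests, and only larger requests fall back to a dynamically sized array. IntArraySource, INITIAL_SIZE and the doubling growth policy are hypothetical stand-ins, not the patch's IntArrayCache API.

```java
// Hypothetical sketch: context-owned array first, dynamic array only for large requests.
final class IntArraySource {
    static final int INITIAL_SIZE = 1024; // assumed size covering typical shapes

    private final int[] dirtyInts = new int[INITIAL_SIZE]; // reused; never cleared
    private int[] largeInts;   // lazily grown fallback for large dynamic arrays
    int largeAllocations;      // simple stat: how often the fallback had to allocate

    // Returns an array of at least 'length' ints. Contents are undefined ("dirty"),
    // and the same array is returned again on the next call.
    int[] getDirtyIntArray(int length) {
        if (length <= dirtyInts.length) {
            return dirtyInts; // fast path: plain field read, no lookup, no allocation
        }
        if (largeInts == null || largeInts.length < length) {
            int doubled = (largeInts == null) ? 2 * INITIAL_SIZE : 2 * largeInts.length;
            largeInts = new int[Math.max(length, doubled)];
            largeAllocations++;
        }
        return largeInts;
    }
}
```

Because the same array is handed out on every call, a caller must be done with one array before requesting another; that is the price of avoiding both allocation and a cache lookup on the common path.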
TBD:
- recycle pisces class instances, i.e. keep only one instance per class (Renderer,
Stroker ...), to avoid GC overhead entirely (several thousand instances are
created per MapBench test).
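The recycling item could look like the following minimal sketch. RecyclableRenderer is a hypothetical, heavily simplified renderer whose init() resets per-shape state (the real Renderer and Stroker have many more fields), and RecyclingContext is a hypothetical holder that owns exactly one instance, so each rendering operation reinitializes the renderer instead of calling new.

```java
import java.util.Arrays;

// Hypothetical, heavily simplified renderer: state is reset, not reallocated.
final class RecyclableRenderer {
    private int clipMinY, clipMaxY;         // per-shape parameters (illustrative)
    private float[] edges = new float[256]; // storage kept across shapes
    private int numEdges;

    // Replaces 'new Renderer(...)': reinitialize per-shape fields and reuse storage.
    RecyclableRenderer init(int minY, int maxY) {
        clipMinY = minY;
        clipMaxY = maxY;
        numEdges = 0; // logical reset only; the edges array is kept
        return this;
    }

    void addEdge(float x) {
        if (numEdges == edges.length) {
            edges = Arrays.copyOf(edges, edges.length * 2); // grows once, then reused
        }
        edges[numEdges++] = x;
    }

    int edgeCount() {
        return numEdges;
    }

    int clipHeight() {
        return clipMaxY - clipMinY;
    }
}

// Hypothetical per-context holder: exactly one renderer instance per context.
final class RecyclingContext {
    final RecyclableRenderer renderer = new RecyclableRenderer();
}
```

The key discipline is that init() must reset every field that a previous shape could have changed; a forgotten field silently leaks state from one rendering operation into the next.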
Moreover, these pisces instances are very small, short-lived objects, so they
should be allocated in the thread-local allocation buffer (TLAB) and die young,
but when I use -verbose:gc or jmap -histo they are present and represent
megabytes:
[bourgesl at jmmc-laurent ~]$ jmap -histo:live 21628 | grep pisces
5: 50553 6470784 sun.java2d.pisces.Renderer
9: 29820 3578400 sun.java2d.pisces.Stroker
11: 49795 3186880 sun.java2d.pisces.PiscesCache
12: 49794 1991760 sun.java2d.pisces.PiscesTileGenerator
13: 49793 1991720 sun.java2d.pisces.Renderer$ScanlineIterator
14: 29820 1431360 sun.java2d.pisces.PiscesRenderingEngine$NormalizingPathIterator
52: 40 1280 sun.java2d.pisces.IntArrayCache
94: 20 640 sun.java2d.pisces.FloatArrayCache
121: 8 320 [Lsun.java2d.pisces.IntArrayCache;
127: 4 320 sun.java2d.pisces.RendererContext
134: 4 256 sun.java2d.pisces.Curve
154: 4 160 [Lsun.java2d.pisces.FloatArrayCache;
155: 4 160 sun.java2d.pisces.RendererContext$RendererData
156: 4 160 sun.java2d.pisces.RendererContext$StrokerData
157: 4 160 sun.java2d.pisces.Stroker$PolyStack
208: 3 72 sun.java2d.pisces.PiscesRenderingEngine$NormMode
256: 1 32 [Lsun.java2d.pisces.PiscesRenderingEngine$NormMode;
375: 1 16 sun.java2d.pisces.PiscesRenderingEngine
376: 1 16 sun.java2d.pisces.RendererContext$1
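Dividing #bytes by #instances in this histogram gives the shallow size per object (arrays referenced by these objects are counted separately): 128 bytes per Renderer, 120 per Stroker, 64 per PiscesCache, 40 per PiscesTileGenerator and per Renderer$ScanlineIterator, and 48 per NormalizingPathIterator, i.e. 18,650,904 bytes (about 18.65 MB) for these six classes. Note that jmap -histo:live forces a full GC first and counts only reachable objects, so these instances were still live when the histogram was taken. A small helper doing the division on lines in this "rank: instances bytes class" format (HistoSizes is a hypothetical name):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Computes shallow bytes per instance from "rank: #instances #bytes className" lines.
final class HistoSizes {
    static Map<String, Long> bytesPerInstance(String histogram) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (String line : histogram.split("\n")) {
            String[] f = line.trim().split("\\s+");
            // keep only data rows such as "5: 50553 6470784 sun.java2d.pisces.Renderer"
            if (f.length != 4 || !f[0].endsWith(":")) {
                continue;
            }
            long instances = Long.parseLong(f[1]);
            long bytes = Long.parseLong(f[2]);
            result.put(f[3], bytes / instances);
        }
        return result;
    }

    public static void main(String[] args) {
        String histo = String.join("\n",
            "5: 50553 6470784 sun.java2d.pisces.Renderer",
            "9: 29820 3578400 sun.java2d.pisces.Stroker",
            "11: 49795 3186880 sun.java2d.pisces.PiscesCache");
        // prints 128, 120 and 64 bytes/instance respectively
        bytesPerInstance(histo).forEach((cls, size) ->
            System.out.println(cls + ": " + size + " bytes/instance"));
    }
}
```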
Regards,
Laurent
2013/4/3 Laurent Bourgès <bourges.laurent at gmail.com>
> Thanks for your valuable feedback!
>
> Here is the current status of my patch alpha version:
>>> http://jmmc.fr/~bourgesl/share/java2d-pisces/
>>>
>>> There is still a lot to be done: clean-up, stats, pisces class instance
>>> recycling (renderer, stroker ...) and of course correctly sizing the initial
>>> arrays (dirty or clean) in the RendererContext (thread local storage).
>>> For performance reasons, I am now using RendererContext members first
>>> (cache for rowAARLE for example) before using ArrayCaches (dynamic arrays).
>>>
>>
>> Thank you Laurent, those are some nice speedups.
>>
> I think it can still be improved: I hope to make it as fast as Ductus or
> maybe faster (I have several ideas for aggressive optimizations), but the main
> improvement consists in reusing memory (as C / C++ code does) to avoid wasted
> memory / GC overhead in concurrent environments.
>
>
>> About the thread local storage, that is a sensible choice for highly
>> concurrent systems, at the same time, web containers normally complain about
>> orphaned thread locals created by an application and not cleaned up.
>> Not sure if ones created at the core libs level get special treatment,
>> but in general, I guess it would be nice to have some way to clean them up.
>>
>
> You're right, that's why my patch is not ready!
>
> I chose ThreadLocal for simplicity and clarity, but I see several issues:
> 1/ Web container: the ThreadLocal must be cleaned up when stopping an
> application to avoid memory leaks (the application cannot be unloaded due to
> classloader leaks)
> 2/ ThreadLocal access is the fastest way to get the RendererContext as it
> does not require any lock (unsynchronized); as I get the RendererContext
> only once per rendering request, I think the ThreadLocal can be replaced by a
> thread-safe ConcurrentLinkedQueue<RendererContext>, but it may become a
> performance bottleneck
> 3/ Using a ConcurrentLinkedQueue<RendererContext> requires an efficient /
> proper cache eviction policy to free memory (Weak or Soft references?) or
> statistics (last usage timestamp, usage counts)
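For point 3/, one option is a ConcurrentLinkedQueue of SoftReference objects, so the GC itself can evict idle contexts under memory pressure, plus an optional sweep based on a last-usage timestamp. A minimal sketch; PooledContext, SoftContextPool and evictIdle() are hypothetical names and the context contents are placeholders:

```java
import java.lang.ref.SoftReference;
import java.util.Iterator;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical pooled context; the array stands in for the real caches.
final class PooledContext {
    final float[] edges = new float[4096]; // placeholder size
    volatile long lastUsedNanos;
}

// Hypothetical pool: soft references let the GC evict idle contexts under memory pressure.
final class SoftContextPool {
    private final ConcurrentLinkedQueue<SoftReference<PooledContext>> idle =
            new ConcurrentLinkedQueue<>();

    PooledContext acquire() {
        SoftReference<PooledContext> ref;
        while ((ref = idle.poll()) != null) {
            PooledContext ctx = ref.get();
            if (ctx != null) {
                return ctx; // still alive: reuse it
            }
            // referent already cleared by the GC: drop the empty reference and keep looking
        }
        return new PooledContext();
    }

    void release(PooledContext ctx) {
        ctx.lastUsedNanos = System.nanoTime(); // recorded for the optional idle sweep
        idle.offer(new SoftReference<>(ctx));
    }

    // Best-effort sweep: drops cleared references and contexts idle longer than maxIdleNanos.
    int evictIdle(long maxIdleNanos) {
        long now = System.nanoTime();
        int evicted = 0;
        for (Iterator<SoftReference<PooledContext>> it = idle.iterator(); it.hasNext(); ) {
            PooledContext ctx = it.next().get();
            if (ctx == null || now - ctx.lastUsedNanos > maxIdleNanos) {
                it.remove();
                evicted++;
            }
        }
        return evicted;
    }
}
```

Soft references leave the eviction timing to the GC, which may keep them until memory gets tight; the timestamp sweep gives explicit control, for instance from a periodic task.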
>
> Any other ideas (core-libs) for an efficient thread context in a web
> container?
>
>> I'm not familiar with the API, but is there any way to clean them up when
>> the graphics2d gets disposed of?
>>
>
> The RenderingEngine is instantiated once by the JVM, and I do not see in
> the RenderingEngine API any way to perform callbacks for warmup /
> cleanup ... nor any access to the Graphics RenderingHints (another RFE, for
> tuning purposes)
>
>
>> A web application has no guarantee to see the same thread ever again
>> during its life, so thread locals have to be cleaned up right away.
>>
>
> I admit that ThreadLocal can lead to wasted memory, as only a few concurrent
> threads really use their RendererContext instance while the others
> simply answer web requests => let's use a
> ConcurrentLinkedQueue<RendererContext> with proper cache eviction.
>
>
>>
>> Either that, or see if there is any way to store the array caches in a
>> global structure backed by a concurrent collection to reduce/eliminate
>> contention.
>>
>
> Yes, it is an interesting alternative to benchmark.
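The global-structure alternative can be sketched as a JVM-wide array pool shared by all threads, so there is no per-thread state to clean up when a web application stops. Arrays are bucketed by power-of-two length in a ConcurrentHashMap of ConcurrentLinkedQueues. SharedIntArrayPool, borrow() and giveBack() are hypothetical names, and a real pool would also cap each bucket so it cannot grow without bound.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical JVM-wide pool of int[] shared by all threads, bucketed by power-of-two length.
final class SharedIntArrayPool {
    private final ConcurrentHashMap<Integer, ConcurrentLinkedQueue<int[]>> buckets =
            new ConcurrentHashMap<>();

    // Smallest power of two >= n; valid for n <= 2^30.
    static int bucketSize(int n) {
        return (n <= 1) ? 1 : Integer.highestOneBit(n - 1) << 1;
    }

    // Returns a pooled array of the bucket length if available; contents may be dirty.
    int[] borrow(int minLength) {
        int size = bucketSize(minLength);
        ConcurrentLinkedQueue<int[]> q = buckets.get(size);
        int[] a = (q != null) ? q.poll() : null;
        return (a != null) ? a : new int[size];
    }

    void giveBack(int[] a) {
        // only arrays whose length is exactly a bucket size are pooled
        if (a.length != bucketSize(a.length)) {
            return;
        }
        buckets.computeIfAbsent(a.length, k -> new ConcurrentLinkedQueue<>()).offer(a);
    }
}
```

Contention is spread across one queue per size class instead of one queue for whole contexts, which is the property worth comparing against the TL and CLQ numbers above.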
>
> Regards,
> Laurent
>
More information about the core-libs-dev
mailing list