[OpenJDK 2D-Dev] sun.java2D.pisces big memory usage (waste ?)

Jim Graham james.graham at oracle.com
Sat Mar 30 00:39:03 UTC 2013


Other thoughts - using chained buckets of edges instead of one single long list.  It would be easier to keep a pool of buckets (each holding, say, 256 edges?) than a "one-size-fits-all" pool of arrays.  Then all you have to do is keep high water marks on the number of simultaneously used buckets in order to tune the cache for a given application.

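A rough sketch of the bucket idea (hypothetical names and sizes, not actual
Pisces code):

    // Sketch only: a pool of fixed-size edge buckets chained into a list,
    // replacing one large growable edge array.
    final class EdgeBucketPool {
        static final int EDGE_SIZE = 6;      // floats per edge (illustrative)
        static final int BUCKET_EDGES = 256; // edges per bucket

        static final class Bucket {
            final float[] data = new float[BUCKET_EDGES * EDGE_SIZE];
            int used;     // number of edges stored in this bucket
            Bucket next;  // link to the next bucket in the chain
        }

        private Bucket free;       // free list of recycled buckets
        private int inUse;
        private int highWaterMark; // max simultaneously used buckets, for tuning

        Bucket acquire() {
            Bucket b = free;
            if (b != null) {
                free = b.next;
                b.next = null;
                b.used = 0;
            } else {
                b = new Bucket();
            }
            if (++inUse > highWaterMark) {
                highWaterMark = inUse;
            }
            return b;
        }

        void release(final Bucket b) {
            b.next = free; // no clearing needed: edge data is overwritten on reuse
            free = b;
            inUse--;
        }
    }
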
It would make the code that manages "pointers" to edges a little more complicated, though...

			...jim

On 3/29/2013 6:53 AM, Laurent Bourgès wrote:
> Phil,
>
> I agree it is a complex issue to improve memory usage while maintaining
> performance at the JDK level: applications can use java2d pisces in very
> different contexts: a Swing app (client, with only the EDT thread), a
> server-side application (multi-threaded, headless) ...
>
> For the moment, I have spent a lot of my time understanding the different
> classes in java2d.pisces and analyzing memory usage / performance ... using
> J2DBench (all graphics tests).
>
> In my Swing application, pisces produces a lot of garbage (GC pressure),
> but on the server side the GC overhead can be even greater if several
> threads use pisces.
>
> Pisces uses memory in different ways:
> - fixed arrays (dasher, stroker)
> - dynamic arrays (edges ...) and rowAARLE (a very big one for big shapes)
>
> For the moment I am trying to avoid memory waste (pooling or kept
> references) without any memory constraint (no eviction), but I agree that
> is an important aspect for server-side applications.
>
> To avoid concurrency issues, I use a ThreadLocal context named
> RendererContext to keep a few temporary arrays (float6 and a BIG rowAARLE
> instance), plus dynamic IntArrayCache and FloatArrayCache instances whose
> pools are divided into buckets (256, 1024, 4096, 16384, 32768 elements),
> each bucket containing only a few arrays.
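>
> A simplified sketch of that per-thread context (hypothetical code, not the
> actual patch):
>
>     final class RendererContext {
>         private static final ThreadLocal<RendererContext> CTX =
>                 new ThreadLocal<RendererContext>() {
>                     @Override protected RendererContext initialValue() {
>                         return new RendererContext();
>                     }
>                 };
>
>         // One context per thread: no synchronization needed.
>         static RendererContext get() {
>             return CTX.get();
>         }
>
>         final float[] float6 = new float[6]; // small fixed scratch array
>         int[][] rowAARLE = new int[1][2];    // BIG reusable row buffer, grown on demand
>         // plus per-thread IntArrayCache / FloatArrayCache pools with
>         // buckets of 256, 1024, 4096, 16384 and 32768 elements
>     }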
>
> To get the best performance, I studied the pisces code so that only the
> used parts of arrays are cleared when recycling, or so that dirty arrays
> can be reused as-is (e.g. only clearing rowAARLE[...][1]).
>
> I think Andrea's proposal to add some system properties giving hints (low
> memory footprint, whether to use the cache ...) is interesting.
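>
> For illustration only, such hints might look like this (the property names
> are hypothetical, not an agreed API):
>
>     // Hypothetical hint properties, sketching the idea:
>     static final boolean USE_ARRAY_CACHE = Boolean.parseBoolean(
>             System.getProperty("sun.java2d.pisces.useArrayCache", "true"));
>     static final boolean LOW_MEMORY_FOOTPRINT =
>             Boolean.getBoolean("sun.java2d.pisces.lowMemoryFootprint");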
>
> 2013/3/28 Phil Race <philip.race at oracle.com>
>
>> Maintaining a pool of objects might be an appropriate thing for an
>> application, but it's a lot trickier for the platform, as the
>> application's usage pattern or intent is largely unknown. Weak references
>> or soft references might be of use, but weak references usually go away
>> even at the next incremental GC, and soft references tend to not go away
>> at all until you run out of heap.
>>
>
> Agreed; for the moment a pool eviction policy is not implemented, but it is
> kept in mind.
> FYI: each RendererContext (per thread) has its own array pools (not
> shared), which could have different caching policies:
> for instance, AWT / EDT (repaint) could use a large cache while other
> threads do not use array caching at all.
>
>
>> You may well be right that always doubling the array size may be too
>> simplistic,
>> but it would need some analysis of the code and its usage to see how much
>> better we can do.
>
>
> There are two parts:
> - initial array size for dynamic arrays: difficult to estimate, but for
> now set to a very low capacity (8 / 50 ...) to avoid memory waste for
> rectangle / line shapes. In my patch, I have defined MIN_ARRAY_SIZE = 128
> (array pool) to avoid too much resizing, since I am doing array recycling.
> - growth: I use x4 instead of x2 to reduce the number of array copies (see
> the sketch below).
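>
> A minimal sketch of the x4 growth (a variant of the Helpers.widenArray code
> quoted further down; the growth factor is the only change):
>
>     // Sketch: grow by 4x instead of 2x; reaching a given capacity from a
>     // small initial size then takes roughly half as many copies.
>     static float[] widenArray4x(final float[] in, final int curSize,
>                                 final int numToAdd) {
>         if (in.length >= curSize + numToAdd) {
>             return in;
>         }
>         return java.util.Arrays.copyOf(in, 4 * (curSize + numToAdd));
>     }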
>
> Laurent
>
>
>
> 2013/3/28 Phil Race <philip.race at oracle.com>
>
>>
>>
>>> Apparently, Arrays.fill is always faster (array sizes from 10 to 10 000)!
>>> I suspect HotSpot optimizes this code and uses native functions, doesn't
>>> it?
>>
>> I suppose there is some hotspot magic involved to recognise and
>> intrinsify this method, since the source code looks like a plain old for
>> loop.
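>>
>> (HotSpot's C2 compiler does recognise simple fill loops and can replace
>> them with an optimized stub; see the -XX:+OptimizeFill flag.)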
>>
>> -phil.
>>
>>
>>
>> On 3/26/2013 4:00 AM, Laurent Bourgès wrote:
>>
>>> Dear all,
>>>
>>> First, I recently joined the OpenJDK contributors, and I plan to fix the
>>> java2D pisces code in my spare time.
>>>
>>> I have a full-time job on Aspro2: http://www.jmmc.fr/aspro; it is an
>>> application to prepare astronomical observations at VLTI / CHARA and is
>>> widely used in our community (200 users): it provides scientific
>>> computations (observability, model images using complex numbers ...) and
>>> zoomable plots thanks to jFreeChart.
>>>
>>> Aspro2 is known to be very efficient (computation parallelization) and I
>>> often do profiling using the NetBeans profiler or VisualVM.
>>>
>>> To fix the huge memory usage of java2d.pisces, I started implementing an
>>> efficient ArrayCache (int[] and float[]), held in a thread local to avoid
>>> concurrency problems, with these requirements (a minimal sketch follows
>>> this list):
>>> - arrays with sizes between 10 and 10 000 (more small arrays are used
>>> than big ones)
>>> - resizing support (Arrays.copyOf) without wasting arrays
>>> - reentrancy, i.e. many arrays are used at the same time (java2D Pisces
>>> stroke / dash creates many segments to render)
>>> - GC / heap friendly, i.e. supports cache eviction and avoids consuming
>>> too much memory
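>>>
>>> A minimal sketch of such a cache (hypothetical and simplified; bucket
>>> sizing, statistics and eviction are left out):
>>>
>>>     import java.util.ArrayDeque;
>>>     import java.util.Arrays;
>>>
>>>     final class IntArrayCache {
>>>         private static final int[] BUCKET_SIZES =
>>>                 {256, 1024, 4096, 16384, 32768};
>>>         private final ArrayDeque<int[]>[] buckets;
>>>
>>>         @SuppressWarnings("unchecked")
>>>         IntArrayCache() {
>>>             buckets = new ArrayDeque[BUCKET_SIZES.length];
>>>             for (int i = 0; i < buckets.length; i++) {
>>>                 buckets[i] = new ArrayDeque<int[]>(8); // few instances kept
>>>             }
>>>         }
>>>
>>>         // Returns a zero-filled array of capacity >= length.
>>>         int[] getArray(final int length) {
>>>             for (int i = 0; i < BUCKET_SIZES.length; i++) {
>>>                 if (length <= BUCKET_SIZES[i]) {
>>>                     final int[] cached = buckets[i].pollLast();
>>>                     return (cached != null) ? cached : new int[BUCKET_SIZES[i]];
>>>                 }
>>>             }
>>>             return new int[length]; // too large to cache
>>>         }
>>>
>>>         // Recycles an array, clearing only the part that was used.
>>>         void putArray(final int[] array, final int usedLength) {
>>>             for (int i = 0; i < BUCKET_SIZES.length; i++) {
>>>                 if (array.length == BUCKET_SIZES[i]) {
>>>                     Arrays.fill(array, 0, usedLength, 0); // dirty part only
>>>                     buckets[i].addLast(array);
>>>                     return;
>>>                 }
>>>             }
>>>             // arrays of other sizes are simply left to the GC
>>>         }
>>>     }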
>>>
>>> I know object pooling is usually considered inefficient on recent VMs
>>> (the GC does better), but I think it is counterproductive to create so
>>> many int[] arrays in java2d.pisces and let the GC collect all that wasted
>>> memory.
>>>
>>> Has someone already implemented such an (open source) array cache
>>> (core-libs)? Opinions are welcome (but please, no trolls).
>>>
>>> Moreover, sun.java2d.pisces.Helpers.widenArray() performs a lot of array
>>> resizing / copying (Arrays.copyOf) that I want to mostly avoid:
>>>
>>>     // These use a hardcoded factor of 2 for increasing sizes. Perhaps this
>>>     // should be provided as an argument.
>>>     static float[] widenArray(float[] in, final int cursize,
>>>                               final int numToAdd) {
>>>         if (in.length >= cursize + numToAdd) {
>>>             return in;
>>>         }
>>>         return Arrays.copyOf(in, 2 * (cursize + numToAdd));
>>>     }
>>>
>>>     static int[] widenArray(int[] in, final int cursize, final int numToAdd) {
>>>         if (in.length >= cursize + numToAdd) {
>>>             return in;
>>>         }
>>>         return Arrays.copyOf(in, 2 * (cursize + numToAdd));
>>>     }
>>>
>>> Thanks to Peter Levart, I use his microbench tool
>>> (https://github.com/plevart/micro-bench/tree/v2) to benchmark ArrayCache
>>> operations ... and J2DBench to test java2d performance.
>>>
>>> What is the fastest way to clear (a part of) an array, i.e. fill it with
>>> zeros (see the sketch below)?
>>> - public static void fill(int[] a, int fromIndex, int toIndex, int val)
>>> - public static native void arraycopy(Object src, int srcPos, Object
>>> dest, int destPos, int length)
>>> - unsafe.setMemory(array, Unsafe.ARRAY_INT_BASE_OFFSET, 512 * SIZEOF_INT,
>>> (byte) 0)
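>>>
>>> For reference, the three variants under test look roughly like this
>>> (sketch only; sun.misc.Unsafe is a JDK-internal API):
>>>
>>>     final class ZeroFillVariants {
>>>         static final int[] ZEROS = new int[512]; // pre-zeroed source array
>>>
>>>         static void fillViaArraysFill(final int[] a) {
>>>             java.util.Arrays.fill(a, 0, a.length, 0);
>>>         }
>>>
>>>         static void fillViaArrayCopy(final int[] a) {
>>>             // copy zeros from a pre-zeroed array (while a.length <= ZEROS.length)
>>>             System.arraycopy(ZEROS, 0, a, 0, Math.min(a.length, ZEROS.length));
>>>         }
>>>
>>>         static void fillViaUnsafe(final sun.misc.Unsafe unsafe, final int[] a) {
>>>             // Object-based setMemory overload (JDK 8 internal API)
>>>             unsafe.setMemory(a, (long) sun.misc.Unsafe.ARRAY_INT_BASE_OFFSET,
>>>                              (long) a.length * 4L, (byte) 0);
>>>         }
>>>     }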
>>>
>>> Apparently, Arrays.fill is always faster (array sizes from 10 to 10 000)!
>>> I suspect HotSpot optimizes this code and uses native functions, doesn't
>>> it?
>>>
>>> Benchmark results (measure phase, Tavg in ns/op; 2-thread runs show σ in
>>> parentheses; warm-up figures omitted, they were close to the measured
>>> values):
>>>
>>> JVM: 1.8.0-internal [OpenJDK 64-Bit Server VM 25.0-b22]
>>> Run duration: 5 000 ms per test, 4 logical CPUs.
>>>
>>> array        test                                  1 thread     2 threads (σ)
>>> -----------  ------------------------------------  ---------    ----------------
>>> int[1]       ZeroFill (Arrays.fill)                     4.43       5.55 (0.16)
>>> int[1]       FillArraySystemCopy (arraycopy)            6.19       7.80 (0.10)
>>> int[1]       FillArrayUnsafe (Unsafe.setMemory)        22.42      28.21 (0.88)
>>> int[100]     ZeroFill (Arrays.fill)                    16.03      19.32 (0.46)
>>> int[100]     FillArraySystemCopy (arraycopy)           14.09      31.15 (4.04)
>>> int[100]     FillArrayUnsafe (Unsafe.setMemory)        52.19      70.87 (0.71)
>>> int[10000]   ZeroFill (Arrays.fill)                  1235.81    1325.11 (7.01)
>>> int[10000]   FillArraySystemCopy (arraycopy)         2105.21    2160.33 (13.74)
>>> int[10000]   FillArrayUnsafe (Unsafe.setMemory)      3068.34    3296.13 (34.97)
>>>
>>>
>>> PS: java.awt.geom.Path2D also has memory allocation issues:
>>>
>>>     void needRoom(boolean needMove, int newCoords) {
>>>         if (needMove && numTypes == 0) {
>>>             throw new IllegalPathStateException("missing initial moveto "
>>>                                                 + "in path definition");
>>>         }
>>>         int size = pointTypes.length;
>>>         if (numTypes >= size) {
>>>             int grow = size;
>>>             if (grow > EXPAND_MAX) {
>>>                 grow = EXPAND_MAX;
>>>             }
>>>             pointTypes = Arrays.copyOf(pointTypes, size + grow);
>>>         }
>>>         size = floatCoords.length;
>>>         if (numCoords + newCoords > size) {
>>>             int grow = size;
>>>             if (grow > EXPAND_MAX * 2) {
>>>                 grow = EXPAND_MAX * 2;
>>>             }
>>>             if (grow < newCoords) {
>>>                 grow = newCoords;
>>>             }
>>>             floatCoords = Arrays.copyOf(floatCoords, size + grow);
>>>         }
>>>     }
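>>>
>>> In other words, the arrays double in size only until they reach
>>> EXPAND_MAX elements, then grow by a fixed amount per expansion, so
>>> appending to a very large path degrades to quadratic copying.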
>>>
>>> Best regards,
>>> Laurent
>>>
>>
>>


