[OpenJDK 2D-Dev] sun.java2D.pisces big memory usage (waste ?)

Laurent Bourgès bourges.laurent at gmail.com
Tue Mar 26 11:00:26 UTC 2013


Dear all,

First, I recently joined the OpenJDK contributors, and I plan to fix the
Java2D Pisces code in my spare time.

I have a full-time job working on Aspro2 (http://www.jmmc.fr/aspro), an
application for preparing astronomical observations at VLTI / CHARA. It is
widely used in our community (200 users): it provides scientific computations
(observability, model images using complex numbers ...) and zoomable plots
thanks to JFreeChart.

Aspro2 is known to be very efficient (parallelized computations), and I
often profile it using the NetBeans profiler or VisualVM.

To fix the huge memory usage of java2d.pisces, I started implementing an
efficient ArrayCache (int[] and float[]), kept in a thread local to avoid
concurrency problems:
- arrays in sizes between 10 and 10000 (more small arrays are used than big
ones)
- resizing support (Arrays.copyOf) without wasting arrays
- reentrance, i.e. many arrays are used at the same time (Java2D Pisces
stroke / dash creates many segments to render)
- GC / heap friendly, i.e. it supports cache eviction and avoids consuming
too much memory
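
To make the requirements above concrete, here is a minimal sketch of such a
cache for int[] only. It is an illustration, not the actual patch: the class
name IntArrayCache, the power-of-two buckets and the MAX_PER_BUCKET limit are
all assumptions chosen for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical per-thread cache of int[] arrays, bucketed by power-of-two capacity. */
final class IntArrayCache {
    private static final int MIN_SHIFT = 4;       // smallest bucket holds int[16]
    private static final int BUCKETS = 11;        // largest bucket holds int[16 << 10] = int[16384]
    private static final int MAX_PER_BUCKET = 8;  // crude eviction: cap the arrays retained per bucket

    // One cache instance per thread, so no locking is needed.
    private static final ThreadLocal<IntArrayCache> LOCAL =
            ThreadLocal.withInitial(IntArrayCache::new);

    private final List<ArrayDeque<int[]>> buckets = new ArrayList<>();

    private IntArrayCache() {
        for (int i = 0; i < BUCKETS; i++) {
            buckets.add(new ArrayDeque<>());
        }
    }

    static IntArrayCache get() {
        return LOCAL.get();
    }

    /** Index of the smallest bucket whose capacity is >= length, or -1 if too large. */
    private static int bucketFor(int length) {
        int b = 0;
        while (b < BUCKETS && (1 << (MIN_SHIFT + b)) < length) {
            b++;
        }
        return b < BUCKETS ? b : -1;
    }

    /** Returns a zero-filled array of at least the requested length (reentrant). */
    int[] acquire(int length) {
        int b = bucketFor(length);
        if (b < 0) {
            return new int[length];               // too big: never cached
        }
        int[] a = buckets.get(b).pollFirst();
        return a != null ? a : new int[1 << (MIN_SHIFT + b)];
    }

    /** Gives an array back; 'used' is how many leading elements were written. */
    void release(int[] a, int used) {
        int b = bucketFor(a.length);
        if (b < 0 || a.length != (1 << (MIN_SHIFT + b))) {
            return;                               // not one of our sizes: leave it to the GC
        }
        if (buckets.get(b).size() >= MAX_PER_BUCKET) {
            return;                               // bucket full: leave it to the GC
        }
        Arrays.fill(a, 0, used, 0);               // clear only the dirty part
        buckets.get(b).addFirst(a);
    }

    /** Grows into a larger cached array and recycles the old one instead of dropping it. */
    int[] widen(int[] in, int used, int needed) {
        if (in.length >= needed) {
            return in;
        }
        int[] out = acquire(2 * needed);
        System.arraycopy(in, 0, out, 0, used);
        release(in, used);
        return out;
    }
}
```

A caller would do IntArrayCache.get().acquire(n), work on the array, then
release(array, usedCount). The sketch relies on the caller reporting usedCount
correctly, otherwise stale values leak into the next acquire().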

I know object pooling is considered inefficient on recent VMs (the GC has
improved), but I think it is counterproductive to create so many int[] arrays
in java2d.pisces and leave it to the GC to reclaim that wasted memory.

Has anyone already implemented such an (open source) array cache (core-libs)?
Opinions are welcome (but please no trolling).

Moreover, sun.java2d.pisces.Helpers.widenArray() performs a lot of array
resizing / copying (Arrays.copyOf), most of which I want to avoid:
    // These use a hardcoded factor of 2 for increasing sizes. Perhaps this
    // should be provided as an argument.
    static float[] widenArray(float[] in, final int cursize, final int numToAdd) {
        if (in.length >= cursize + numToAdd) {
            return in;
        }
        return Arrays.copyOf(in, 2 * (cursize + numToAdd));
    }

    static int[] widenArray(int[] in, final int cursize, final int numToAdd) {
        if (in.length >= cursize + numToAdd) {
            return in;
        }
        return Arrays.copyOf(in, 2 * (cursize + numToAdd));
    }
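
The comment in that code suggests passing the growth factor as an argument.
A possible variant is sketched below; the class name WidenSketch and the
extra 'growth' parameter are hypothetical and not part of
sun.java2d.pisces.Helpers.

```java
import java.util.Arrays;

final class WidenSketch {
    /**
     * Same contract as the float[] widenArray quoted above, except that the
     * growth factor is a parameter (hypothetical variant, not JDK code).
     */
    static float[] widenArray(float[] in, int cursize, int numToAdd, double growth) {
        final int needed = cursize + numToAdd;
        if (in.length >= needed) {
            return in;                       // enough room already: no copy
        }
        // Scale the required size, but never allocate less than what is needed
        // and never exceed the int range.
        final long grown = (long) (needed * growth);
        final int newLen = (int) Math.max((long) needed,
                                          Math.min(grown, (long) Integer.MAX_VALUE));
        return Arrays.copyOf(in, newLen);
    }
}
```

With growth = 2.0 this behaves like the original for ordinary sizes; a smaller
factor such as 1.5 wastes less memory per resize but triggers more copies.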

Thanks to Peter Levart, I use his micro-bench tool (
https://github.com/plevart/micro-bench/tree/v2) to benchmark ArrayCache
operations, and J2DBench to test Java2D performance.

What is the fastest way to clear (part of) an array, i.e. fill it with 0:
- public static void fill(int[] a, int fromIndex, int toIndex, int val)
- public static native void arraycopy(Object src, int srcPos, Object dest, int destPos, int length);
- unsafe.setMemory(array, Unsafe.ARRAY_INT_BASE_OFFSET, 512 * SIZEOF_INT, (byte) 0)
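
For illustration, the first two options could look like the sketch below. The
class name ZeroFillSketch and the shared zero-filled template array are
assumptions for this example; the benchmark code itself is not shown in this
mail. The Unsafe variant is left out because it depends on JDK-internal
access.

```java
import java.util.Arrays;

final class ZeroFillSketch {
    // Shared all-zero template used as the copy source (assumed size).
    private static final int[] ZEROS = new int[10_000];

    /** Clears a[from, to) with Arrays.fill. */
    static void clearWithFill(int[] a, int from, int to) {
        Arrays.fill(a, from, to, 0);
    }

    /** Clears a[from, to) by copying from the zero template, in chunks if needed. */
    static void clearWithArraycopy(int[] a, int from, int to) {
        for (int pos = from; pos < to; pos += ZEROS.length) {
            System.arraycopy(ZEROS, 0, a, pos, Math.min(ZEROS.length, to - pos));
        }
    }
}
```

Both methods leave the elements outside [from, to) untouched, which is what a
cache needs when only the used part of an array is dirty.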

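Apparently, Arrays.fill is almost always faster (size in 10 ... 10 000); in
the results below, the only exception is int[100] on a single thread, where
the System.arraycopy variant is slightly ahead (14,09 vs 16,03 ns/op).
I suspect HotSpot optimizes this code and uses native functions, doesn't it?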

Benchmark results:
>> JVM START: 1.8.0-internal [OpenJDK 64-Bit Server VM 25.0-b22]
Testing arrays: int[1]...
#
# ZeroFill: run duration:  5 000 ms, #of logical CPUS: 4
#
# Warm up:
runTest[class ArrayCacheBenchmark$ZeroFill] on JVM: 1.8.0-internal [OpenJDK
64-Bit Server VM 25.0-b22]
           1 threads, Tavg =      4,47 ns/op (σ =   0,00 ns/op) [     4,47]
runTest[class ArrayCacheBenchmark$ZeroFill] on JVM: 1.8.0-internal [OpenJDK
64-Bit Server VM 25.0-b22]
           1 threads, Tavg =      4,40 ns/op (σ =   0,00 ns/op) [     4,40]
# Measure:
runTest[class ArrayCacheBenchmark$ZeroFill] on JVM: 1.8.0-internal [OpenJDK
64-Bit Server VM 25.0-b22]
           1 threads, Tavg =      4,43 ns/op (σ =   0,00 ns/op) [     4,43]
runTest[class ArrayCacheBenchmark$ZeroFill] on JVM: 1.8.0-internal [OpenJDK
64-Bit Server VM 25.0-b22]
           2 threads, Tavg =      5,55 ns/op (σ =   0,16 ns/op) [
5,40,      5,72]

#
# FillArraySystemCopy: run duration:  5 000 ms, #of logical CPUS: 4
#
# Warm up:
runTest[class ArrayCacheBenchmark$FillArraySystemCopy] on JVM:
1.8.0-internal [OpenJDK 64-Bit Server VM 25.0-b22]
           1 threads, Tavg =      6,47 ns/op (σ =   0,00 ns/op) [     6,47]
runTest[class ArrayCacheBenchmark$FillArraySystemCopy] on JVM:
1.8.0-internal [OpenJDK 64-Bit Server VM 25.0-b22]
           1 threads, Tavg =      6,21 ns/op (σ =   0,00 ns/op) [     6,21]
# Measure:
runTest[class ArrayCacheBenchmark$FillArraySystemCopy] on JVM:
1.8.0-internal [OpenJDK 64-Bit Server VM 25.0-b22]
           1 threads, Tavg =      6,19 ns/op (σ =   0,00 ns/op) [     6,19]
runTest[class ArrayCacheBenchmark$FillArraySystemCopy] on JVM:
1.8.0-internal [OpenJDK 64-Bit Server VM 25.0-b22]
           2 threads, Tavg =      7,80 ns/op (σ =   0,10 ns/op) [
7,90,      7,71]

#
# FillArrayUnsafe: run duration:  5 000 ms, #of logical CPUS: 4
#
# Warm up:
runTest[class ArrayCacheBenchmark$FillArrayUnsafe] on JVM: 1.8.0-internal
[OpenJDK 64-Bit Server VM 25.0-b22]
           1 threads, Tavg =     26,82 ns/op (σ =   0,00 ns/op) [    26,82]
runTest[class ArrayCacheBenchmark$FillArrayUnsafe] on JVM: 1.8.0-internal
[OpenJDK 64-Bit Server VM 25.0-b22]
           1 threads, Tavg =     23,48 ns/op (σ =   0,00 ns/op) [    23,48]
# Measure:
runTest[class ArrayCacheBenchmark$FillArrayUnsafe] on JVM: 1.8.0-internal
[OpenJDK 64-Bit Server VM 25.0-b22]
           1 threads, Tavg =     22,42 ns/op (σ =   0,00 ns/op) [    22,42]
runTest[class ArrayCacheBenchmark$FillArrayUnsafe] on JVM: 1.8.0-internal
[OpenJDK 64-Bit Server VM 25.0-b22]
           2 threads, Tavg =     28,21 ns/op (σ =   0,88 ns/op) [
29,11,     27,36]

Testing arrays: int[100]...
#
# ZeroFill: run duration:  5 000 ms, #of logical CPUS: 4
#
# Warm up:
runTest[class ArrayCacheBenchmark$ZeroFill] on JVM: 1.8.0-internal [OpenJDK
64-Bit Server VM 25.0-b22]
           1 threads, Tavg =     16,49 ns/op (σ =   0,00 ns/op) [    16,49]
runTest[class ArrayCacheBenchmark$ZeroFill] on JVM: 1.8.0-internal [OpenJDK
64-Bit Server VM 25.0-b22]
           1 threads, Tavg =     15,97 ns/op (σ =   0,00 ns/op) [    15,97]
# Measure:
runTest[class ArrayCacheBenchmark$ZeroFill] on JVM: 1.8.0-internal [OpenJDK
64-Bit Server VM 25.0-b22]
           1 threads, Tavg =     16,03 ns/op (σ =   0,00 ns/op) [    16,03]
runTest[class ArrayCacheBenchmark$ZeroFill] on JVM: 1.8.0-internal [OpenJDK
64-Bit Server VM 25.0-b22]
           2 threads, Tavg =     19,32 ns/op (σ =   0,46 ns/op) [
18,87,     19,80]

#
# FillArraySystemCopy: run duration:  5 000 ms, #of logical CPUS: 4
#
# Warm up:
runTest[class ArrayCacheBenchmark$FillArraySystemCopy] on JVM:
1.8.0-internal [OpenJDK 64-Bit Server VM 25.0-b22]
           1 threads, Tavg =     14,51 ns/op (σ =   0,00 ns/op) [    14,51]
runTest[class ArrayCacheBenchmark$FillArraySystemCopy] on JVM:
1.8.0-internal [OpenJDK 64-Bit Server VM 25.0-b22]
           1 threads, Tavg =     14,17 ns/op (σ =   0,00 ns/op) [    14,17]
# Measure:
runTest[class ArrayCacheBenchmark$FillArraySystemCopy] on JVM:
1.8.0-internal [OpenJDK 64-Bit Server VM 25.0-b22]
           1 threads, Tavg =     14,09 ns/op (σ =   0,00 ns/op) [    14,09]
runTest[class ArrayCacheBenchmark$FillArraySystemCopy] on JVM:
1.8.0-internal [OpenJDK 64-Bit Server VM 25.0-b22]
           2 threads, Tavg =     31,15 ns/op (σ =   4,04 ns/op) [
27,65,     35,67]

#
# FillArrayUnsafe: run duration:  5 000 ms, #of logical CPUS: 4
#
# Warm up:
runTest[class ArrayCacheBenchmark$FillArrayUnsafe] on JVM: 1.8.0-internal
[OpenJDK 64-Bit Server VM 25.0-b22]
           1 threads, Tavg =     52,32 ns/op (σ =   0,00 ns/op) [    52,32]
runTest[class ArrayCacheBenchmark$FillArrayUnsafe] on JVM: 1.8.0-internal
[OpenJDK 64-Bit Server VM 25.0-b22]
           1 threads, Tavg =     52,82 ns/op (σ =   0,00 ns/op) [    52,82]
# Measure:
runTest[class ArrayCacheBenchmark$FillArrayUnsafe] on JVM: 1.8.0-internal
[OpenJDK 64-Bit Server VM 25.0-b22]
           1 threads, Tavg =     52,19 ns/op (σ =   0,00 ns/op) [    52,19]
runTest[class ArrayCacheBenchmark$FillArrayUnsafe] on JVM: 1.8.0-internal
[OpenJDK 64-Bit Server VM 25.0-b22]
           2 threads, Tavg =     70,87 ns/op (σ =   0,71 ns/op) [
70,17,     71,59]

Testing arrays: int[10000]...
#
# ZeroFill: run duration:  5 000 ms, #of logical CPUS: 4
#
# Warm up:
runTest[class ArrayCacheBenchmark$ZeroFill] on JVM: 1.8.0-internal [OpenJDK
64-Bit Server VM 25.0-b22]
           1 threads, Tavg =  1 208,64 ns/op (σ =   0,00 ns/op) [ 1 208,64]
runTest[class ArrayCacheBenchmark$ZeroFill] on JVM: 1.8.0-internal [OpenJDK
64-Bit Server VM 25.0-b22]
           1 threads, Tavg =  1 238,01 ns/op (σ =   0,00 ns/op) [ 1 238,01]
# Measure:
runTest[class ArrayCacheBenchmark$ZeroFill] on JVM: 1.8.0-internal [OpenJDK
64-Bit Server VM 25.0-b22]
           1 threads, Tavg =  1 235,81 ns/op (σ =   0,00 ns/op) [ 1 235,81]
runTest[class ArrayCacheBenchmark$ZeroFill] on JVM: 1.8.0-internal [OpenJDK
64-Bit Server VM 25.0-b22]
           2 threads, Tavg =  1 325,11 ns/op (σ =   7,01 ns/op) [
1 332,16,  1 318,14]

#
# FillArraySystemCopy: run duration:  5 000 ms, #of logical CPUS: 4
#
# Warm up:
runTest[class ArrayCacheBenchmark$FillArraySystemCopy] on JVM:
1.8.0-internal [OpenJDK 64-Bit Server VM 25.0-b22]
           1 threads, Tavg =  1 930,93 ns/op (σ =   0,00 ns/op) [ 1 930,93]
runTest[class ArrayCacheBenchmark$FillArraySystemCopy] on JVM:
1.8.0-internal [OpenJDK 64-Bit Server VM 25.0-b22]
           1 threads, Tavg =  2 060,80 ns/op (σ =   0,00 ns/op) [ 2 060,80]
# Measure:
runTest[class ArrayCacheBenchmark$FillArraySystemCopy] on JVM:
1.8.0-internal [OpenJDK 64-Bit Server VM 25.0-b22]
           1 threads, Tavg =  2 105,21 ns/op (σ =   0,00 ns/op) [ 2 105,21]
runTest[class ArrayCacheBenchmark$FillArraySystemCopy] on JVM:
1.8.0-internal [OpenJDK 64-Bit Server VM 25.0-b22]
           2 threads, Tavg =  2 160,33 ns/op (σ =  13,74 ns/op) [
2 146,68,  2 174,15]

#
# FillArrayUnsafe: run duration:  5 000 ms, #of logical CPUS: 4
#
# Warm up:
runTest[class ArrayCacheBenchmark$FillArrayUnsafe] on JVM: 1.8.0-internal
[OpenJDK 64-Bit Server VM 25.0-b22]
           1 threads, Tavg =  3 099,50 ns/op (σ =   0,00 ns/op) [ 3 099,50]
runTest[class ArrayCacheBenchmark$FillArrayUnsafe] on JVM: 1.8.0-internal
[OpenJDK 64-Bit Server VM 25.0-b22]
           1 threads, Tavg =  3 041,81 ns/op (σ =   0,00 ns/op) [ 3 041,81]
# Measure:
runTest[class ArrayCacheBenchmark$FillArrayUnsafe] on JVM: 1.8.0-internal
[OpenJDK 64-Bit Server VM 25.0-b22]
           1 threads, Tavg =  3 068,34 ns/op (σ =   0,00 ns/op) [ 3 068,34]
runTest[class ArrayCacheBenchmark$FillArrayUnsafe] on JVM: 1.8.0-internal
[OpenJDK 64-Bit Server VM 25.0-b22]
           2 threads, Tavg =  3 296,13 ns/op (σ =  34,97 ns/op) [
3 331,47,  3 261,53]


PS: java.awt.geom.Path2D also has memory allocation issues:
        void needRoom(boolean needMove, int newCoords) {
            if (needMove && numTypes == 0) {
                throw new IllegalPathStateException("missing initial moveto "+
                                                    "in path definition");
            }
            int size = pointTypes.length;
            if (numTypes >= size) {
                int grow = size;
                if (grow > EXPAND_MAX) {
                    grow = EXPAND_MAX;
                }
                pointTypes = Arrays.copyOf(pointTypes, size+grow);
            }
            size = floatCoords.length;
            if (numCoords + newCoords > size) {
                int grow = size;
                if (grow > EXPAND_MAX * 2) {
                    grow = EXPAND_MAX * 2;
                }
                if (grow < newCoords) {
                    grow = newCoords;
                }
                floatCoords = Arrays.copyOf(floatCoords, size+grow);
            }
        }
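
To see how often the floatCoords branch above reallocates, the small
simulation below counts the Arrays.copyOf calls for a given path length. It
is a sketch only: the class GrowthCount is hypothetical, and the value of
EXPAND_MAX is not quoted in this mail, so it is passed in as a parameter.

```java
final class GrowthCount {
    /**
     * Counts the copies the floatCoords branch of needRoom() would make when
     * appending 'total' coordinates, 'perCall' at a time, starting from a
     * capacity of 'initial'. Only sizes are simulated, nothing is allocated.
     */
    static int countCopies(int initial, int expandMax, int total, int perCall) {
        int size = initial;
        int numCoords = 0;
        int copies = 0;
        while (numCoords < total) {
            if (numCoords + perCall > size) {
                int grow = Math.min(size, expandMax * 2);  // doubling, capped
                if (grow < perCall) {
                    grow = perCall;
                }
                size += grow;                              // one Arrays.copyOf here
                copies++;
            }
            numCoords += perCall;
        }
        return copies;
    }
}
```

While the capacity is below the cap, the array doubles; beyond it, each copy
adds only a fixed amount, so the number of copies grows linearly with the
path length.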

Best regards,
Laurent


