RFR(M): 7188263: G1: Excessive c_heap (malloc) consumption

Srinivas Ramakrishna ysr1729 at gmail.com
Thu Sep 20 22:00:20 UTC 2012


Hi John --

Do you have numbers for CMS, to check whether it's a potential problem
there as well? (Much of the work-queue and marking-stack allocation code
is likely similar, if not identical, although some of the other
structures are G1-specific.)

By the way, on the T4/Solaris box, did you check how crowded the VA space
became when you got the ostensible OOM from malloc? I wonder whether the
problem is not the use of the malloc heap per se (although not using
malloc for such large, static structures that live for the lifetime of
the JVM is definitely a move in the right direction), but rather that
Solaris' libc malloc may be constraining itself to a contiguous section
of the VA space, unlike some other mallocs (such as Linux's) which don't
do that. If that is the case, maybe that's a potential conversation with
the Solaris libc/malloc folk.

I'll look at the changes in a bit.
-- ramki

On Thu, Sep 20, 2012 at 12:15 PM, John Cuthbertson
<john.cuthbertson at oracle.com> wrote:

> Hi Everyone,
>
> Can I have a couple of volunteers review the changes for this CR - the
> webrev can be found at: http://cr.openjdk.java.net/~johnc/7188263/webrev.0?
>
> Summary:
> Compared to the other collectors, G1 consumes much more C heap (even
> during start up):
>
> ParallelGC (w/o ParallelOld):
>
> dr-evil{jcuthber}:210> ./test.sh -d64 -XX:-ZapUnusedHeapArea
> -XX:CICompilerCount=1 -XX:ParallelGCThreads=10 -Xms20g -Xmx20g
> -XX:+UseParallelGC -XX:+PrintMallocStatistics -XX:-UseParallelOldGC
> java version "1.7.0"
> Java(TM) SE Runtime Environment (build 1.7.0-b147)
> Java HotSpot(TM) 64-Bit Server VM (build 24.0-b20-internal-fastdebug,
> mixed mode)
> allocation stats: 3488 mallocs (12MB), 1161 frees (0MB), 4MB resrc
>
> ParallelGC (w/ ParallelOld):
>
>
> dr-evil{jcuthber}:211> ./test.sh -d64 -XX:-ZapUnusedHeapArea
> -XX:CICompilerCount=1 -XX:ParallelGCThreads=10 -Xms20g -Xmx20g
> -XX:+UseParallelGC -XX:+PrintMallocStatistics
> java version "1.7.0"
> Java(TM) SE Runtime Environment (build 1.7.0-b147)
> Java HotSpot(TM) 64-Bit Server VM (build 24.0-b20-internal-fastdebug,
> mixed mode)
> allocation stats: 3553 mallocs (36MB), 1160 frees (0MB), 4MB resrc
>
> G1:
>
> dr-evil{jcuthber}:212> ./test.sh -d64 -XX:-ZapUnusedHeapArea
> -XX:CICompilerCount=1 -XX:ParallelGCThreads=10 -Xms20g -Xmx20g -XX:+UseG1GC
> -XX:+PrintMallocStatistics
> java version "1.7.0"
> Java(TM) SE Runtime Environment (build 1.7.0-b147)
> Java HotSpot(TM) 64-Bit Server VM (build 24.0-b20-internal-fastdebug,
> mixed mode)
> allocation stats: 21703 mallocs (212MB), 1158 frees (0MB), 4MB resrc
>
> With the parallel collector, the main culprit is the work queues. For
> ParallelGC (without ParallelOldGC) the amount of space allocated is
> around 1MB per GC thread. For ParallelGC (with ParallelOldGC) this
> increases to around 3MB per worker thread. In G1, the main culprits are
> the global marking stack, the work queues (for both GC threads and
> marking threads), and some per-worker structures used for liveness
> accounting. An additional 128MB is allocated for the global marking
> stack, and the amount allocated per worker thread increases to around
> 7MB. On some systems (specifically large T-series SPARC) this increase
> in C heap consumption can result in out-of-system-memory errors. These
> marking data structures are critical for G1. Reducing their sizes is a
> possible solution, but it increases the likelihood of restarting marking
> due to overflowing the marking stack(s), lengthening marking durations,
> and increasing the chance of an evacuation failure and/or a Full GC.
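>
> As a back-of-the-envelope check on those numbers -- a sketch only; the
> 2^17 entry count and the oop-sized entries are assumptions based on
> typical 64-bit HotSpot defaults of this era, not figures taken from the
> webrev:
>
>   // taskqueue_math.cpp: rough per-thread work queue footprint
>   #include <cstdio>
>   #include <cstddef>
>
>   int main() {
>     const size_t entries  = 1 << 17;             // assumed TASKQUEUE_SIZE
>     const size_t entry_sz = sizeof(void*);       // one oop-sized entry
>     const size_t queue    = entries * entry_sz;  // 1MB per queue
>     const size_t threads  = 10;                  // -XX:ParallelGCThreads=10
>     printf("per queue:       %zu MB\n", queue >> 20);             // 1
>     printf("10 plain queues: %zu MB\n", (threads * queue) >> 20); // 10
>     // G1 additionally mallocs a ~128MB global marking stack up front,
>     // which dominates the totals in the table below.
>     return 0;
>   }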
>
> The solution we have adopted, therefore, is to allocate some of these
> marking data structures from virtual memory. This reduces the C heap
> consumption during start-up to:
>
> dr-evil{jcuthber}:216> ./test.sh -d64 -XX:-ZapUnusedHeapArea
> -XX:CICompilerCount=1 -XX:ParallelGCThreads=10 -Xms20g -Xmx20g -XX:+UseG1GC
> -XX:+PrintMallocStatistics
> java version "1.7.0"
> Java(TM) SE Runtime Environment (build 1.7.0-b147)
> Java HotSpot(TM) 64-Bit Server VM (build 24.0-b18-internal-fastdebug,
> mixed mode)
> allocation stats: 21682 mallocs (29MB), 1158 frees (0MB), 4MB resrc
>
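> The underlying pattern is to reserve address space up front and commit
> pages as needed, rather than malloc'ing the whole structure -- roughly
> what HotSpot's ReservedSpace/VirtualSpace wrappers do. The sketch below
> uses Linux mmap flag names and is illustrative only, not the actual
> webrev code:
>
>   // reserve_commit.cpp: reserve VA, commit on demand
>   #include <sys/mman.h>
>   #include <cstdio>
>   #include <cstddef>
>
>   int main() {
>     const size_t reserved = 128u << 20;  // e.g. the global marking stack
>     // Reserve only: no access rights, no swap reservation, and no
>     // growth of the malloc arena.
>     void* base = mmap(NULL, reserved, PROT_NONE,
>                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
>     if (base == MAP_FAILED) { perror("mmap"); return 1; }
>     // Commit the first 16MB once it is actually needed.
>     if (mprotect(base, 16u << 20, PROT_READ | PROT_WRITE) != 0) {
>       perror("mprotect"); return 1;
>     }
>     munmap(base, reserved);
>     return 0;
>   }
>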
> The memory is still allocated - just not from the C heap. With these
> changes, G1's C heap consumption is now approximately 2MB per worker
> thread (essentially the work queues themselves):
>
> C heap consumption (MB) by number of GC threads:
>
>   Collector / # GC threads        1     2     3     4     5
>   ParallelGC w/o ParallelOldGC    3     4     5     6     7
>   ParallelGC w/ ParallelOldGC     7    11    14    17    20
>   G1 before changes             149   156   163   170   177
>   G1 after changes               11    13    15    17    19
>
> We shall also investigate reducing the work queue sizes for some or all
> of the collectors, to further reduce the amount of C heap consumed - but
> in a separate CR.
>
> Testing:
> GC test suite on x64 and SPARC T-series with a low marking threshold and
> marking verification enabled; jprt. Our reference workload was used to
> verify that there was no significant performance difference.
>
> Thanks,
>
> JohnC
>

