RFR(M): 7188263: G1: Excessive c_heap (malloc) consumption
John Cuthbertson
john.cuthbertson at oracle.com
Thu Sep 27 22:06:02 UTC 2012
Hi Bengt,
Thanks for the response. Replies are inline...
On 09/27/12 12:28, Bengt Rutisson wrote:
>
> Hi John,
>
> Thanks for the extra explanations. They helped a lot! And I think your
> suggestions, for addressing the comments I had, sounds good. In
> particular it makes sense to treat the task queue size the same way in
> CMS and G1.
The main difference was that CMS was using virtual space for its
marking stack while G1 was using C heap. So the G1 code now mirrors that
of CMS.
>
> I'll look at your updated webrev when you send it out.
>
> Regarding whether or not to only use VirtualSpaces on 64 bit I feel a
> bit unsure. Using VirtualSpaces already make the code more complex
> than it was before with C heap allocations. Introducing platform
> dependencies on top of that seems to create a big mess. If it somehow
> is possible to abstract the allocation away so we keep it in one place
> it might be worth treating 32 and 64 bit differently.
>
> Not sure if this is a good thought; but if we would make it optional
> to use VirtualSpaces or CHeap to support 32/64 bit differences, would
> it make sense to only use VirtualSpaces on Solaris? You mentioned that
> the out-of-C-heap issue seem to happen due to a Solaris bug.
I think we should use a virtual space for the marking stack (like CMS
does) on all platforms.
For the card bitmaps etc. it might look OK if we're prepared to have
conditional compilation in the code. Then I could have a very simple
allocator/utility class to manage the backing store that doesn't care
whether the underlying space is C heap or virtual space. The conditional
code would be "hidden" in the simple allocator. On platforms other than
Solaris it would be a wrapper around malloc; on Solaris we would
allocate from the virtual space using a simple pointer bump. How does
that sound?
If we decide to go that route I may want to break up the work: one CR
for the mark stack, and the current CR for the card bitmaps with the
simple allocator.
BTW I instrumented the G1CollectedHeap constructor to determine where
the bulk of the allocation requests were coming from:
cairnapple{jcuthber}:272> ./test.sh -d64 -XX:-ZapUnusedHeapArea
-XX:CICompilerCount=1 -XX:ParallelGCThreads=1 -Xms2g -Xmx2g -XX:+UseG1GC
-XX:+PrintMallocStatistics
Using java runtime at:
/net/jre.us.oracle.com/p/v42/jdk/7/fcs/b147/binaries/solaris-x64/jre
0 mallocs (0MB), 0 frees (0MB), 0MB resrc
90 mallocs (1MB), 0 frees (0MB), 0MB resrc
14432 mallocs (4MB), 0 frees (0MB), 0MB resrc
14556 mallocs (4MB), 2 frees (0MB), 0MB resrc
java version "1.7.0"
Java(TM) SE Runtime Environment (build 1.7.0-b147)
Java HotSpot(TM) 64-Bit Server VM (build 25.0-b02-internal-fastdebug,
mixed mode)
allocation stats: 19599 mallocs (7MB), 1119 frees (0MB), 3MB resrc
The first line is from the first executable statement in the
G1CollectedHeap constructor. The second is from just after initializing
the ConcurrentMark instance. The third is from just after allocating the
initial heap. The last is at the end of the G1CollectedHeap constructor.
The bulk of the malloc requests are coming from allocating the heap
region instances (and some of their contained structures).
If the simple allocator idea works out then we could perhaps hoist it up
to G1CollectedHeap and use it to allocate the heap region instances.
Regards,
JohnC