RFR (S): G1: Use ArrayAllocator for BitMaps
Bengt Rutisson
bengt.rutisson at oracle.com
Thu Jun 13 07:53:32 PDT 2013
Hi everyone,
Could I have a couple of reviews for this small change?
http://cr.openjdk.java.net/~brutisso/8016556/webrev.00/
Sending this request to the broad hotspot dev mailing list since it
touches code in vm/utilities.
Background:
In the constructor for the ConcurrentMark class in G1 we set up one bit
map per worker thread:
for (uint i = 0; i < _max_worker_id; ++i) {
  ...
  _count_card_bitmaps[i] = BitMap(card_bm_size, false);
  ...
}
Each of these bitmaps is malloced, which means that the amount of
C-heap we require grows with the number of GC worker threads we have. On
a machine with many CPUs we get many worker threads by default. For
example, on scaaa007 I get 79 GC worker threads. The size of each bitmap
also scales with the heap size, and since this large machine has a lot
of memory we get a large default heap as well.
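To see how this multiplies out, here is a back-of-the-envelope sketch.
The 512-byte card size matches G1's default; the heap size and worker
count in the comments are illustrative assumptions, not the exact
figures from scaaa007:

```cpp
#include <cassert>
#include <cstddef>

// Per-worker count bitmap: one bit per card, one card per 512 bytes
// of heap. (512 is G1's default card size; treat the rest as an
// illustrative assumption.)
const size_t card_size = 512;                  // bytes of heap per card

size_t bitmap_bytes(size_t heap_bytes) {
  size_t cards = heap_bytes / card_size;       // one bit per card
  return cards / 8;                            // bits -> bytes
}
```

For a hypothetical 16GB heap this gives 4MB per bitmap, so with 79
worker threads the bitmaps alone would account for a few hundred
megabytes.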
Here is the output from just running java -version with G1 on scaaa007:
$ java -d64 -XX:+UseG1GC -XX:+PrintMallocStatistics -version
java version "1.8.0-ea-fastdebug"
Java(TM) SE Runtime Environment (build 1.8.0-ea-fastdebug-b92)
Java HotSpot(TM) 64-Bit Server VM (build 25.0-b34-fastdebug, mixed mode)
allocation stats: 35199 mallocs (221MB), 13989 frees (0MB), 35MB resrc
We malloc 221MB just by starting the VM. Most of the large allocations
are due to the BitMap allocations. My patch changes the BitMap
allocations to use the ArrayAllocator instead. That class uses mmap on
Solaris if the size is larger than 64K.
With this patch the output looks like this:
$ java -d64 -XX:+UseG1GC -XX:+PrintMallocStatistics -version
java version "1.8.0-ea-fastdebug"
Java(TM) SE Runtime Environment (build 1.8.0-ea-fastdebug-b93)
Java HotSpot(TM) 64-Bit Server VM (build
25.0-b34-internal-201306130943.brutisso.hs-gc-g1-mmap-fastdebug, mixed mode)
allocation stats: 35217 mallocs (31MB), 14031 frees (0MB), 35MB resrc
We are down to 31MB.
Note that the ArrayAllocator only has this effect on Solaris machines.
Also note that I have not reduced the total amount of memory, just moved
it from the C-heap to mapped memory.
One complication with the fix is that the BitMap data structures get
copied around quite a bit. The copies are shallow, so we don't risk
re-doing the allocation. But since I am now embedding an ArrayAllocator
in the BitMap structure, the destructor for copies of the ArrayAllocator
gets called every now and then. The BitMap explicitly frees the
allocated memory when it thinks it is necessary. So, rather than trying
to refactor the code to avoid copying, I made it optional to free the
allocated memory in the ArrayAllocator destructor.
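The optional-free idea can be illustrated like this. The class and flag
names here are invented for the sketch; the actual change is in the
webrev. The point is that shallow copies are constructed (or flagged) so
that their destruction leaves the shared buffer alone, and the owner
frees it explicitly:

```cpp
#include <cstdlib>
#include <cstddef>

// Hypothetical illustration of an allocator whose destructor only
// frees when asked to, so destruction of shallow copies does not
// release memory the original still uses.
class OptionalFreeAllocator {
  void* _addr;
  bool  _free_in_destructor;
public:
  explicit OptionalFreeAllocator(bool free_in_destructor)
    : _addr(NULL), _free_in_destructor(free_in_destructor) {}

  void* allocate(size_t bytes) {
    _addr = malloc(bytes);
    return _addr;
  }

  // Explicit free for the owner that knows the memory is done with.
  void free_memory() {
    free(_addr);
    _addr = NULL;
  }

  ~OptionalFreeAllocator() {
    // Copies carrying free_in_destructor == false skip this, so a
    // shallow copy going out of scope cannot pull the buffer out from
    // under the original.
    if (_free_in_destructor && _addr != NULL) free(_addr);
  }
};
```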
I do think it would be good to review the BitMap code. It seems a bit
fishy that we pass around shallow copies. But I think my current change
keeps the same behavior as before.
Thanks,
Bengt