RFR(S) 8214388 CDS dumping fails with java heap fragmentation
Ioi Lam
ioi.lam at oracle.com
Fri Nov 30 01:07:42 UTC 2018
http://cr.openjdk.java.net/~iklam/jdk12/8214388-dumptime-fragmentation.v01/
https://bugs.openjdk.java.net/browse/JDK-8214388
Symptom:
========
"java -Xshare:dump" would intermittently fail with
Unable to write archive heap ... due to fragmentation.
This usually happens when you try to dump many classes (e.g., 10000) with
a relatively small heap (e.g., 1g) and a lot of GC threads (e.g., 24).
(Example use case -- Eclipse IDE loads 15,000 classes with 512MB heap.)
When GC happens during class loading, some old G1 regions may be placed
at the top end of the heap (due to the large number of GC threads).
Later, when writing the archived heap, G1 tries to allocate contiguous
regions from the top end of the heap. This would fail due to the presence
of those old regions.
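
As a rough illustration of the failure mode, here is a toy model (not G1
code; the Region type and both function names are made up for this sketch).
It treats the heap as an array of regions, tries to reserve a contiguous run
ending at the top of the heap, and shows how sliding old regions to the
bottom makes that run available again.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: a region is either free or holds old-generation data.
enum class Region { Free, Old };

// Returns true if the `count` highest-indexed regions are all free, i.e. a
// contiguous block of that size could be carved out at the top of the heap.
bool can_allocate_from_top(const std::vector<Region>& heap, std::size_t count) {
    if (count > heap.size()) return false;
    for (std::size_t i = heap.size() - count; i < heap.size(); ++i) {
        if (heap[i] != Region::Free) return false;
    }
    return true;
}

// Stand-in for a compacting full GC: all old regions end up at the bottom
// (lowest indices) and everything above them is free.
std::vector<Region> compact_to_bottom(const std::vector<Region>& heap) {
    std::size_t old_count = 0;
    for (Region r : heap) {
        if (r == Region::Old) ++old_count;
    }
    assert(old_count <= heap.size());
    std::vector<Region> result(heap.size(), Region::Free);
    for (std::size_t i = 0; i < old_count; ++i) {
        result[i] = Region::Old;
    }
    return result;
}
```

In this model, a single old region near the top is enough to make a
multi-region request fail, even though most of the heap is free.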
Fix:
====
As suggested by Stefan Johansson, we run a full GC with a single GC
thread. This guarantees that all old blocks will be moved to the bottom
end of the heap.
Because there's no API for specifying the number of GC threads dynamically,
and CDS dump time doesn't allocate lots of objects, I have statically forced
the number of threads to 1 in AdaptiveSizePolicy::calc_active_workers during
CDS dump time.
(This seems like a more direct way than assigning ParallelGCThreads ...)
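
The shape of such an override can be sketched as follows. This is only an
illustration under assumptions: DumpState and calc_active_workers_sketch are
hypothetical names, and the real AdaptiveSizePolicy::calc_active_workers has
a different signature and more logic. See the webrev for the actual change.

```cpp
#include <cassert>

// Hypothetical stand-in for the VM's "a CDS archive is being dumped" state.
struct DumpState {
    bool dumping_archive;
};

// Sketch: pick the number of active GC workers. While dumping, always use a
// single worker; otherwise fall through to the normally computed value.
unsigned calc_active_workers_sketch(const DumpState& state,
                                    unsigned total_workers) {
    assert(total_workers >= 1);
    if (state.dumping_archive) {
        return 1;  // forced to one worker during the dump
    }
    return total_workers;  // placeholder for the usual sizing policy
}
```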
Notes:
======
1. Humongous regions cannot move. However, currently we don't do humongous
allocations during CDS dump, so we should be fine. I have added diagnostic
warnings so if fragmentation does happen in the future, the user can
find out why.
2. Fixed a minor bug in
HeapShared::check_closed_archive_heap_region_object_class
3. Fixed a bug in MetaspaceShared::read_extra_data, where the symbols/strings
would be lost due to GC.
4. Added a stress test that successfully archives about 18MB of objects with
-Xmx64m. This used to fail even with -Xmx512m on a Solaris box.
5. With default CDS archive generation during JDK build time, -Xmx128m is used.
Before this fix, the EDEN region lives at the top of the heap during CDS dump
time, and we end up with a 2MB gap between the archive regions and the top
of the heap. Because the archive regions cannot move, at run time, using CDS
would reduce the max humongous allocation by 2MB.
With this fix, the archive regions are now placed at the very top of the heap,
so the gap no longer exists.
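
The arithmetic in note 5 can be checked with a small toy model. It assumes
1MB regions (so the 2MB gap is two regions and a 128MB heap is 128 regions)
and an archive occupying 4 regions. Both numbers, and the function names,
are assumptions made for this sketch and are not taken from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Longest run of unpinned regions; in this model that bounds the largest
// contiguous (humongous) allocation. true = pinned, false = free.
std::size_t largest_free_run(const std::vector<bool>& pinned) {
    std::size_t best = 0;
    std::size_t run = 0;
    for (bool p : pinned) {
        run = p ? 0 : run + 1;
        best = std::max(best, run);
    }
    return best;
}

// Heap of `total` regions with `archive` pinned regions placed `gap`
// regions below the top of the heap.
std::vector<bool> heap_with_archive(std::size_t total, std::size_t archive,
                                    std::size_t gap) {
    assert(archive + gap <= total);
    std::vector<bool> pinned(total, false);
    for (std::size_t i = total - gap - archive; i < total - gap; ++i) {
        pinned[i] = true;
    }
    return pinned;
}
```

With a gap of two regions the largest free run is two regions shorter than
with no gap, which matches the 2MB reduction described above.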
Tests:
======
Running hs-tiers{1-6} for sanity.
Thanks
- Ioi