RFR: 8310160: Make GC APIs for handling archive heap objects agnostic of GC policy [v2]
Thomas Stuefe
stuefe at openjdk.org
Thu Aug 10 08:17:03 UTC 2023
On Mon, 31 Jul 2023 20:29:46 GMT, Dan Heidinga <heidinga at openjdk.org> wrote:
>>> I hope to implement a fast path for relocation that avoids using the hash tables at all. If we can get the total alloc + reloc time to be about 1.5ms, then it would be just as fast as before when relocation is enabled.
>>
>> I've implemented a fast relocation lookup. It currently uses a table of the same size as the archived heap objects, but I can reduce that to 1/2 the size.
>>
>> See https://github.com/openjdk/jdk/compare/master...iklam:jdk:8310823-materialize-cds-heap-with-regular-alloc?expand=1
>>
>> This is implemented by about 330 lines of code in archiveHeapLoader.cpp. The code is templatized to try out different approaches (like `-XX:+NahlRawAlloc` and `-XX:+NahlUseAccessAPI`), so it can be further simplified.
>>
>> There's only one thing that's not yet implemented -- the equivalent of `ArchiveHeapLoader::patch_native_pointers()`. I'll do that next.
>>
>>
>> $ java -XX:+NewArchiveHeapLoading -Xmx128m -Xlog:cds+gc --version
>> [0.004s][info][cds,gc] Delayed allocation records alloced: 640
>> [0.004s][info][cds,gc] Load Time: 1388458
>>
>>
>> The whole allocation + reloc takes about 1.4ms. It's about 1.25ms slower in the worst case (when the "old" code doesn't have to relocate -- see the `(**)` in the table below). It's 0.8ms slower when the "old" code has to relocate.
>>
>>
>> All times are in ms, for "java --version"
>>
>> ====================================
>> Dump: java -Xshare:dump -Xmx128m
>>
>> G1 old new diff
>> 128m 14.476 15.754 +1.277 (**)
>> 8192m 15.359 16.085 +0.726
>>
>>
>> Serial      old      new      diff
>> 128m 13.442 14.241 +0.798
>> 8192m 13.740 14.532 +0.791
>>
>> ====================================
>> Dump: java -Xshare:dump -Xmx8192m
>>
>> G1 old new diff
>> 128m 14.975 15.787 +0.812
>> 2048m 16.239 17.035 +0.796
>> 8192m 14.821 16.042 +1.221 (**)
>>
>>
>> Serial      old      new      diff
>> 128m 13.444 14.167 +0.723
>> 8192m 13.717 14.502 +0.785
>>
>>
>> While the code is slower than before, it's a lot simpler. It works with all collectors: I tested ZGC, and I expect Shenandoah to work as well.
>>
>> The cost is about 1.3 ms per MB of archived heap objects. This may be acceptable as it's a small fraction of JVM bootstrap. We have about 1MB of archived objects now, and we don't expect this size to drastically increase in the near future.
>>
>> The extra memory cost is:
>>
>> - a temporary in-memory copy of the archived heap o...
>
>> The cost is about 1.3 ms per MB of archived heap objects. This may be acceptable as it's a small fraction of JVM bootstrap. We have about 1MB of archived objects now, and we don't expect this size to drastically increase in the near future.
>
> Looking ahead to Project Leyden, I wouldn't be surprised if the current 1MB of archived heap became much larger. Demo apps on Graal Native Image are often ~4MB of image heap, and while it's not an apples-to-apples comparison, it suggests that somewhere between 5MB & 10MB isn't unreasonable for Leyden.
>
> Using 10MB as a baseline for easy math, 1.3ms/MB * 10MB = 13 ms for the new code? And (1.3ms-0.8ms) = 0.5ms/MB * 10MB = 5ms for the old code? Assuming I've interpreted the numbers correctly and, importantly, that they scale linearly, it seems worth preserving the mmap approach for collectors that can support it.
>
> Does that seem reasonable? And justify preserving the mmap approach?
I share @DanHeidinga's concern about scaling. Is it just me, or is 1.3ms/MB rather high?
Even without mmap, do we have to allocate memory for each object individually? Could we not allocate one large block of memory for all objects and copy them over linearly? Or allow the heap to overprovision, so that collectors with internal size limits (such as humongous-object thresholds) can still observe them? E.g. Universe::heap()->mem_allocate_maybe_more(minsize, maxsize, &result_size).
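To make the bulk-copy idea concrete, here is a rough standalone sketch (plain C++, not HotSpot code; bulk_allocate, ArchivedObj and materialize are invented names, and mem_allocate_maybe_more above is only a hypothetical API): allocate one block for the whole archive, memcpy the objects in linearly, and patch every embedded reference by a single delta.

#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <new>

// Stand-in for a bulk heap allocation in the spirit of the hypothetical
// mem_allocate_maybe_more(minsize, maxsize, &result_size): the heap may
// return anything in [min_size, max_size]; here we simply hand out max_size.
static uint8_t* bulk_allocate(size_t min_size, size_t max_size, size_t* result_size) {
  (void)min_size;
  *result_size = max_size;
  return static_cast<uint8_t*>(::operator new(max_size));
}

// Minimal "archived object": its size (so we can walk the region linearly)
// and one reference, encoded against the address the archive was dumped for.
struct ArchivedObj {
  size_t size;
  ArchivedObj* ref;   // nullptr, or a pointer valid only at 'requested_base'
};

// Copy the whole archive with one memcpy and patch each embedded reference by
// the single delta between where the archive wanted to be and where it landed.
static void materialize(const uint8_t* archive, size_t archive_size,
                        uintptr_t requested_base, uint8_t* heap_block) {
  std::memcpy(heap_block, archive, archive_size);
  uintptr_t delta = reinterpret_cast<uintptr_t>(heap_block) - requested_base;
  for (size_t pos = 0; pos < archive_size; ) {
    ArchivedObj* obj = reinterpret_cast<ArchivedObj*>(heap_block + pos);
    if (obj->ref != nullptr) {
      obj->ref = reinterpret_cast<ArchivedObj*>(reinterpret_cast<uintptr_t>(obj->ref) + delta);
    }
    pos += obj->size;
  }
}

int main() {
  // Build a fake two-object archive laid out as if it lived at requested_base.
  const uintptr_t requested_base = 0x100000;
  alignas(ArchivedObj) uint8_t archive[2 * sizeof(ArchivedObj)];
  ArchivedObj* a = reinterpret_cast<ArchivedObj*>(archive);
  ArchivedObj* b = reinterpret_cast<ArchivedObj*>(archive + sizeof(ArchivedObj));
  a->size = sizeof(ArchivedObj);
  a->ref  = reinterpret_cast<ArchivedObj*>(requested_base + sizeof(ArchivedObj)); // "points to" b
  b->size = sizeof(ArchivedObj);
  b->ref  = nullptr;

  size_t block_size = 0;
  uint8_t* block = bulk_allocate(sizeof(archive), sizeof(archive), &block_size);
  materialize(archive, sizeof(archive), requested_base, block);

  ArchivedObj* a2 = reinterpret_cast<ArchivedObj*>(block);
  std::printf("relocated ref ok: %s\n",
              a2->ref == reinterpret_cast<ArchivedObj*>(block + sizeof(ArchivedObj)) ? "yes" : "no");
  ::operator delete(block);
  return 0;
}

The point of the single delta is that a linear copy preserves every object's offset within the block, so intra-archive relocation needs no per-object lookup table at all; references that point outside the archived range would of course still need extra handling.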
One nice effect of Ioi's approach is that we could finally eliminate the requirement of "if you map archived heap objects, the narrow Klass encoding base has to be *exactly* the start of the shared metaspace region". Because when we copy the objects, we can now, at little additional cost, recompute the narrow Klass IDs too. That could allow us to use unscaled nKlass encoding at CDS runtime, which we could not do before.
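For the narrow Klass part, a tiny standalone sketch of what recomputation during the copy could look like (again not HotSpot code; NarrowKlassEncoding and recompute_narrow_klass are made-up names): decode the dumped value against the dump-time base/shift and re-encode it against whatever base/shift the runtime picked, e.g. an unscaled one.

#include <cstdint>

// Invented helper describing one narrow-Klass encoding (base + shift).
struct NarrowKlassEncoding {
  uintptr_t base;   // encoding base address
  int       shift;  // encoding shift; 0 means unscaled

  uintptr_t decode(uint32_t narrow) const {
    return base + (static_cast<uintptr_t>(narrow) << shift);
  }
  uint32_t encode(uintptr_t klass_addr) const {
    return static_cast<uint32_t>((klass_addr - base) >> shift);
  }
};

// While copying an object, translate its dumped narrow Klass value from the
// dump-time encoding to the encoding chosen at runtime.
inline uint32_t recompute_narrow_klass(uint32_t dumped_value,
                                       const NarrowKlassEncoding& dump_time,
                                       const NarrowKlassEncoding& run_time) {
  uintptr_t klass_addr = dump_time.decode(dumped_value);
  return run_time.encode(klass_addr);
}

The extra decode/re-encode per object is just a shift and an add, which is why it should be nearly free compared to the copy itself.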
-------------
PR Comment: https://git.openjdk.org/jdk/pull/14520#issuecomment-1672747936