RFR: 8310160: Make GC APIs for handling archive heap objects agnostic of GC policy [v2]
Ioi Lam
iklam at openjdk.org
Wed Jul 5 21:09:55 UTC 2023
On Wed, 5 Jul 2023 16:43:18 GMT, Ioi Lam <iklam at openjdk.org> wrote:
>> It's a bit disappointing for a PR aiming to make heap archiving GC agnostic to make assumptions about GC-internal memory layout that don't apply to all collectors.
>> We have previously discussed with @iklam an approach where materializing archived objects uses the normal object allocation APIs. That would genuinely make the heap archiving mechanism GC agnostic. I would rather see that being prototyped than a non-GC-agnostic approach that we might throw away right after it gets integrated, in favour of the more GC-agnostic approach.
>
>> Based on discussion with @fisk, I created an RFE for investigating his idea. Please see https://bugs.openjdk.org/browse/JDK-8310823
>>
>> It seems quite promising to me, and will greatly reduce (or eliminate) the interface between CDS and the collectors.
>>
>> I would like to preserve the current performance as much as possible, so I am leaning towards having an API for CDS to tell the collector about the preferred location of the archived objects, to allow mmap'ing and reduce/avoid relocation. But such a hinting API will be much smaller than the ones proposed by this PR.
>>
>> (We'd also need a reverse API for the collector to tell CDS what the preferred address would be at CDS dump time.)
>
> I've done a very simple (and rough) implementation of @fisk 's idea, without any optimizations. Every relocation is done via a hashtable lookup. For the default CDS archive, this makes about 48000 relocations on start-up.
>
> https://github.com/openjdk/jdk/compare/master...iklam:jdk:8310823-materialize-cds-heap-with-regular-alloc?expand=1
>
>
> $ perf stat -r 40 java --version > /dev/null
> 0.015065 +- 0.000228 seconds time elapsed ( +- 1.51% )
>
> $ perf stat -r 40 java -XX:+NewArchiveHeapLoading --version > /dev/null
> 0.020598 +- 0.000215 seconds time elapsed ( +- 1.04% )
>
> $ perf stat -r 40 java -XX:+UseSerialGC --version > /dev/null
> 0.013929 +- 0.000229 seconds time elapsed ( +- 1.64% )
>
> $ perf stat -r 40 java -XX:+UseSerialGC -XX:+NewArchiveHeapLoading --version > /dev/null
> 0.019107 +- 0.000222 seconds time elapsed ( +- 1.16% )
>
>
> The cost of the individual object allocation and relocation is about 5ms.
>
> So far the slowdown doesn't seem too outrageous.
>
> My next step is to optimize the relocation code to see how much faster it can get.
>
> I noticed that the objects in `ArchiveHeapLoader::newcode_runtime_allocate_objects()` are allocated in at least two contiguous blocks, due to TLAB overflow.
> @iklam in your performance tests for "java -version", what is the "heap data relocation delta"? Is it non-zero? If so, can you also add the numbers for runs with -Xmx128m, which would correspond to the best case where no relocation is done, just to add another data point?
I first ran `java -Xshare:dump` so all the subsequent `java --version` runs use the same heap size as dump time. As a result, my "before" runs had a heap relocation delta of zero, which should correspond to the best start-up time.
Anyway, I just wanted to get a rough baseline. I will do more detailed benchmarking after implementing the optimized relocation for the `-XX:+NewArchiveHeapLoading` case.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/14520#issuecomment-1622512296
More information about the hotspot-gc-dev mailing list