RFR: 8310160: Make GC APIs for handling archive heap objects agnostic of GC policy [v2]
Ashutosh Mehra
duke at openjdk.org
Mon Jul 10 14:55:07 UTC 2023
On Mon, 10 Jul 2023 05:35:53 GMT, Ioi Lam <iklam at openjdk.org> wrote:
>>> > I first ran java -Xshare:dump so all the subsequent java --version runs use the same heap size as dump time. As a result, my "before" runs had a heap relocation delta of zero, which should correspond to the best start-up time.
>>>
>>> Okay, thanks for clarifying. I thought `java --version` runs were using the default archive.
>>
>> I haven't done any optimizations yet, but I fixed a few problems in the slow-path code.
>>
>> https://github.com/openjdk/jdk/compare/master...iklam:jdk:8310823-materialize-cds-heap-with-regular-alloc?expand=1
>>
>>
>> # Before: no relocation
>> $ perf stat -r 40 java --version > /dev/null
>> 0.015872 +- 0.000238 seconds time elapsed ( +- 1.50% )
>>
>> # Before: force relocation (quick)
>> $ perf stat -r 40 java -Xmx4g --version > /dev/null
>> 0.016691 +- 0.000385 seconds time elapsed ( +- 2.31% )
>>
>> # Before: force relocation ("quick relocation not possible")
>> $ perf stat -r 40 java -Xmx2g --version > /dev/null
>> 0.017385 +- 0.000230 seconds time elapsed ( +- 1.32% )
>>
>> # After
>> $ perf stat -r 40 java -XX:+NewArchiveHeapLoading --version > /dev/null
>> 0.018780 +- 0.000225 seconds time elapsed ( +- 1.20% )
>>
>>
>> So the slow path is just about 3ms slower than the fastest "before" case.
>>
>> Looking at the detailed timing break down (`os::thread_cpu_time()` = ns):
>>
>>
>> $ java -XX:+NewArchiveHeapLoading -Xlog:cds+gc --version
>> [0.006s][info][cds,gc] Num objs : 24184
>> [0.006s][info][cds,gc] Num bytes : 1074640
>> [0.006s][info][cds,gc] Per obj bytes : 44
>> [0.006s][info][cds,gc] Num references (incl nulls) : 87109
>> [0.006s][info][cds,gc] Num references relocated : 43225
>> [0.006s][info][cds,gc] Allocation Time : 1605084 <<<< A
>> [0.006s][info][cds,gc] Relocation Time : 1246894
>> [0.006s][info][cds,gc] Table(s) dispose Time : 1306
>>
>> $ java -XX:+NewArchiveHeapLoading -XX:NewArchiveHeapNumAllocs=2 -Xlog:cds+gc --version
>> [0.006s][info][cds,gc] Allocation Time : 2203781 <<<< B
>>
>> $ java -XX:+NewArchiveHeapLoading -XX:NewArchiveHeapNumAllocs=-1 -Xlog:cds+gc --version
>> [0.003s][info][cds,gc] Allocation Time : 282125 <<<< C
>>
>> $ java -XX:+NewArchiveHeapLoading -XX:NewArchiveHeapNumAllocs=0 -Xlog:cds+gc --version
>> [0.004s][inf...
>
>> I hope to implement a fast path for relocation that avoids using the hash tables at all. If we can get the total alloc + reloc time to be about 1.5ms, then it would be just as fast as before when relocation is enabled.
>
> I've implemented a fast relocation lookup. It currently uses a table of the same size as the archived heap objects, but I can reduce that to 1/2 the size.
>
> See https://github.com/openjdk/jdk/compare/master...iklam:jdk:8310823-materialize-cds-heap-with-regular-alloc?expand=1
>
> This is implemented by about 330 lines of code in archiveHeapLoader.cpp. The code is templatized to try out different approaches (like `-XX:+NahlRawAlloc` and `-XX:+NahlUseAccessAPI`), so it can be further simplified.
>
> There's only one thing that's not yet implemented -- the equivalent of `ArchiveHeapLoader::patch_native_pointers()`. I'll do that next.
>
>
> $ java -XX:+NewArchiveHeapLoading -Xmx128m -Xlog:cds+gc --version
> [0.004s][info][cds,gc] Delayed allocation records alloced: 640
> [0.004s][info][cds,gc] Load Time: 1388458
>
>
> The whole allocation + reloc takes about 1.4ms. It's about 1.25ms slower in the worst case (when the "old" code doesn't have to relocate -- see the `(**)` in the table below). It's 0.8ms slower when the "old" code has to relocate.
>
>
> All times are in ms, for "java --version"
>
> ====================================
> Dump: java -Xshare:dump -Xmx128m
>
> G1       old      new      diff
> 128m     14.476   15.754   +1.277 (**)
> 8192m    15.359   16.085   +0.726
>
> Serial   old      new      diff
> 128m     13.442   14.241   +0.798
> 8192m    13.740   14.532   +0.791
>
> ====================================
> Dump: java -Xshare:dump -Xmx8192m
>
> G1       old      new      diff
> 128m     14.975   15.787   +0.812
> 2048m    16.239   17.035   +0.796
> 8192m    14.821   16.042   +1.221 (**)
>
> Serial   old      new      diff
> 128m     13.444   14.167   +0.723
> 8192m    13.717   14.502   +0.785
>
>
> While the code is slower than before, it's a lot simpler. It works on all collectors. I tested on ZGC, but I think Shenandoah should work as well.
>
> The cost is about 1.3 ms per MB of archived heap objects. This may be acceptable as it's a small fraction of JVM bootstrap. We have about 1MB of archived objects now, and we don't expect this size to drastically increase in the near future.
>
> The extra memory cost is:
>
> - a temporary in-memory copy of the archived heap objects
> - a temporary table of 1/2 the size of the archived heap objects
>
> The former can be reduced by readi...
@iklam, can you please elaborate a bit on the relocation optimizations done by the patch? Without any background on the idea, it is difficult to infer it from the code.
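
To make the question concrete, here is a minimal sketch of one way a hash-free relocation lookup could work. It is only my guess at the general technique, not a description of what the patch does. The idea is a table indexed directly by an object's offset in the archived heap, with one slot per alignment unit. Everything named here (`RelocationTable`, `record`, `lookup`, `kObjAlignment`, and the 8-byte alignment itself) is hypothetical and does not come from archiveHeapLoader.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch only; not the code in archiveHeapLoader.cpp.
// Assumption: archived objects start on 8-byte boundaries, so one table
// slot per 8 bytes of archived heap is enough to cover every object start.
constexpr std::size_t kObjAlignment = 8;

// Entry is the type used to store the new location of an object.
// With 8-byte entries the table is as large as the archived heap itself;
// with 4-byte entries (for example a compressed reference) it is half that.
template <typename Entry>
class RelocationTable {
 public:
  explicit RelocationTable(std::size_t archived_bytes)
      : _slots(archived_bytes / kObjAlignment, Entry{0}) {}

  // Remember where the object at old_offset (in the archived heap) now lives.
  void record(std::size_t old_offset, Entry new_location) {
    assert(old_offset % kObjAlignment == 0);
    _slots[old_offset / kObjAlignment] = new_location;
  }

  // Constant-time lookup with no hashing: a divide and an array load.
  Entry lookup(std::size_t old_offset) const {
    assert(old_offset % kObjAlignment == 0);
    return _slots[old_offset / kObjAlignment];
  }

  std::size_t size_in_bytes() const { return _slots.size() * sizeof(Entry); }

 private:
  std::vector<Entry> _slots;
};
```

Under these assumptions, `RelocationTable<std::uint64_t>` would match the "same size as the archived heap objects" figure and `RelocationTable<std::uint32_t>` the "1/2 the size" figure, but whether that is how the patch arrives at those sizes is exactly what I would like to have confirmed.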
-------------
PR Comment: https://git.openjdk.org/jdk/pull/14520#issuecomment-1629132283