RFR: 8310160: Make GC APIs for handling archive heap objects agnostic of GC policy [v2]

Ashutosh Mehra duke at openjdk.org
Mon Jul 10 14:55:07 UTC 2023


On Mon, 10 Jul 2023 05:35:53 GMT, Ioi Lam <iklam at openjdk.org> wrote:

>>> > I first ran `java -Xshare:dump` so all the subsequent `java --version` runs use the same heap size as at dump time. As a result, my "before" runs had a heap relocation delta of zero, which should correspond to the best start-up time.
>>> 
>>> Okay, thanks for clarifying. I thought `java --version` runs were using the default archive.
>> 
>> I haven't done any optimizations yet, but I fixed a few problems in the slow-path code. 
>> 
>> https://github.com/openjdk/jdk/compare/master...iklam:jdk:8310823-materialize-cds-heap-with-regular-alloc?expand=1
>> 
>> 
>> # Before: no relocation
>> $ perf stat -r 40 java --version > /dev/null
>>           0.015872 +- 0.000238 seconds time elapsed  ( +-  1.50% )
>> 
>> # Before: force relocation (quick)
>> $ perf stat -r 40 java -Xmx4g --version > /dev/null
>>           0.016691 +- 0.000385 seconds time elapsed  ( +-  2.31% )
>> 
>> # Before: force relocation ("quick relocation not possible")
>> $ perf stat -r 40 java -Xmx2g --version > /dev/null
>>           0.017385 +- 0.000230 seconds time elapsed  ( +-  1.32% )
>> 
>> # After
>> $ perf stat -r 40 java -XX:+NewArchiveHeapLoading --version > /dev/null
>>           0.018780 +- 0.000225 seconds time elapsed  ( +-  1.20% )
>> 
>> 
>> So the slow path is just about 3ms slower than the fastest "before" case.
>> 
>> Looking at the detailed timing breakdown (measured with `os::thread_cpu_time()`, in ns):
>> 
>> 
>> $ java -XX:+NewArchiveHeapLoading -Xlog:cds+gc --version
>> [0.006s][info][cds,gc] Num objs                    :                24184
>> [0.006s][info][cds,gc] Num bytes                   :              1074640
>> [0.006s][info][cds,gc] Per obj bytes               :                   44
>> [0.006s][info][cds,gc] Num references (incl nulls) :                87109
>> [0.006s][info][cds,gc] Num references relocated    :                43225
>> [0.006s][info][cds,gc] Allocation Time             :              1605084 <<<< A
>> [0.006s][info][cds,gc] Relocation Time             :              1246894
>> [0.006s][info][cds,gc] Table(s) dispose Time       :                 1306
>> 
>> $ java -XX:+NewArchiveHeapLoading -XX:NewArchiveHeapNumAllocs=2 -Xlog:cds+gc --version
>> [0.006s][info][cds,gc] Allocation Time             :              2203781 <<<< B
>> 
>> $ java -XX:+NewArchiveHeapLoading -XX:NewArchiveHeapNumAllocs=-1 -Xlog:cds+gc --version
>> [0.003s][info][cds,gc] Allocation Time             :               282125 <<<< C
>> 
>> $ java -XX:+NewArchiveHeapLoading -XX:NewArchiveHeapNumAllocs=0 -Xlog:cds+gc --version
>> [0.004s][inf...
>
>> I hope to implement a fast path for relocation that avoids using the hash tables at all. If we can get the total alloc + reloc time to be about 1.5ms, then it would be just as fast as before when relocation is enabled.
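A rough mental model of the hash-table slow path may help readers without the background. The sketch below is a toy C++ model under stated assumptions, not HotSpot code: `ToyObject`, `materialize`, the 16-byte header and the bump-pointer allocator are all hypothetical. Pass 1 allocates each archived object and records its old-to-new address in a hash table; pass 2 rewrites every non-null reference through that table, which is the per-reference lookup a fast path would try to avoid.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Toy model (hypothetical, not HotSpot code). Before loading, `addr` is the
// object's offset in the archive and each entry of `refs` holds another
// object's archive offset, with 0 meaning null.
struct ToyObject {
  uint64_t addr;
  std::vector<uint64_t> refs;
};

// Slow-path sketch: allocate each object, remember old -> new in a hash
// table, then patch every non-null reference through that table.
std::vector<ToyObject> materialize(const std::vector<ToyObject>& archive,
                                   uint64_t heap_base) {
  std::unordered_map<uint64_t, uint64_t> old_to_new;
  std::vector<ToyObject> heap;
  heap.reserve(archive.size());

  // Pass 1 ("allocation"): a bump pointer stands in for the GC allocator.
  uint64_t top = heap_base;
  for (const ToyObject& obj : archive) {
    uint64_t size = 16 + 8 * obj.refs.size();  // toy header + 8-byte fields
    old_to_new[obj.addr] = top;
    heap.push_back({top, obj.refs});
    top += size;
  }

  // Pass 2 ("relocation"): one hash lookup per non-null reference; nulls
  // are visited but need no patching.
  for (ToyObject& obj : heap) {
    for (uint64_t& ref : obj.refs) {
      if (ref == 0) continue;
      auto it = old_to_new.find(ref);
      assert(it != old_to_new.end() && "reference to a non-archived object");
      ref = it->second;
    }
  }
  return heap;  // the temporary table is freed when old_to_new goes out of scope
}
```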
> 
> I've implemented a fast relocation lookup. It currently uses a table the same size as the archived heap objects, but I can reduce that to half the size.
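The post does not say how the fast lookup works, so the following is only a guess at one way a table could replace the hash lookup and how it could shrink to half the size; `DirectRelocationTable` and its methods are hypothetical. It assumes object starts are 8-byte aligned within the archived region and that every object occupies at least 16 bytes, so no two objects start in the same 16-byte slot. An array with one 8-byte entry per slot, indexed by `offset / 16`, is then half as large as the region (indexing by `offset / 8`, one entry per heap word, would make it the same size), and each lookup is a shift and a load rather than a hash probe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical direct-mapped relocation table (illustrative, not HotSpot code).
// Assumes objects start at 8-byte-aligned offsets in the archived region and
// occupy at least kSlotBytes bytes, so no two objects start in the same slot.
class DirectRelocationTable {
 public:
  static constexpr size_t kSlotBytes = 16;  // 8 would give a same-size table

  explicit DirectRelocationTable(size_t archive_bytes)
      : slots_((archive_bytes + kSlotBytes - 1) / kSlotBytes, 0) {}

  // Record where the object at `archive_offset` was materialized.
  void record(size_t archive_offset, uint64_t new_addr) {
    size_t i = archive_offset / kSlotBytes;
    assert(i < slots_.size() && slots_[i] == 0 && "one object start per slot");
    slots_[i] = new_addr;
  }

  // No hashing: a divide by a power of two and one array load.
  uint64_t lookup(size_t archive_offset) const {
    return slots_[archive_offset / kSlotBytes];
  }

  // 8-byte entries, one per 16-byte slot: half the size of the region.
  size_t table_bytes() const { return slots_.size() * sizeof(uint64_t); }

 private:
  std::vector<uint64_t> slots_;
};
```

Under these assumptions, the 1,074,640-byte archive reported by `Num bytes` earlier in the thread would need a 537,320-byte table.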
> 
> See https://github.com/openjdk/jdk/compare/master...iklam:jdk:8310823-materialize-cds-heap-with-regular-alloc?expand=1
> 
> This takes about 330 lines of code in archiveHeapLoader.cpp. The code is templatized to try out different approaches (like `-XX:+NahlRawAlloc` and `-XX:+NahlUseAccessAPI`), so it can be further simplified.
> 
> There's only one thing that's not yet implemented -- the equivalent of `ArchiveHeapLoader::patch_native_pointers()`. I'll do that next.
> 
> 
> $ java  -XX:+NewArchiveHeapLoading -Xmx128m -Xlog:cds+gc --version
> [0.004s][info][cds,gc] Delayed allocation records alloced: 640
> [0.004s][info][cds,gc] Load Time: 1388458
> 
> 
> The whole allocation + reloc takes about 1.4ms. It's about 1.25ms slower in the worst case (when the "old" code doesn't have to relocate -- see the `(**)` in the table below). It's 0.8ms slower when the "old" code has to relocate.
> 
> 
> All times are in ms, for "java --version"
> 
> ====================================
> Dump: java -Xshare:dump -Xmx128m
> 
> G1         old        new       diff
>  128m   14.476     15.754     +1.277 (**)
> 8192m   15.359     16.085     +0.726
> 
> 
> Serial     old        new       diff
>  128m   13.442     14.241     +0.798
> 8192m   13.740     14.532     +0.791
> 
> ====================================
> Dump: java -Xshare:dump -Xmx8192m
> 
> G1         old        new       diff
>  128m   14.975     15.787     +0.812
> 2048m   16.239     17.035     +0.796
> 8192m   14.821     16.042     +1.221 (**)
> 
> 
> Serial     old        new       diff
>  128m   13.444     14.167     +0.723
> 8192m   13.717     14.502     +0.785
> 
> 
> While the code is slower than before, it's a lot simpler. It works on all collectors. I tested on ZGC, but I think Shenandoah should work as well.
> 
> The cost is about 1.3 ms per MB of archived heap objects. This may be acceptable, as it's a small fraction of JVM bootstrap time. We have about 1MB of archived objects now, and we don't expect this size to increase drastically in the near future.
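The per-MB figure can be reproduced from the two `-Xlog:cds+gc` outputs in this thread, assuming both runs loaded the same archive: `Load Time: 1388458` (ns) against `Num bytes: 1074640`. Because nanoseconds per byte is numerically equal to milliseconds per 10^6 bytes, the ratio is the answer directly. `ms_per_mb` and the two constants are illustrative names, not part of the patch.

```cpp
#include <cassert>

// Milliseconds of load time per 10^6 bytes of archived objects. Because
// (ns / 1e6) / (bytes / 1e6) == ns / bytes, no unit conversion is needed.
double ms_per_mb(double load_time_ns, double archived_bytes) {
  assert(archived_bytes > 0);
  return load_time_ns / archived_bytes;
}

// Values copied from the -Xlog:cds+gc output quoted in this thread.
constexpr double kLoadTimeNs = 1388458.0;     // "Load Time" (fast relocation run)
constexpr double kArchivedBytes = 1074640.0;  // "Num bytes" (slow-path run)
```

This gives about 1.29 ms per 10^6 bytes (about 1.35 ms per MiB), consistent with the estimate of about 1.3 ms per MB.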
> 
> The extra memory cost is:
> 
> - a temporary in-memory copy of the archived heap objects
> - a temporary table half the size of the archived heap objects
> 
> The former can be reduced by readi...

@iklam can you please elaborate a bit on the relocation optimizations done by the patch? Without any background on the idea, it is difficult to infer it from the code.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14520#issuecomment-1629132283


More information about the hotspot-gc-dev mailing list