RFR: 8310160: Make GC APIs for handling archive heap objects agnostic of GC policy [v2]
Roman Kennke
rkennke at openjdk.org
Wed Aug 16 01:03:11 UTC 2023
On Wed, 26 Jul 2023 04:28:25 GMT, Ioi Lam <iklam at openjdk.org> wrote:
>> @iklam I agree this is a much better approach and makes the whole process truly collector agnostic. Great work, especially the optimization to re-order the objects.
>>
>> Given that this has minimal impact on performance, are we good to go ahead with this approach now?
>>
>> One issue I noticed while doing some testing with the Shenandoah collector is probably worth pointing out here:
>> When using `-XX:+NahlRawAlloc` with a very small heap size like `-Xmx4m` or `-Xmx8m`, the java process freezes. This happens because the allocations for archive objects cause pacing to kick in, and the main thread waits on `ShenandoahPacer::_wait_monitor` [0] to be notified by `ShenandoahPeriodicPacerNotify` [1]. But the WatcherThread, which is responsible for executing the `ShenandoahPeriodicPacerNotify` task, does not run periodic tasks until VM init is done [2][3]. So the main thread is stuck.
>>
>> I guess if we do the wait in `ShenandoahPacer::pace_for_alloc` only after VM init is completed, it would resolve this issue.
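>>
>> Something along these lines might do it (an untested sketch; it assumes `is_init_completed()` from runtime/init.hpp is usable this early):
>>
>> void ShenandoahPacer::pace_for_alloc(size_t words) {
>>   // Sketch: before VM init completes, the WatcherThread does not run
>>   // the ShenandoahPeriodicPacerNotify task, so a wait on _wait_monitor
>>   // would never be notified; skip pacing until init is done.
>>   if (!is_init_completed()) {
>>     return;
>>   }
>>   // ... existing pacing logic: try to claim budget, otherwise wait ...
>> }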
>>
>> I haven't noticed this with `-XX:-NahlRawAlloc`; I'm not sure why that should make a difference.
>>
>> Here are the stack traces:
>>
>> main thread:
>>
>> #5 0x00007f5a1fafbafc in PlatformMonitor::wait (this=this@entry=0x7f5a180f6c78, millis=<optimized out>, millis@entry=10) at src/hotspot/os/posix/mutex_posix.hpp:124
>> #6 0x00007f5a1faa3f9c in Monitor::wait (this=0x7f5a180f6c70, timeout=timeout@entry=10) at src/hotspot/share/runtime/mutex.cpp:254
>> #7 0x00007f5a1fc2d3bd in ShenandoahPacer::wait (time_ms=10, this=0x7f5a180f6a20) at src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp:286
>> #8 ShenandoahPacer::pace_for_alloc (this=0x7f5a180f6a20, words=<optimized out>) at src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp:263
>> #9 0x00007f5a1fbfc7e1 in ShenandoahHeap::allocate_memory (this=0x7f5a180ca590, req=...) at src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp:855
>> #10 0x00007f5a1fbfcb5c in ShenandoahHeap::mem_allocate (this=<optimized out>, size=<optimized out>, gc_overhead_limit_was_exceeded=<optimized out>) at src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp:931
>> #11 0x00007f5a1f2402c2 in NewQuickLoader::mem_allocate_raw (size=6) at src/hotspot/share/cds/archiveHeapLoader.cpp:493
>> #12 NewQuickLoaderImpl<true, true>::allocate (this=<optimized out>, __the_thread__=<optimized out>, size=<synthetic pointer>: 6, stream=0x7f5a1d228850) at src/hotspot/share/cds/archiveHeapLoader.cpp:712
>> #13 NewQuickLoaderImpl<true, true>::load_archive_heap_inner<fals...
>
> Hi @ashu-mehra, thanks for testing the patch. I think we all agree that the minor performance impact is acceptable because the code is simpler and more portable. I'll try to clean up my patch and start a PR.
>
> BTW, I have implemented a simpler relocation algorithm with similar performance. It uses less memory and will hopefully be easier to understand as well. The algorithm is described in comments inside archiveHeapLoader.cpp.
>
> https://github.com/openjdk/jdk/compare/master...iklam:jdk:8310823-materialize-cds-heap-with-regular-alloc.alternative-relocation?expand=1
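>
> To give a rough feel for the general shape (a standalone toy model with invented names, not the actual HotSpot code -- the real algorithm and its invariants are in the comments in archiveHeapLoader.cpp on that branch):
>
> #include <cstdint>
> #include <map>
> #include <vector>
>
> // Toy model: archived records reference each other by archive offset.
> // Pass 1 materializes every record with a regular allocation and records
> // archive-offset -> new-address; pass 2 patches the references.
> static const uint32_t NO_REF = UINT32_MAX;
>
> struct Record {
>   uint32_t referent;       // archive offset of referent, or NO_REF
>   Record*  ref = nullptr;  // patched in pass 2
> };
>
> std::vector<Record*> materialize(const std::vector<Record>& archive) {
>   std::map<uint32_t, Record*> new_addr;  // archive offset -> new address
>   std::vector<Record*> out;
>   for (uint32_t i = 0; i < archive.size(); i++) {
>     Record* r = new Record(archive[i]);  // stands in for a regular GC allocation
>     new_addr[i] = r;
>     out.push_back(r);
>   }
>   for (Record* r : out) {
>     if (r->referent != NO_REF) {
>       r->ref = new_addr[r->referent];
>     }
>   }
>   return out;
> }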
>
> As a prerequisite, I'll start a PR for [JDK-8251330: Reorder CDS archived heap to speed up relocation](https://bugs.openjdk.org/browse/JDK-8251330)
>
> Regarding raw allocation, it doesn't seem to be much faster, so maybe we should disable it, at least for the initial integration.
>
>
> $ (for i in {1..6}; do perf stat -r 100 java -XX:+NewArchiveHeapLoading -XX:-NahlRawAlloc --version > /dev/null; done) 2>&1 | grep elapsed
> 0.0162332 +- 0.0000914 seconds time elapsed ( +- 0.56% )
> 0.0161316 +- 0.0000228 seconds time elapsed ( +- 0.14% )
> 0.0161171 +- 0.0000250 seconds time elapsed ( +- 0.16% )
> 0.0161311 +- 0.0000231 seconds time elapsed ( +- 0.14% )
> 0.0161433 +- 0.0000244 seconds time elapsed ( +- 0.15% )
> 0.0161121 +- 0.0000271 seconds time elapsed ( +- 0.17% )
> $ (for i in {1..6}; do perf stat -r 100 java -XX:+NewArchiveHeapLoading -XX:+NahlRawAlloc --version > /dev/null; done) 2>&1 | grep elapsed
> 0.0160640 +- 0.0000973 seconds time elapsed ( +- 0.61% )
> 0.0159320 +- 0.0000221 seconds time elapsed ( +- 0.14% )
> 0.0159910 +- 0.0000272 seconds time elapsed ( +- 0.17% )
> 0.0159406 +- 0.0000230 seconds time elapsed ( +- 0.14% )
> 0.0159930 +- 0.0000252 seconds time elapsed ( +- 0.16% )
> 0.0159670 +- 0.0000296 seconds time elapsed ( +- 0.19% )
> $ (for i in {1..6}; do perf stat -r 100 java -XX:-NewArchiveHeapLoading -XX:+NahlRawAlloc --version > /dev/null; done) 2>&1 | grep elapsed
> 0.0149069 +- 0.0000932 seconds time elapsed ( +- 0.63% )
> 0.0148363 +- 0.0000259 seconds time elapsed ( +- 0.17% )
> 0.0148077 +- 0.0000218 seconds time elapsed ( +- 0.15% )
> 0.0148377 +- 0.0000212 seconds time elapsed ( +- 0.14% )
> 0.0148411 +- 0.0000245 seconds time elapsed ( +- 0.17% )
> 0.0148504 +- 0.0000258 seconds time elapsed ( +- 0.17% )
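>
> Reading the numbers above: enabling raw allocation saves only about 0.2 ms out of ~16 ms (~1%), while the new loading path as a whole costs about 1.1-1.3 ms over the old one (~8% of this `--version` microbenchmark).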
Yes, Shenandoah pacing should not run before VM init is complete. This seems like a separate issue.
Great work in this PR, though!
Roman
-------------
PR Comment: https://git.openjdk.org/jdk/pull/14520#issuecomment-1679817438