RFR: 8288129: Shenandoah: Skynet test crashed with iu + aggressive [v2]
Ashutosh Mehra
duke at openjdk.org
Wed Sep 7 15:32:46 UTC 2022
On Fri, 26 Aug 2022 20:22:57 GMT, Ashutosh Mehra <duke at openjdk.org> wrote:
>> Please review this patch to fix the crash when running Loom with Shenandoah in iu+aggressive mode.
>>
>> When running with `-XX:+ShenandoahVerify`, the issue manifests as an assertion failure:
>>
>>
>> #
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> # Internal Error (/home/asmehra/data/ashu-mehra/loom/src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp:92), pid=920144, tid=920156
>> # Error: Before Evacuation, Reachable; Must be marked in complete bitmap, except j.l.r.Reference referents
>>
>> Referenced from:
>> interior location: 0x00007f0a40000090
>> 0x00007f0a40000060 - klass 0x00007f0a0d89a290 jdk.internal.vm.StackChunk
>> allocated after mark start
>> not after update watermark
>> marked strong
>> marked weak
>> not in collection set
>> mark: mark(is_neutral no_hash age=0)
>> region: | 0|R |BTE 7f0a40000000, 7f0a401ff9b0, 7f0a40200000|TAMS 7f0a40000000|UWM 7f0a401ff9b0|U 2046K|T 2046K|G 0B|S 0B|L 2046K|CP 0
>>
>> Object:
>> 0x00007f0c2f848a40 - klass 0x00007f0a0d89a290 jdk.internal.vm.StackChunk
>> not allocated after mark start
>> not after update watermark
>> not marked strong
>> not marked weak
>> in collection set
>> mark: mark(is_neutral no_hash age=0)
>> region: | 3964|CS |BTE 7f0c2f800000, 7f0c2fa00000, 7f0c2fa00000|TAMS 7f0c2fa00000|UWM 7f0c2fa00000|U 2048K|T 0B|G 2048K|S 0B|L 1612K|CP 0
>>
>> Forwardee:
>> (the object itself)
>>
>>
>> The StackChunk object `0x00007f0c2f848a40` is not marked, yet it is referenced from the `parent` field of the newly allocated StackChunk object `0x00007f0a40000060`.
>> The sequence for setting `StackChunk::parent` is as follows. At some point while freezing the continuation, the JVM does:
>>
>>
>> continuationWrapper::_tail = stackChunk1
>> stackChunk2 = allocate new StackChunk
>> stackChunk2::parent = continuationWrapper::_tail
>> continuationWrapper::_tail = stackChunk2
>>
>>
>> At the end of the sequence, stackChunk1 is reachable only from stackChunk2. If stackChunk2 is allocated after concurrent mark has started and Shenandoah is running in IU mode, then stackChunk2 is not scanned. As a result, the GC never marks stackChunk1, which triggers the assertion during heap verification.
>>
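The missed mark described above can be reproduced in a toy model. This is only a sketch, not HotSpot code: `Obj`, `mark_from` and `chunk1_marked_after_freeze` are hypothetical names, and it assumes that the marker has not yet visited the continuation when the freeze sequence runs, that the stores run without any barrier, and that objects allocated after mark start are treated as implicitly live and never scanned.

```cpp
#include <cassert>
#include <vector>

// Toy heap object (hypothetical): one reference field plus the two
// properties that the verifier output reports for each object.
struct Obj {
    Obj* ref = nullptr;                       // models StackChunk::parent or the continuation's tail
    bool marked = false;
    bool allocated_after_mark_start = false;
};

// Toy concurrent mark. Objects allocated after mark start are assumed to be
// implicitly live, so their fields are never visited.
void mark_from(Obj* root) {
    assert(root != nullptr);
    std::vector<Obj*> worklist{root};
    while (!worklist.empty()) {
        Obj* o = worklist.back();
        worklist.pop_back();
        if (o == nullptr) continue;
        if (o->allocated_after_mark_start) continue;  // implicitly live, not scanned
        if (o->marked) continue;
        o->marked = true;
        worklist.push_back(o->ref);
    }
}

// Replays the freeze sequence and reports whether chunk1 ends up marked.
bool chunk1_marked_after_freeze() {
    Obj continuation, chunk1, chunk2;
    continuation.ref = &chunk1;                // _tail = stackChunk1
    // Concurrent mark is assumed to start here, before the continuation is visited.
    chunk2.allocated_after_mark_start = true;  // stackChunk2 = allocate new StackChunk
    chunk2.ref = continuation.ref;             // stackChunk2::parent = _tail (no barrier)
    continuation.ref = &chunk2;                // _tail = stackChunk2 (old value is not recorded)
    mark_from(&continuation);                  // the marker reaches chunk2 but does not scan it
    return chunk1.marked;
}
```

In this model `chunk1_marked_after_freeze()` returns `false`: the only remaining path to `chunk1` goes through `chunk2`, which the marker skips.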
>> This is similar to the sequence described by @fisk [here](https://github.com/openjdk/jdk19/pull/140#issuecomment-1185491224)
>>
>> `FreezeBase::finish_freeze()` already calls `stackChunkOopDesc::do_barriers()`, which triggers the GC barriers for the newly allocated StackChunk object. But it has two problems:
>> 1. The call to `stackChunkOopDesc::do_barriers()` is guarded by a flag [1] which is false for StackChunk objects allocated after marking has started [2].
>> 2. `stackChunkOopDesc::do_barriers()` currently triggers the GC barriers for the oops in the stack represented by the newly allocated StackChunk, but it ignores the oops in the StackChunk header.
>>
>> To fix these two issues, we need the following changes:
>> 1. Always enable the barriers in Shenandoah IU mode (this is the same change as in #9494, "8288129: Shenandoah: Skynet test crashed with iu + aggressive").
>> 2. Add code to `stackChunkOopDesc::do_barriers` to run the barriers on the oops present in the stack chunk header.
>>
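The effect of the second change can be illustrated with another toy sketch. Again this is not the HotSpot implementation: `Chunk`, `iu_barrier`, `do_barriers` and `parent_marked` are hypothetical stand-ins, and the toy barrier marks its argument directly, whereas a real incremental-update barrier would hand the value to the marker.

```cpp
#include <cassert>
#include <vector>

// Toy stack chunk (hypothetical): a header reference plus references
// that stand in for the oops held in the frozen frames.
struct Chunk {
    Chunk* parent = nullptr;          // header oop
    std::vector<Chunk*> stack_oops;   // oops in the frozen stack
    bool marked = false;
};

// Toy incremental-update barrier: keep the referenced object alive.
void iu_barrier(Chunk* value) {
    if (value != nullptr) value->marked = true;
}

// Runs the barrier over the chunk's stack oops and, when include_header is
// set, over the header oop as well (the behaviour the second change adds).
void do_barriers(Chunk& chunk, bool include_header) {
    for (Chunk* o : chunk.stack_oops) iu_barrier(o);
    if (include_header) iu_barrier(chunk.parent);
}

// Reports whether the parent chunk is marked after running the barriers.
bool parent_marked(bool include_header) {
    Chunk chunk1, chunk2;
    chunk2.parent = &chunk1;          // stackChunk2::parent = stackChunk1
    assert(!chunk1.marked);           // nothing has marked chunk1 yet
    do_barriers(chunk2, include_header);
    return chunk1.marked;
}
```

With `include_header` set to `false` the parent stays unmarked, which mirrors the second problem listed above; with `true` the parent is marked even though the new chunk itself is never scanned.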
>> [1] https://github.com/openjdk/jdk/blob/9a0d1e7ce86368cdcade713a9e220604f7d77ecf/src/hotspot/share/runtime/continuationFreezeThaw.cpp#L1201
>> [2] https://github.com/openjdk/jdk/blob/9a0d1e7ce86368cdcade713a9e220604f7d77ecf/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp#L2308
>>
>> Signed-off-by: Ashutosh Mehra <asmehra at redhat.com>
>
> Ashutosh Mehra has updated the pull request incrementally with one additional commit since the last revision:
>
> Enable loom for ShenandoahGC IU mode
>
> Signed-off-by: Ashutosh Mehra <asmehra at redhat.com>
Closing this in favor of #10089.
-------------
PR: https://git.openjdk.org/jdk/pull/9982