RFR: 8342496: C2/Shenandoah: SEGV in compiled code when running jcstress

Roland Westrelin roland at openjdk.org
Mon Oct 21 07:41:50 UTC 2024


On Thu, 17 Oct 2024 13:40:16 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> The reason for the crash is that compiled code reads from an object
>> that's null. All field loads from an object are guarded by a null
>> check. Where is the null check in that case? After the field load:
>> 
>> 
>> 0x00007ffaac91261f:  44 8B 69 0C                      mov    r13d, dword ptr [rcx + 0xc] <- field load
>> 0x00007ffaac912623:  85 C9                            test   ecx, ecx                    <- null check (oops!)
>> 0x00007ffaac912625:  74 5C                            je     0x7ffaac912683
>> 
>> 
>> When the IR graph is constructed for the test case, the field load is
>> correctly made dependent on the null check (through a `CastPP` node)
>> but then something happens that's shenandoah specific and that causes
>> the field load to become dependent on another check so it can execute
>> before the null check.
>> 
>> There are several load barriers involved in the process. One of them
>> is expanded at the null check projection. In the process, control for
>> the nodes that are control dependent on the null check is updated to
>> be the region at the end of the just expanded barrier. The `CastPP`
>> node for the null check gets the `Region` as new control.
>> 
>> Another barrier is expanded right after that one. The 2 are back to
>> back. They are merged. The `Region` that the `CastPP` depends on goes
>> away, the `CastPP` is cloned in both branches at the `Region` and one
>> of them becomes control dependent on the heap stable test of the first
>> expanded barrier. At this point, one of the `CastPP` is control
>> dependent on a heap stable test that's after the null check. But then,
>> the heap stable test is moved out of loop and 2 copies of the loop are
>> made so one can run without any overhead from barriers. When that
>> happens, the `CastPP` becomes dependent on a test that dominates the
>> null check and so the field load that depends on the `CastPP` can be
>> scheduled before the null check.
>> 
>> The fix I propose is not update the control when the barrier is
>> expanded for nodes that can float when the test they depend on
>> moves. This way the `CastPP` remains dependent on the null check.
>
> Thanks! I am running tests with this patch.

@shipilev @rkennke thanks for the reviews + testing

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21562#issuecomment-2425863060


More information about the hotspot-compiler-dev mailing list