RFR: 8342496: C2/Shenandoah: SEGV in compiled code when running jcstress
Aleksey Shipilev
shade at openjdk.org
Fri Oct 18 10:09:15 UTC 2024
On Thu, 17 Oct 2024 13:12:11 GMT, Roland Westrelin <roland at openjdk.org> wrote:
> The reason for the crash is that compiled code reads from an object
> that's null. All field loads from an object are guarded by a null
> check. Where is the null check in that case? After the field load:
>
>
> 0x00007ffaac91261f: 44 8B 69 0C mov r13d, dword ptr [rcx + 0xc] <- field load
> 0x00007ffaac912623: 85 C9 test ecx, ecx <- null check (oops!)
> 0x00007ffaac912625: 74 5C je 0x7ffaac912683
>
>
> When the IR graph is constructed for the test case, the field load is
> correctly made dependent on the null check (through a `CastPP` node)
> but then something happens that's shenandoah specific and that causes
> the field load to become dependent on another check so it can execute
> before the null check.
>
> There are several load barriers involved in the process. One of them
> is expanded at the null check projection. In the process, control for
> the nodes that are control dependent on the null check is updated to
> be the region at the end of the just expanded barrier. The `CastPP`
> node for the null check gets the `Region` as new control.
>
> Another barrier is expanded right after that one. The 2 are back to
> back. They are merged. The `Region` that the `CastPP` depends on goes
> away, the `CastPP` is cloned in both branches at the `Region` and one
> of them becomes control dependent on the heap stable test of the first
> expanded barrier. At this point, one of the `CastPP` is control
> dependent on a heap stable test that's after the null check. But then,
> the heap stable test is moved out of loop and 2 copies of the loop are
> made so one can run without any overhead from barriers. When that
> happens, the `CastPP` becomes dependent on a test that dominates the
> null check and so the field load that depends on the `CastPP` can be
> scheduled before the null check.
>
> The fix I propose is not update the control when the barrier is
> expanded for nodes that can float when the test they depend on
> moves. This way the `CastPP` remains dependent on the null check.
Testing says no functional/performance regressions with this patch.
-------------
Marked as reviewed by shade (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21562#pullrequestreview-2377642539
More information about the hotspot-compiler-dev
mailing list