RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret [v3]

Hao Sun haosun at openjdk.org
Wed Aug 16 01:53:35 UTC 2023


On Tue, 15 Aug 2023 11:14:54 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> Hao Sun has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
>> 
>>  - Remove my temp test patch on jvmci_global.hpp and stubGenerator_aarch64.hpp
>>  - Use relative SP as the PAC modifier
>>  - Merge branch 'master' into jdk-8287325
>>  - Merge branch 'master' into jdk-8287325
>>  - Rename return_pc_at and patch_pc_at
>>    
>>    Rename return_pc_at to return_address_at.
>>    Rename patch_pc_at to patch_return_address_at.
>>  - 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret
>>    
>>    * Background
>>    
>>    1. PAC-RET branch protection was initially implemented on Linux/AArch64
>>    in JDK-8277204 [1].
>>    
>>    2. However, it was broken with the introduction of virtual threads [2],
>>    mainly because the continuation freeze/thaw mechanism would trigger
>>    stack copying to/from memory, whereas the saved and signed LR on the
>>    stack doesn't get re-signed accordingly.
>>    
>>    3. PR-9067 [3] tried to implement the re-sign part, but it was not
>>    accepted because option "PreserveFramePointer" is always turned on by
>>    PAC-RET but this would slow down virtual threads by ~5-20x.
>>    
>>    4. As a workaround, JDK-8288023 [4] disables PAC-RET when preview
>>    language features are enabled. Note that virtual thread is one preview
>>    feature then.
>>    
>>    5. Virtual thread will become a permanent feature in JDK-21 [5][6].
>>    
>>    * Goal
>>    
>>    This patch aims to make PAC-RET compatible with virtual threads.
>>    
>>    * Requirements of virtual threads
>>    
>>    R-1: Option "PreserveFramePointer" should be turned off. That is,
>>    PAC-RET implementation should not rely on frame pointer FP. Otherwise,
>>    the fast path in stack copying will never be taken.
>>    
>>    R-2: Use some invariant values to stack copying as the modifier, so as
>>    to avoid the PAC re-sign for continuation thaw, as the fast path in
>>    stack copying doesn't walk the frame.
>>    
>>    Note that more details can be found in the discussion [3].
>>    
>>    * Investigation
>>    
>>    We considered to use (relative) stack pointer SP, thread ID, PACStack
>>    [7] and value zero as the candidate modifier.
>>    
>>    1. SP: In some scenarios, we need to authenticate the return address in
>>    places where the current SP doesn't match the SP on function entry. E.g.
>>    see the usage in Runtime1::generate_handle_exception(). Hence, neither
>>    absolute nor relative SP works.
>>...
>
> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6002:
> 
>> 6000:       pacia(lr, rscratch1);
>> 6001:       ldp(rscratch1, zr, Address(post(sp, 2 * wordSize)));
>> 6002:     }
> 
> Suggestion:
> 
>       ldr(lr, Address(thread, JavaThread::cont_entry_offset()));
>       sub(lr, sp, lr);
>       pacia(lr, lr);
>     }
> 
> Maybe?

I'm afraid not.
With this sequence, the original value of `lr` is overwritten before the `pacia` instruction executes, so we would end up signing `relative SP` with `relative SP` as the modifier, rather than signing `lr` with `relative SP` as the modifier.

Some basics of Arm pointer authentication (`pauth`) can be found in these slides: https://events.static.linuxfound.org/sites/events/files/slides/slides_23.pdf

For simplicity, we can view "PAC signing" and "PAC authentication" as the **generation** and **verification** of a **PAC code** (something like a hash code).

PAC generation takes three inputs:
- the pointer to protect, i.e. LR in our context
- a key, which is managed by the underlying kernel
- a context value, i.e. the modifier. In theory we can use any value, such as a constant, zero, or some runtime value (SP, FP, thread ID, etc.)

In our context, after the `pacia` instruction, the protected LR becomes "PAC code + LR": the generated PAC code is placed in the upper N bits, and the lower 48 bits still hold the original LR.
Hence, the signed LR cannot be used directly as a return address until the corresponding authentication is performed.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13322#discussion_r1295301014

