RFR: JDK-8283326: Implement SafeFetch statically

Tue Apr 12 11:23:03 UTC 2022

On Fri, 18 Mar 2022 08:52:46 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> Hi,
> 
> This is a reimplementation of SafeFetch to get rid of dynamic code generation. This is done, depending on platform, via static assembly, Posix setjmp/longjmp or SEH.
> 
> This is based on a proposal by @theRealAph [1] when discussing [JDK-8282475](https://bugs.openjdk.java.net/browse/JDK-8282475). I extended the proposal to do this for all platforms, not just MacOS, see discussion below.
> 
> ---
> 
> Why we do this:
> 
> SafeFetch is special: it is a basic utility function that may be called before the VM is properly initialized, from a "foreign" thread not attached to the VM, in a broken environment... mainly this is due to its use during stack walking, error reporting, hex dumping, etc. SafeFetch has to be safe to call under any circumstances - or as safe as we can make it.
> 
> Today, SafeFetch is implemented via stub routines. That causes problems:
> 
> - MacOs aarch64 has mechanisms to prevent execution of arbitrary non-text memory. On that platform, invoking generated code requires some awkward gymnastics [2]. These workarounds require access to Thread::current, which renders SafeFetch non-functional if Thread::current is NULL (e.g. crash reporting in "foreign" native threads or in a corrupted VM). They also cost a bit of performance on every SafeFetch invocation. Note that more OSes may implement similar execution prevention mechanics in the future.
> 
> - There is a time window during VM initialization where the stubs are not generated yet and thus SafeFetch is not available.
> 
> Both points cause hs-err files to be compromised (e.g. missing call stacks) for crashes in native threads, when the VM is corrupted, or early during VM initialization. Since SafeFetch is such a basic tool, it causes more issues than that (e.g., on AIX it is used for verifying mprotect() itself). But hs-err files being compromised by secondary errors is the most annoying effect.
> 
> A minor point is that generating stubs takes a tiny bit of startup time. Not a big issue, but anything we can generate at build time instead of at runtime is good.
> 
> ---
> 
> The patch:
> 
> - Removes the SafeFetch stub routine generation for all platforms
> - For Linux and MacOS, it implements SafeFetch as cpu-specific static assembler routines. These routines live in `os_cpu/<os>_<cpu>/safefetch_<os>_<cpu>.S`.
> - On Windows, SafeFetch is implemented via SEH. This is beautifully simple and does not need handwritten assembly at all. See `os/windows/safefetch_windows.hpp`.
> - On Zero, SafeFetch uses a Posix compatible solution using `sigsetjmp(2)`. This is portable and robust, but a bit slow compared to the other methods. Note that this solution existed before, and is well tested. I just moved the code around and massaged it a bit.
> - AIX uses now the same method as Zero. AIX "downgraded" in that sense from the stub routine solution, but since SAP gave up maintenance of AIX, I don't have easy access to AIX hardware. If SafeFetch speed becomes an issue, I am sure a solution based on static assembly can be implemented on AIX too.
> - The SafeFetch gtests were greatly expanded.
> 
> Tests:
> 
> - The patch was tested manually by building and running gtests on:
>   - MacOS (x64, aarch64)
>   - Linux (aarch64, arm, ppcle, s390, x64, x86, riscv)
>   - Windows (x64 and x86, but not arm, see below)
>   - AIX
>   - Zero (on Linux x64).
> - GHAs
> - The patch ran through SAP nightlies for several nights in a row
> 
> Note:
> - I have no access to Windows arm, so I could not test it. I contacted @mo-beck  to verify that the patch works there.
> - I have no access to BSD. I am not sure who maintains BSD, so I have no idea who to contact. However, I believe the MacOS solution should work for BSD too. If not, BSD can simply fall back to the Posix sigjmp solution like AIX and Zero.
> 
> It would be nice to have @RealLucy, @TheRealMDoerr and @theRealAph to okay s390, ppcle and aarch64 (and possibly arm32). Each assembly file is tiny, so it should not take long. @RealFYang and @backwaterred already tested the patch on RiscV and AIX (thanks).
> 
> Thanks, Thomas
> 
> [1] https://github.com/openjdk/jdk/pull/7727#issuecomment-1064980598
> [2] https://github.com/openjdk/jdk/blob/4df67426ed02f18af0757897acb28b636a317a91/src/hotspot/share/runtime/safefetch.inline.hpp#L37-L41

Hi David, Florian and Anton,

thanks for reviewing this. Changes:

- In safefetch_sigjmp.cpp, moved pthread_setspecific out of signal handling. Thanks to @fweimer-rh for catching this.
- In safefetch_sigjmp.cpp, reset TLS slot before accessing output variable for writing.
- Following Antons proposal, I removed CanSafeFetchxx. I wondered for a brief moment it was worth keeping it, since strictly speaking two of the three methods need signal handlers to work, so cannot work before we install the hotspot signal handler. But that would incur runtime checks which are not really useful in practice: all situations where SafeFetch are invoked are guaranteed to happen when signal handler has been installed (e.g. hs-err files can only get written if our handler runs).
- Following Antons proposal, I changed the RISC implementations to use one operation less per standard path by loading into directly into the return register.
- I whittled the usage of SAFEFETCH_METHOD_ macros down to the bare minimum needed to control which implementation Posix platforms use.
- Moved safefetch_static.cpp to os/posix/safefetch_static_posix.cpp.

+ minor stuff, too small to mention, all review driven.

Thanks for reviewing!

Thomas

-------------

PR: https://git.openjdk.java.net/jdk/pull/7865